Applied Modern Data-Driven Software Architecture
A practical guide that walks you through the complete architecture of an ambitious product that operates efficiently under high load, scales to petabytes (!) of data, and is built on a modern, next-gen, cloud-native, open-source tech stack.
Planned for 2024
What you'll learn
- Get to know different architecture areas
- Exemplary architectural approach
- Understand features, capabilities, limitations of:
  - ScyllaDB | Apache Cassandra
  - Redpanda | Apache Kafka
  - Elasticsearch
- Working with distributed, eventually consistent data systems (three of them)
  - Characteristics and challenges
  - Design principles, patterns, anti-patterns
  - Integration, data sync, event sourcing
  - Schema & app design with sharding + partitioning in mind
- Adopting a Software Architect's mindset
- Event & Stream Processing
  - CDC, read/write with Kafka Connect
  - Stateless and stateful data processing with Kafka, Kafka Streams & Apache Flink
- Applied architecture deep dive
  - Data, Backend, Event Processing, Frontend, Platform
- High-traffic, large-scale application
  - High volumes of data, at high velocity
  - Optimise for availability, reliability, performance
- Building blocks of secure architecture
  - API, User/Client, Data in-transit/at-rest
What to expect
- The general tenor is quite technical
- Most of the training is very practical since everything relates to a real application, real use-cases & requirements
- Starting off high-level, but getting into low-level/in-depth designs and solutions
- Clear visual explanations and software architecture diagrams
- The overall presentation style is like an onboarding session (or handover)
- The might of a well-conceived, tailored end-to-end architecture
What -not- to expect
- NO hands-on practice lessons, tasks, quizzes or similar
- NO coding exercises (though the ‘Applied Architecture’ chapter may contain real db schema or simple code samples)
- NOT about (enterprise) architecture frameworks such as TOGAF, ITIL, or SAFe. The course focuses on the conceptual design process and actual solutions to given requirements.
- NOT teaching soft skills or 'how to become a software architect' (there are already many excellent guides and books available)
Audience
- Developers with backend experience
- Software/Solution Architects
- Data Engineers
- Everyone interested in learning modern design patterns and new technologies...
Prerequisites
- Overall level of difficulty: advanced
  -> assumes solid software engineering fundamentals
- A plus, but not a must: understanding of basic concepts of distributed systems, eventual consistency, data sharding, data validity
- No prior knowledge of the fundamental data technologies is required. Basics will be covered throughout the training and more advanced concepts relevant to the project architecture are explored in detail with real use-cases as part of the ‘applied architecture’ section.
- No coding skills are required
Technologies
Fundamental stack, directly relevant to the architecture.
- Data
  - ScyllaDB / Apache Cassandra
  - Redpanda / Apache Kafka
  - Elasticsearch
  - Cloud Object Storage (~S3)
- Backend, Integration & Event Processing
  - Stateless Services (Core Backend)
  - Kafka Streams, Apache Flink
  - CDC, Kafka Connect, Debezium
- Web Frontend
  - SPA, PWA, SSR
  - Serverless, CDN, global
- Platform
  - Containers
  - Kubernetes
While data systems, for example, are defining components of the overall (L1/L2) architecture, the choice of frontend framework is more a matter of personal preference (Svelte vs. React vs. Vue.js, ...).
- Data
  - ScyllaDB
  - Redpanda
  - Elasticsearch
  - S3 (compatible)
- Backend
  - Kotlin
  - Spring
  - Project Reactor
  - Caffeine Cache
  - Redis
- Web Frontend
  - TypeScript
  - Vue 3
  - Tailwind CSS
- Integration & Event Processing
  - Kotlin
  - Spring (Boot, Kafka, Cloud Stream)
  - Kafka Streams
  - Kafka Connect
  - Apache Flink
- Interfaces & Schemas
  - REST, OpenAPI
  - Avro
- Platform
- Kubernetes
- (...full platform/ops stack yet to be decided)
- k8s platform provider
- serverless web frontend provider
- security
- HashiCorp Vault
- (...more yet to be decided)
- ~flux2 (gitops)
- ~Prometheus, Thanos(?), Grafana
- ~Loki vs. ELK vs. other?
- Nginx
- (...more yet to be decided)
- Pusher
- ?Sendgrid
The Product
The product covered in this guide is a microblogging and social networking service. Users access the application through its website interface. Registered members submit content to the site such as text posts, links, images, and videos. Content is organised into 'feeds' and can be rated, searched, and filtered; the service also sends notifications and email digests. It provides access control for feeds through organisations, teams, collaborators, roles and permissions.
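As a rough illustration of the access-control model described above, the following TypeScript sketch shows one way roles granted to collaborators and teams could resolve to permissions on a feed. All type names, role names, and the role-to-permission mapping are hypothetical assumptions made for this example; the guide itself does not specify them.

```typescript
// Hypothetical model: none of these names or mappings come from the guide.
type Permission = "read" | "post" | "moderate" | "admin";
type Role = "viewer" | "contributor" | "moderator" | "owner";

// Assumed mapping from a role to the permissions it grants.
const ROLE_PERMISSIONS: Record<Role, readonly Permission[]> = {
  viewer: ["read"],
  contributor: ["read", "post"],
  moderator: ["read", "post", "moderate"],
  owner: ["read", "post", "moderate", "admin"],
};

interface Feed {
  id: string;
  organisationId: string;
  // Roles granted directly to individual collaborators (userId -> role).
  collaborators: Map<string, Role>;
  // Roles granted to whole teams (teamId -> role).
  teams: Map<string, Role>;
}

interface User {
  id: string;
  teamIds: string[];
}

// Collect every role the user holds on the feed, directly or through a team.
function rolesFor(user: User, feed: Feed): Role[] {
  const roles: Role[] = [];
  const direct = feed.collaborators.get(user.id);
  if (direct !== undefined) roles.push(direct);
  for (const teamId of user.teamIds) {
    const viaTeam = feed.teams.get(teamId);
    if (viaTeam !== undefined) roles.push(viaTeam);
  }
  return roles;
}

// A user may perform an action if any of their roles grants the permission.
function can(user: User, feed: Feed, permission: Permission): boolean {
  return rolesFor(user, feed).some((role) =>
    ROLE_PERMISSIONS[role].includes(permission),
  );
}

// Example data (also hypothetical).
const feed: Feed = {
  id: "feed-1",
  organisationId: "org-1",
  collaborators: new Map<string, Role>([["alice", "owner"]]),
  teams: new Map<string, Role>([["team-editors", "contributor"]]),
};
const alice: User = { id: "alice", teamIds: [] };
const bob: User = { id: "bob", teamIds: ["team-editors"] };
const carol: User = { id: "carol", teamIds: [] };
```

Here `alice` holds the owner role directly, `bob` inherits the contributor role through a team, and `carol` has no role on the feed at all. Keeping the check a pure function over plain data suits stateless backend services, since the role assignments can be loaded from whichever data store holds them.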
Among other non-functional requirements (NFRs), the primary focus is on availability, reliability, performance, and scalability. The app operates efficiently under high load and scales to petabytes (!) of data, with the ambition to support a scale on the order of GitHub, Discord, Reddit, or even Twitter.
Admittedly, this is a bold statement, and large-scale load tests have yet to be run and published, but I’m confident that the architecture and tech stack will take you a long way.