How Fast Do We Want To Scale Up?

This question determines whether or not you use an already-built (managed) service.

For example, if the requirement is to scale from 10,000 users to 10 million users within 3 months, we wouldn't want to set up our Postgres from scratch and do manual sharding, because we don't have time for that.
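To give a feel for why manual sharding eats time, here is a minimal sketch of application-level shard routing. The `shard_for` and `fraction_moved` helpers and the simple hash-modulo scheme are assumptions made for illustration, not something a particular Postgres setup prescribes.

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def fraction_moved(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for u in user_ids
        if shard_for(u, old_count) != shard_for(u, new_count)
    )
    return moved / len(user_ids)


user_ids = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(user_ids, old_count=4, new_count=5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With plain modulo routing, a key stays put only when its hash gives the same remainder for both shard counts, so roughly four out of five keys have to be migrated when a fifth shard is added. Planning and running that kind of migration is the work a managed, auto-sharding service takes off your hands.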

If the time frame is 1 year, we can choose a steadier approach: start with a single Postgres instance with a few replicas and gradually scale up.
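The difference between the two timelines is easier to see as a monthly growth factor. The sketch below assumes smooth compound growth, which is a simplification (real growth is usually lumpy); `monthly_growth_factor` is a hypothetical helper.

```python
def monthly_growth_factor(start_users: int, target_users: int, months: int) -> float:
    """Constant month-over-month multiplier needed to hit the target."""
    return (target_users / start_users) ** (1 / months)


rapid = monthly_growth_factor(10_000, 10_000_000, months=3)
steady = monthly_growth_factor(10_000, 10_000_000, months=12)

print(f"3 months:  x{rapid:.2f} per month")   # x10.00 per month
print(f"12 months: x{steady:.2f} per month")  # x1.78 per month
```

Growing 10x every month leaves no room to re-architect between steps, while about 1.8x per month gives time to measure, optimize and add replicas gradually.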

Feature	3-Month Target (Rapid)	1-Year Target (Steady)
Philosophy	Scale first, optimize later.	Optimize for cost and reliability.
Database choice	Managed NoSQL (DynamoDB/Cassandra)	Relational (PostgreSQL/MySQL)
Initial Build	Multi-region, Global Load Balancing	Single region with Multi-AZ (Availability Zones)
Caching	Aggressive caching at every layer	Caching only for the slowest queries
Bottlenecks	Solved by adding more "hardware" (RAM/Nodes)	Solved by optimizing code and indexes

The point of this question is to balance technical debt against over-engineering.

Short time scale

The choices depend on the requirements; the focus is on high availability.

Component	Choice	Reason
Database	For a balance of reads and writes: a fully managed NoSQL database such as CosmosDB or DynamoDB	Comes with auto-sharding, so it is much easier to shard compared to SQL. DynamoDB has supported ACID transactions since 2018, so it is no longer BASE-only. DynamoDB global tables accept writes in multiple regions, so writes can go to a nearby region and stay fast. DynamoDB's ACID support is more mature.
	If performance is not critical and SQL is preferred, choose Aurora or Azure SQL Database	Aurora does not shard automatically, but it does provide high availability. Writes go to a single region, in comparison with DynamoDB, which supports global writes. Read capacity scales automatically. Suitable for teams that are already familiar with SQL or specifically need a SQL architecture.
NOTE: there may be newer products that make scaling SQL a bit easier.
	For generic but very high throughput, choose a managed wide-column store: ScyllaDB Cloud, or Amazon Keyspaces (a managed Cassandra-compatible service, not ScyllaDB itself)	ScyllaDB is a C++ reimplementation of Cassandra with very high throughput. It supports basic ACID (lightweight transactions), not as mature as DynamoDB's. Throughput is higher and the cost is much lower compared to DynamoDB or Postgres for write-heavy workloads.
Cache	Redis could be a good choice: we can set up a single Redis for the first few months and eventually switch to Redis Cluster. For the read strategy, we can use cache-aside. For writes, we can use write-through > write-through with invalidation	Redis provides high availability with built-in failover: when the primary instance crashes, a replica is promoted automatically (this needs Redis Sentinel or Redis Cluster). Redis Cluster shards your data automatically, which is good for scaling. Redis supports various data types, which helps during development. With Redis Cluster there can be replica lag, so we need a strategy for dealing with Redis replica lag. Cache-aside is a good read strategy because it handles complex scenarios well, such as when the application needs to query multiple databases. The same applies to write-through.
Messaging	SNS, SQS	SNS is good for pub/sub; SQS is good for message queues
Load balancing	AWS Application Load Balancer, Azure Front Door
API gateway	API Gateway
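The cache-aside read path and write-through write path from the Cache row can be sketched as follows. This is a minimal illustration: plain dictionaries stand in for Redis and the primary database, and the `UserStore` class with its method names is hypothetical. The `write_and_invalidate` method shows one possible reading of "write-through with invalidation", which the notes do not define.

```python
class UserStore:
    """Toy store: dicts stand in for the database and for Redis."""

    def __init__(self):
        self.db = {}       # stand-in for the primary database
        self.cache = {}    # stand-in for Redis
        self.db_reads = 0  # counts how often reads fall through to the database

    def read(self, key):
        # Cache-aside: the application checks the cache first and,
        # on a miss, loads from the database and fills the cache itself.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write-through: the database and the cache are updated in the
        # same operation, so the next read is a cache hit with fresh data.
        self.db[key] = value
        self.cache[key] = value

    def write_and_invalidate(self, key, value):
        # Assumed variant: write to the database, then drop the cached
        # entry so the next read repopulates it from the database.
        self.db[key] = value
        self.cache.pop(key, None)


store = UserStore()
store.write("user:1", "Ada")
print(store.read("user:1"), store.db_reads)    # cache hit, no database read
store.write_and_invalidate("user:1", "Grace")
print(store.read("user:1"), store.db_reads)    # cache miss, one database read
```

In a real setup the two dictionary updates in `write` are separate network calls that can fail independently, so the order of operations and the retry behaviour need the same care as the replica-lag strategy mentioned above.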

Long time scale

Component	Choice	Reason
Database (NoSQL)	ScyllaDB, Couchbase, MongoDB
Database (SQL)	Postgres, CockroachDB
Cache	Redis
Messaging	Kafka, RabbitMQ
Load balancing	HAProxy
API gateway	Kong Gateway