How Fast Do We Want To Scale Up?

The answer to this question determines whether you should rely on already-built (managed) services or build and operate your own.

For example, if the requirement is to scale from 10,000 users to 10 million users within 3 months, we wouldn't want to set up our own Postgres from scratch and shard it manually, because we don't have time for that.

If the timeline is 1 year, we can take a steadier approach: start with a single Postgres instance and a few replicas, then scale up gradually.
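To make the gap between the two timelines concrete, here is a rough sketch of the month-over-month growth each one implies. It assumes constant compound growth and assumes the 1-year case uses the same 10,000 to 10 million user targets as the 3-month example; the function name `monthly_growth_factor` is made up for this illustration.

```python
def monthly_growth_factor(start_users: int, target_users: int, months: int) -> float:
    """Constant month-over-month multiplier needed to grow from start to target."""
    if start_users <= 0 or target_users <= 0 or months <= 0:
        raise ValueError("all arguments must be positive")
    return (target_users / start_users) ** (1 / months)


# Assumed targets: 10,000 -> 10,000,000 users (a 1000x increase overall).
rapid = monthly_growth_factor(10_000, 10_000_000, 3)
steady = monthly_growth_factor(10_000, 10_000_000, 12)

print(f"3 months:  about {rapid:.1f}x per month")
print(f"12 months: about {steady:.2f}x per month")
```

Under these assumptions the 3-month target needs roughly 10x growth every month, while the 1-year target needs roughly 1.78x. That difference is why the rapid case leaves no room for hand-built sharding.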

| Feature | 3-Month Target (Rapid) | 1-Year Target (Steady) |
| --- | --- | --- |
| Philosophy | Scale first, optimize later. | Optimize for cost and reliability. |
| Database choice | Managed NoSQL (DynamoDB/Cassandra) | Relational (PostgreSQL/MySQL) |
| Initial build | Multi-region, global load balancing | Single region with Multi-AZ (Availability Zones) |
| Caching | Aggressive caching at every layer | Caching only for the slowest queries |
| Bottlenecks | Solved by adding more "hardware" (RAM/nodes) | Solved by optimizing code and indexes |

The point of this question is to strike a balance between accumulating technical debt and over-engineering.

Short time scale

The exact choices depend on the requirements, but the focus is on high availability.

| Component | Choice | Reason |
| --- | --- | --- |
| Database | For a balance of reads and writes: a fully managed NoSQL service such as CosmosDB or DynamoDB | Comes with auto-sharding, so it is much easier to shard than SQL. DynamoDB has supported ACID transactions since 2018, so it is no longer limited to BASE semantics. DynamoDB also supports multi-region (global) writes, so writes can be served from a nearby region. Among the options here, DynamoDB's ACID support is the most mature. |
| | If performance is not critical and SQL is preferred: Aurora or Azure SQL Database | Aurora does not shard automatically, but it does provide high availability. Writes go to a single region, in contrast to DynamoDB's global writes; reads scale automatically. Suitable for teams that are already familiar with SQL or that specifically need a SQL architecture. NOTE: there may be newer products that make scaling SQL a bit easier. |
| | For generic but very high throughput: managed ScyllaDB, or Amazon Keyspaces as a managed Cassandra-compatible service | ScyllaDB is a C++ reimplementation of Cassandra with very high throughput. Its ACID support is basic and not as mature as DynamoDB's. For write-heavy workloads, throughput is higher and cost is much lower than with DynamoDB or Postgres. |
| Cache | Redis is a good choice: set up a single Redis for the first few months and eventually switch to Redis Cluster. For reads, use cache-aside. For writes, use write-through, specifically write-through with invalidation. | Redis provides high availability with built-in recovery: when the primary crashes, a replica is promoted automatically (with Sentinel, Redis Cluster, or a managed offering). Redis Cluster shards data automatically, which is good for scaling, and Redis supports a variety of data types, which is convenient for development. With Redis Cluster there can be replica lag, so we need a strategy for dealing with it. Cache-aside is a good read strategy because it handles complex scenarios well, such as when the application needs to query multiple databases; the same applies to write-through. |
| Messaging | SNS, SQS | SNS is good for pub/sub; SQS is good for message queues. |
| Load balancing | AWS Application Load Balancer, Azure Front Door | |
| API gateway | API Gateway | |
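The read and write strategies named in the Cache row can be sketched roughly as follows. This is a minimal illustration, not a production design: plain dictionaries stand in for Redis and for the database, the class name `CachedStore` is invented for the example, and "write-through with invalidation" is interpreted here as "write to the database, then delete the cached entry so the next read repopulates it".

```python
from typing import Dict, Optional


class CachedStore:
    """Toy store: dicts stand in for the database and for Redis (assumption)."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}      # stand-in for the primary database
        self.cache: Dict[str, str] = {}   # stand-in for Redis
        self.db_reads = 0                 # counts how often reads fall through

    def read(self, key: str) -> Optional[str]:
        # Cache-aside: check the cache first, fall back to the database on a
        # miss, then populate the cache for subsequent reads.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key: str, value: str) -> None:
        # Write to the database, then invalidate the cached copy so no reader
        # keeps seeing the old value.
        self.db[key] = value
        self.cache.pop(key, None)


store = CachedStore()
store.write("user:1", "alice")
first = store.read("user:1")    # cache miss, loaded from the database
second = store.read("user:1")   # cache hit
store.write("user:1", "bob")    # invalidates the cached "alice"
third = store.read("user:1")    # cache miss again, returns the new value
```

Invalidating instead of updating the cache on write keeps the write path simple and avoids caching values that are never read, at the cost of one extra database read after each write.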

Long time scale

| Component | Choice | Reason |
| --- | --- | --- |
| Database (NoSQL) | ScyllaDB, Couchbase, MongoDB | |
| Database (SQL) | PostgreSQL, CockroachDB | |
| Cache | Redis | |
| Messaging | Kafka, RabbitMQ | |
| Load balancing | HAProxy | |
| API gateway | Kong | |