Notes
AppSync
- can do real-time stuff, WebSocket, MQTT on WebSocket
- GraphQL
Lambda
- Lambda Event Source Mapping polls a service and invokes the Lambda function synchronously with the records it finds.
    - When polling from DynamoDB Streams or Kinesis Data Streams, it doesn't delete the items from the stream
    - When polling from SQS, it deletes the messages after successful processing (SNS is push-based, not an event source mapping)
- If we want in-order processing, we need to use FIFO
- The more RAM, the more vCPU power the Lambda function gets (better performance)
- Don't need to run the X-Ray daemon yourself (Lambda runs it for you)
- Permissions
    - Lambda to invoke other services: Lambda Execution Role (IAM Role)
    - Other services to invoke Lambda: Lambda Resource-Based Policies
- Invocation on CLI
    - async: `--invocation-type Event`
    - sync: `--invocation-type RequestResponse`
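A minimal boto3 sketch of the two invocation types (function name and payload are hypothetical):

```python
import json
import boto3

lam = boto3.client("lambda")
payload = json.dumps({"orderId": 42}).encode()  # hypothetical payload

# Sync: waits for the function to finish and returns its response
resp = lam.invoke(FunctionName="my-fn", InvocationType="RequestResponse", Payload=payload)
print(json.load(resp["Payload"]))

# Async: Lambda queues the event and returns immediately (HTTP 202)
lam.invoke(FunctionName="my-fn", InvocationType="Event", Payload=payload)
```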
SAM
- Has traffic shifting feature, automated rollback since it's using CodeDeploy in the background
- To step through and debug the code, use SAM CLI + AWS Toolkits
- SAM Policy Templates: to give permissions to Lambda functions (yes, only Lambda)
- Some of the commands
    - `sam build`: fetch dependencies and create local deployment artifacts
    - `sam package`: package and upload to S3
    - `sam deploy`: deploy to CloudFormation
    - `sam publish`: publish to AWS SAR
SAR
- Serverless application repository to store SAM applications
KMS
- AWS manages the encryption keys for us
- Has 2 main types of keys
    - AWS Managed Keys: pre-defined keys, free of charge. For example `aws/rds`, `aws/ebs`
    - Customer Managed Keys (CMK): 2 kinds, both $1 per month per key
        - Create your own
        - Import your own
- Encryption and Decryption limited to 4KB, if more than that we need to use Envelope Encryption technique
- Key rotation
- AWS Managed Key: automatic every 1 year
- Customer Managed Key: must be enabled
- For created key, automatic every 1 year
- For imported key, must manually rotate using alias
- (makes sense cus you import the key)
- Key policy
- You need a key policy to access the key at all
- Default will allow the entire account to access the key if you don't specify one.
- Good for cross-account access
- Envelope encryption / decryption
- Technique to encrypt / decrypt large file (more than 4KB)
- Use something called a Data Key.
- Data Key Caching:
- Cache the data key and re-use it to avoid calling KMS (reduces quota consumption)
- For example: S3 Bucket Key
- Throttle and stuff
- Every service that makes requests to KMS shares the same quota per region
- Solution
- Request Quotas increase
- Data Key Caching
- Exponential Backoff
- Some of the APIs
    - `Encrypt`: encrypt up to 4KB
    - `Decrypt`: decrypt up to 4KB
    - `GenerateRandom`: return a random byte string
    - `GenerateDataKey`: generate a unique data key; returns a plaintext copy and a copy encrypted under the CMK
    - `GenerateDataKeyWithoutPlaintext`: generate a data key (encrypted copy only) for use at some later point
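A minimal envelope-encryption sketch with boto3 and `GenerateDataKey` (the key alias is hypothetical, and Fernet stands in for whatever local cipher you use):

```python
import base64
import boto3
from cryptography.fernet import Fernet  # assumes the `cryptography` package

kms = boto3.client("kms")

# Ask KMS for a data key: we get a plaintext copy + a copy encrypted under the CMK
resp = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")  # hypothetical alias
fernet = Fernet(base64.urlsafe_b64encode(resp["Plaintext"]))

ciphertext = fernet.encrypt(b"a payload far bigger than 4KB...")  # local encrypt, no 4KB limit
encrypted_key = resp["CiphertextBlob"]                            # store next to the ciphertext

# Decrypt later: KMS decrypts the data key (under 4KB), then we decrypt locally
plain_key = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
data = Fernet(base64.urlsafe_b64encode(plain_key)).decrypt(ciphertext)
```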
AWS SSM Parameter Store
- Older way to store parameters and secrets; we use the Parameter Store of Systems Manager to store stuff
- Can store in hierarchy as well
- No charge if we use the standard tier, provided
- Maximum size of parameter: 4KB
- Total number of parameters: < 10,000
- Advanced tier can have TTL (expiration policies) to force update or delete
- No automatic rotation
- KMS is optional (only used for SecureString parameters)
AWS Secrets Manager
- Newer service dedicated to storing secrets and stuff
- Capacity to force rotation every X days
- Seamless integration with RDS
- Can do hierarchy storing (folder like structure)
- Automatic rotation
- KMS encryption is mandatory
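A minimal boto3 sketch for reading a secret (the secret name is hypothetical):

```python
import json
import boto3

sm = boto3.client("secretsmanager")
resp = sm.get_secret_value(SecretId="prod/db-credentials")  # hypothetical secret name
creds = json.loads(resp["SecretString"])  # e.g. {"username": ..., "password": ...}
```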
Step Function
- Organise workflows as state machine
- Standard vs express
    - Standard if you need up to 1 year maximum duration
    - Express if 5 minutes is enough
    - Note: Express supports a much higher execution rate; Standard is capped around 2,000 new executions per second
- Error handling:
    - We can use `Retry` or `Catch` in the state definition, with an exponential backoff rate (see the sketch at the end of this section)
- State types
- Task state: (Lambda job, Batch job, SQS, EC2, ...) can invoke or run 1 activity
- Choice state: make a choice
    - Fail or Succeed state: stop an execution with failure or success
- Pass state: pass through input as output, don't do anything
- Wait state: delay
- Map state: dynamically iterate (loop)
- Parallel state: begin parallel execution
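The Retry/Catch sketch mentioned above, written as a Python dict mirroring the Amazon States Language (the ARN and state names are hypothetical):

```python
# A Task state that retries with exponential backoff, then falls back to a Catch route
task_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-fn",  # hypothetical
    "Retry": [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 2,   # first retry waits 2s
        "BackoffRate": 2.0,     # each subsequent retry waits 2x longer
        "MaxAttempts": 3,
    }],
    "Catch": [{
        "ErrorEquals": ["States.ALL"],  # anything still failing after the retries
        "Next": "NotifyFailure",        # hypothetical failure-handling state
    }],
    "Next": "NextState",
}
```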
CloudFormation
- ChangeSet: See what changes before updating the stack. This won't say if the update is successful.
- CloudFormation Drift: check whether resources have been manually changed (configuration drift)
- Can be edited using
- YAML
- CloudFormation Designer
- Building blocks
    - `Resources` (mandatory)
    - `Mappings`: constants to declare
    - `Parameters`: for users to put inputs in
    - `Outputs`: what to export
    - `Conditions`: create resources only when these conditions are satisfied
- Can't edit a previous template; you have to upload a new version to overwrite it
- Rollback behaviour:
    - Stack creation fails: everything rolls back (gets removed)
    - Stack update fails: rolls back to the previous working state
- Nested stack
- for reusing components
- Cross stack:
- to share the output of one stacks to another, export some values
- Stacksets:
    - Create, update, delete stacks across multiple regions and accounts
CloudFront
- Pricing classes:
- All: all regions (best performance)
- 200: exclude most expensive
- 100: only the least expensive
- Multiple origins:
    - Ability to route to different origins based on the path
- Origin group:
    - Consists of 1 primary and 1 secondary origin: if the primary fails, we use the secondary
    - Always routes to the primary first before failing over to the secondary
- Field-level encryption:
    - Sensitive information is encrypted at the edge, close to the user
    - Specify the specific fields in POST requests you want to encrypt
- CloudFront caching:
    - Can cache based on headers, session cookies, query string parameters
    - Cache lives at CloudFront Edge Locations
    - Control the TTL of the cache
    - Invalidate the cache using the `CreateInvalidation` API
    - Has the ability to cache static and dynamic content separately
        - Dynamic content: cache based on headers and cookies
        - Static content: cache normally
- Security
    - Viewer Protocol Policy
        - Works on the client (viewer) side
        - Redirect client HTTP to HTTPS
        - Or force clients to use HTTPS only
    - Origin Protocol Policy
        - Works on the origin side
        - Specify HTTPS only
        - Or match the viewer protocol
- Note: S3 Bucket website doesn't support HTTPS
- CloudFront Signed URL:
    - Distribute premium content
    - Works with any origin
    - Key is account-wide
    - Different from S3 Pre-Signed URLs:
        - S3 pre-signed URLs only have a limited lifetime
        - They use the IAM key of the signing IAM principal => the user gets the same permissions as the signer
        - No caching; they can only sign S3 objects
    - When signing, it's recommended to use a trusted key group
        - Supports automatic key rotation
        - Otherwise you'd need to use the root account, which is not recommended
CodeArtifact
- To store software dependencies (maven, npm)
- For CodeBuild and developers to retrieve artifacts
CodeBuild
- Like Jenkins CI, supports Build and Test the application
- By default it cannot access resources in a VPC; if you want that, you need to configure VPC access
CodeCommit
- Github thingy
CodeDeploy
- CD service to deploy. Needs `appspec.yml` in the root directory
- Components
- Application: application's name
    - Compute platform: EC2, Lambda
- Deployment Configuration:
- One at a time
- Custom (min healthy instances)
- All at once
- Half at a time
- Deployment group
- Group of EC2 or ASG
- Deployment type
- in-place deployment
- blue/green deployment
- IAM Instance Profile: to get permission to access to S3 and Github
- Application revision
- Service Role: IAM Role for code deploy to perform operations on EC2
    - Target Revision: the most recent revision you want to deploy
    - Hooks (will be called in this order):
        - `ApplicationStop`
        - `DownloadBundle`
        - `BeforeInstall`
        - `Install`
        - `AfterInstall`
        - `ApplicationStart`
        - `ValidateService`: make sure the service is working correctly
- Rollback:
    - Automatic or manual; when rolling back, CodeDeploy redeploys the previously known good revision as a new deployment
    - For an in-place deployment, that means re-deploying the original version onto the instances
    - A blue/green deployment lets you shift traffic back to the old environment instead
CodeGuru
- ML-powered service
- CodeGuru reviewer: Automated code review
- CodeGuru profiler: application performance recommendations
CodePipeline
- Visual workflow for CI CD pipeline
- Manages the flow: CodeCommit, CodeBuild, CodeDeploy integration
- Each stage passes its output to the next stage as CodePipeline Artifacts (stored in S3)
CodeStar
- All-in-one central UI for CodeCommit, CodeBuild, CodeDeploy, CodePipeline
- Has Cloud9, a web IDE for development
DynamoDB
- Has two modes
    - On-demand:
        - Automatically scales for reads and writes
        - Unlimited RRUs (read request units) and WRUs (write request units)
        - Charged based on RRUs and WRUs consumed
        - More expensive
    - Provisioned:
        - Have to provision RCUs and WCUs
        - RCU calculation:
            - 1 read request = 1 RCU
            - 1 RCU = 1 strongly consistent read = 2 eventually consistent reads
            - Up to 4 KB read per item
        - WCU calculation:
            - 1 write request = 1 WCU
            - Write up to 1 KB
            - Has 2 write modes
                - Standard
                - Transactional: all or nothing
        - Cheaper; has the option to set up auto-scaling of RCUs and WCUs to meet demand by providing min, max, and target utilisation (%)
        - Throughput can be exceeded temporarily using Burst Capacity
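A quick worked example of the provisioned-capacity math above (the workload numbers are hypothetical):

```python
import math

# 10 strongly consistent reads/sec of 6 KB items: reads are billed in 4 KB chunks
rcu = 10 * math.ceil(6 / 4)      # ceil(1.5) = 2 -> 20 RCUs
rcu_eventual = rcu / 2           # eventually consistent reads cost half -> 10 RCUs

# 20 writes/sec of 2.5 KB items: writes are billed in 1 KB chunks
wcu = 20 * math.ceil(2.5 / 1)    # ceil(2.5) = 3 -> 60 WCUs
print(rcu, rcu_eventual, wcu)    # 20 10.0 60
```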
- DynamoDB transactions:
    - Consume 2x WCUs and RCUs, since DynamoDB performs 2 operations for every item (prepare & commit)
    - `TransactGetItems`
    - `TransactWriteItems`
- Write types:
- Concurrent writes (1 overwrites the other)
- Conditional writes
- Atomic writes (take both)
- Batch writes
- Primary key (required) can be either
- hash (partition key only)
- hash + range (partition key + sort key)
- Sort Key (optional): determines the order of how the data is sorted
- APIs
    - `PutItem`: create or replace
    - `UpdateItem`: edit an item's attributes
    - Conditional Writes
        - No performance impact
    - `GetItem`
        - To retrieve only certain attributes, use `ProjectionExpression`
        - Eventually consistent reads by default
    - `Query`: query items (see the boto3 sketch after this API list)
        - For the partition key, use `KeyConditionExpression`
            - The partition key can only use `=`
        - For the sort key, also in `KeyConditionExpression`
            - The sort key can use `=, >=, <=, ...`
        - For other attributes, use `FilterExpression`
            - Applied server side after the `Query` executes, but before results are returned to you (RCUs are still consumed for the full read)
    - `Scan`:
        - Reads the entire table, then filters out the data (inefficient)
        - For faster performance, use Parallel Scan
        - Can use `ProjectionExpression` & `FilterExpression`
            - `FilterExpression` filters the results after the read (the full scan's RCUs are still consumed)
            - `ProjectionExpression` limits which attributes come back; functionally they're applied much the same way
    - `DeleteItem`
        - Delete an individual item
        - Can perform a conditional delete
    - `DeleteTable`
        - Drops the whole table; quicker than calling `DeleteItem` on all items
    - `BatchWriteItem`:
        - Up to 25 `PutItem` and/or `DeleteItem`
        - Maximum 16 MB of data written, with 400 KB of data per item
        - Can't update items
    - `BatchGetItem`
        - Returns items from one or more tables in parallel
        - Up to 100 items or 16 MB of data
    - `--page-size`: default 1000 items
        - How many items each underlying API call fetches
        - For example, with page size = 100 and 1000 items, the CLI makes 10 API calls (avoids timeouts)
    - `--max-items`:
        - Max items to show for the current query; returns a `NextToken` for `--starting-token`
    - `--starting-token`:
        - Given a `NextToken`, start querying from there (pagination stuff)
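The `Query` sketch referenced above, using boto3 (table and attribute names are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

resp = table.query(
    # Partition key must use =; the sort key can use ranges
    KeyConditionExpression=Key("userId").eq("u-123") & Key("createdAt").gte("2024-01-01"),
    # Applied after the read: RCUs are consumed for everything the key condition matched
    FilterExpression=Attr("orderStatus").eq("SHIPPED"),
    # Only return these attributes
    ProjectionExpression="orderId, totalAmount",
)
items = resp["Items"]
```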
- Local Secondary Index (LSI)
- additional sort key
- Can have up to 5 LSI
- Must be defined at creation time.
    - Attribute projections:
        - Can get all attributes from the base table
        - Can fetch these attributes even if they're not projected onto the index
- Use WCUs and RCUs of the main table
- Global Secondary Index (GSI)
    - Additional primary key (can be hash, or hash + range)
    - To speed up queries on attributes that are not the primary key
    - Attribute projections:
        - Can also project attributes from the base table
        - However, if an attribute is not projected onto the index, we can't get it from the index
    - Must provision RCUs and WCUs separately for the index (auto-scalable)
- Limitations:
    - Each table can have an infinite number of items (rows)
- Maximum size of item is 400 KB
- Supported type includes: String, number, binary, boolean, null, List, Map, Set
- DynamoDB Accelerator (DAX)
    - Improves reads by using an in-memory cache
- 5 minutes TTL. Up to 10 nodes
- Multi-AZ support
- Comparison to ElastiCache:
- DAX is good for individual object cache
        - ElastiCache is more like caching the result as a whole (e.g. an aggregation)
- Optimistic locking
    - A strategy to make sure an item hasn't changed before you write it; implemented with conditional writes (see the sketch below)
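A minimal optimistic-locking sketch using a conditional write (the table and the `version` attribute are hypothetical):

```python
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

try:
    table.update_item(
        Key={"orderId": "o-1"},
        UpdateExpression="SET price = :p, version = version + :one",
        ConditionExpression=Attr("version").eq(3),  # only write if nobody changed it since we read it
        ExpressionAttributeValues={":p": 42, ":one": 1},
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
        pass  # someone else won the race: re-read the item and retry
    else:
        raise
```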
- PartiQL
- Query DynamoDB using SQL like syntax
- Support Batch operations
- Partitions
    - Internally, DynamoDB stores data in partitions
    - WCUs and RCUs are spread evenly across partitions
        - So if you have 10 partitions, 10 RCUs and 10 WCUs, each partition gets 1 RCU and 1 WCU
    - Partition strategies
        - If a partition is too hot, we can
            - Shard using a random suffix
            - Shard using a calculated suffix
- Security
- VPC endpoints, IAM, SSL (Secure Sockets Layer), TLS (Transport Layer Security)
- DynamoDB Global tables
- Multi-region, multi-active, fully replicated, high performance
- DynamoDB local
- Test locally without accessing DynamoDB web for local environment
- AWS Database Migration Service
- To migrate other databases into DynamoDB (MongoDB, Oracle, Mysql)
- For users to interact with DynamoDB directly
    - Use Cognito; specify `LeadingKeys` to limit row access and `Attributes` to limit the specific attributes users can see
- DynamoDB stream:
    - Ordered stream of item-level modifications
- retention up to 24 hours
- Only receive updates after you enable stream
- Throttling
    - If we exceed provisioned RCUs and WCUs, we're gonna get `ProvisionedThroughputExceededException`
    - Solutions
        - Exponential backoff
        - Redistribute the partition keys (re-shard)
        - If it's RCUs, can try DAX
    - For a DynamoDB Global Secondary Index: if the index's WCUs are throttled, the main table will be throttled as well
    - For a DynamoDB Local Secondary Index: it uses the same WCUs and RCUs as the main table, so no special throttling considerations
- TTL
    - Time to live per item; items are automatically deleted within 48 hours of expiration
    - Will be deleted from both LSIs and GSIs
    - Expired items that haven't been deleted yet still appear in reads, queries, and scans
    - Doesn't consume WCUs; the delete operation goes into DynamoDB Streams
ECS
- ECR: container image repository for ECS
    - To push/pull from ECR, use the native `docker push` / `docker pull`
- Rolling update
    - Can specify
        - Minimum healthy percent: minimum tasks that must stay healthy (0-100%)
        - Maximum percent: maximum tasks running during the update (100-200%)
- Task placement
    - Strategies
        - BinPack: try to fill one EC2 instance as much as possible
        - Random: randomly placed on EC2 instances
        - Spread: spread across a specific value (availability zone, instance ID, ...)
        - We can mix the types together, for example spread on availability zone and binpack on memory
    - Constraints
        - `distinctInstance`: each task is placed on a different container instance
        - `memberOf`: place tasks on instances that satisfy an expression
- Task Definition:
- JSON form tell ECS how to run a docker container, has image name, port binding, ...
- For EC2 launch type:
- Get dynamic host port mapping if you define only the container port in the task definition
- For Fargate launch type:
- Each task has an unique private IP
Elastic Beanstalk
- Extensions
    - To configure Elastic Beanstalk using code
    - Add stuff inside `.ebextensions/`
    - Files have to be either `yaml` or `json`, and the file name needs to end with `.config`
- Can integrate with HTTPS
    - By loading a cert via `.ebextensions/securelistener-alb.config`
    - Or by setting it up using the Console or ACM (AWS Certificate Manager)
- Can also redirect from HTTP to HTTPS
- Cloning: Allow to clone the exact same Elastic Beanstalk environment. Good for testing
- Components
- Application: collection of elastic beanstalks components (like a folder)
- Application Version: the version of your code
- Environment: Collection of AWS resources running the application
    - Tiers:
        - Web Server Tier
            - ELB with EC2 instances running a web server, managed by an ASG
        - Worker Tier
            - SQS queue and EC2 instances with an ASG
    - Can create multiple environments (dev, test, prod)
- Custom platform:
    - Define your own platform with a custom OS, software, and scripts
    - Define an AMI in a `platform.yaml` file
- Deployment
    - Process
        - Describe dependencies (in `package.json` or `requirements.txt`)
        - Upload the code
            - Through the console: using a zip
            - Through the CLI: just type the command, it will zip for you
        - It will deploy the thing on EC2, resolve dependencies, and start the application
    - Options:
        - Single Instance
        - High Availability with Load Balancer
- Docker Integration
    - Single Docker
        - Does not use ECS
        - Provide a `Dockerfile` or `Dockerrun.aws.json`
    - Multi Docker
        - Uses ECS; will create an ECS cluster
        - Provide a `Dockerrun.aws.json`
        - Your Docker images must be prebuilt and stored in ECR (for example)
- Lifecycle policy
- To remove the old version
    - There are options to not delete the source bundle in S3, to prevent data loss
- Migration
    - When we want to change config that can't be changed in place, we need to do a migration
    - For example, you can't change a Network Load Balancer to an Application Load Balancer. If we want to change it, we have to:
        - Create a new environment with the same configuration except the load balancer
        - Deploy the application in the new environment
        - Swap the URLs (CNAME swap) to point traffic at the new environment
    - For RDS separation (to prevent the database getting deleted if we delete the Elastic Beanstalk stack):
        - Create a snapshot of RDS
        - Go to the RDS console and protect the RDS database from deletion
        - Create a new Beanstalk environment with the same config but without RDS. Point the application to the existing RDS.
        - Delete the old environment
- Update options
    - All at once
    - Rolling
        - Update a few instances at a time
    - Rolling with additional batch
        - Same as rolling, but spins up extra capacity first
    - Immutable
        - Uses another Auto Scaling Group for the update
        - Zero downtime
    - Blue/Green deployment
        - Zero downtime; create a new environment and shift traffic gradually over
    - Traffic splitting
        - Good for canary testing: split traffic between different auto scaling groups
ElastiCache
- Redis Cluster Mode enabled
    - Good for scaling writes
    - Primary nodes are spread across multiple shards
        - So we can write on multiple primary nodes
    - Has multi-AZ
    - 1 primary has 0-5 replica nodes
- Redis Cluster Mode disabled
    - Cheaper
    - 1 primary has 0-5 replica nodes
    - Write on the primary, read on the replicas
    - Has multi-AZ
- Caching strategies (see the sketch at the end of this section)
    - Lazy Loading / Cache-aside / Lazy population
        - Only write to the cache when the requested data isn't in it (cache miss)
        - Cons: cache-miss latency; data can go stale
    - Write-through (usually combined with lazy loading)
        - Update the cache whenever the database is updated
        - Cons:
            - Data is missing from the cache until it's written to the database
                - Can implement lazy loading alongside to avoid this
            - Cache churn: wasted cache space (a lot of it might never be read)
- Cache eviction and Time-to-live (TTL)
    - Remove old cache entries to save space
    - e.g. LRU (least recently used)
    - If too many evictions happen, we might need to increase the cache size
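A minimal sketch of the two caching strategies, with plain dicts standing in for the cache (e.g. a Redis client) and the database:

```python
cache, db = {}, {}  # stand-ins for a Redis client and a real database

def get_user(user_id):
    """Lazy loading / cache-aside: populate the cache only on a miss."""
    if user_id in cache:            # cache hit
        return cache[user_id]
    user = db.get(user_id)          # cache miss: read from the database
    cache[user_id] = user           # populate the cache for next time
    return user

def save_user(user_id, user):
    """Write-through: update the cache whenever the database is updated."""
    db[user_id] = user
    cache[user_id] = user
```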
SQS
- Dead-letter queue:
- messages that are not processed will go to this queue
- Fanout pattern
- Combination of SNS and SQS
- Queue Type
- Standard
- Can have duplicated message but guarantee at least 1 delivery
- can have out of order message
    - FIFO
        - Name needs to end with `.fifo`
        - Guaranteed in-order, exactly-once message delivery
            - Using deduplication
                - Default window is 5 minutes: if you send the same message twice within 5 minutes, the second one will be refused
                - Detects the same message by content-based deduplication or by providing a message deduplication ID
- Access policy
- Cross-account access
- Allow other services to send messages to SQS
- APIs
    - `CreateQueue` (`MessageRetentionPeriod`)
    - `DeleteQueue`: delete the whole queue
    - `PurgeQueue`: delete all messages only
    - `SendMessage` (batch support, `DelaySeconds`)
    - `DeleteMessage` (batch support)
    - `ReceiveMessage` (see the sketch after this list)
        - `MaxNumberOfMessages`: from 1 to 10
        - `ReceiveMessageWaitTimeSeconds`: Long Polling
            - Wait for messages to arrive if none are in the queue
            - Can wait 1-20 seconds (preferably 20 seconds)
    - `ChangeMessageVisibility` (batch support): change the message visibility timeout
    - Note: the Batch APIs can help reduce cost
- Consumer:
- Receive up to 10 messages at a time
- Producer
- Unlimited throughput
- message persists until the consumer deletes
- Delay Queue
    - Delay messages in the queue by up to 15 minutes
    - Default is 0 seconds
    - Can override per message using `DelaySeconds`
- Extended Client
    - If the message is too big (> 256 KB), use this; it stores the payload in S3 and sends a reference message
    - Only for Java
- FIFO Message grouping
    - Only for FIFO
    - Use `Group ID` to group messages
    - Each `Group ID` can only have 1 consumer
    - Ordering across groups is not guaranteed
- Visibility Timeout
    - A message is invisible to other consumers once polled by a consumer
    - Default is 30 seconds
    - If you haven't processed (deleted) the message within the visibility timeout, it goes back to the queue for other consumers to process
SNS
- pub/sub can have many receivers
- Up to 12,500,000 subscriptions per topics
- up to 100,000 topics
- Publish types
    - Topic Publish (SDK)
        - Publish straight to the topic's subscriptions
    - Direct Publish
        - Publish to a platform endpoint (works with 3rd-party mobile push services)
- Access Policies
- For cross-account access
- Control which services write to your SNS
- SNS FIFO
    - The only possible subscribers are SQS FIFO queues
    - Name has to end with `.fifo`
    - Ordering by `Message Group ID`
    - Deduplication using a Deduplication ID or content-based deduplication
- Message filtering
    - Allows subscribers to choose what to receive
    - If a subscriber doesn't specify a filter policy, it receives all messages
Kinesis
- Kinesis Data Stream
    - Real-time streaming of data into your applications
    - Provisioned mode:
        - Select the number of shards; pay per shard per hour
        - Each shard gets 1 MB/s (or 1000 records per second) in and 2 MB/s out
    - On-demand capacity mode:
        - Pay per stream per hour, plus data in/out per GB
        - 4 MB/s in or 4000 records per second
        - Scales automatically
    - Retention:
        - 1-365 days (1 day by default)
        - Ability to replay data
    - To increase consumer throughput, use enhanced fan-out
    - Doesn't work with SNS
- Consumers: subscribe to data
    - AWS Lambda
        - Supports both classic and enhanced fan-out consumers
    - Kinesis Data Analytics
    - Kinesis Data Firehose
    - Kinesis Client Library (KCL)
        - Java library that acts as a Kinesis consumer
        - Each shard is read by at most 1 KCL instance
            - 4 shards = max 4 KCL instances
            - 6 shards = max 6 KCL instances
        - If you need enhanced fan-out consumers, use KCL `v2`
    - Kinesis Custom Consumers
        - Classic fan-out consumer
            - 2 MB/sec per shard shared across all consumers
            - Uses the `GetRecords()` API
        - Enhanced fan-out consumer
            - Uses the `SubscribeToShard()` API
            - 2 MB/sec per shard for each consumer
            - Lower latency but higher cost
- Producers: produce data
    - A record consists of
        - Sequence number: unique per partition key within a shard
        - Partition key: must be specified
            - Hashed to route the record to a shard
            - Can be used to achieve ordering
        - The data blob
    - Producers
        - AWS SDK: simple
        - Kinesis Producer Library (KPL): advanced, with compression and retries
    - 1 MB/sec or 1000 records/sec per shard
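A minimal producer sketch with the AWS SDK (the stream name is hypothetical):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# The partition key is hashed to pick the shard; records sharing a key
# land on the same shard, which is what gives per-key ordering
kinesis.put_record(
    StreamName="orders-stream",  # hypothetical stream
    PartitionKey="user-123",
    Data=json.dumps({"event": "order_created"}).encode(),
)
```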
- Kinesis Data Firehose
- Similar to DataStream but fully serverless
- Destinations
- S3
- RedShift
- ElasticSearch
    - Near real time
        - 60 seconds latency minimum for non-full batches
        - Or minimum 1 MB of data at a time
- Works with SNS
- Kinesis Data Analytics
- Real time query both Data Stream and Data FireHose using SQL
- Fully managed Serverless
    - Apache Flink:
        - Kinesis Data Analytics for Apache Flink can read from Kinesis Data Streams or Amazon MSK
- Throttle
- Happens when we have more throughput than provisioned
- Retries with exponential backoff
- Reshard / increase shard
- Common operations
    - Shard splitting
        - Divides a "hot shard"; increases stream capacity
        - The old shard is closed and deleted once its data expires
    - Shard merging
        - Decreases stream capacity to save cost
        - The old shards are closed and deleted once their data expires
        - Can't merge more than 2 shards in a single operation
CloudTrail
- Get history of events, api calls within AWS
- Governance, compliance, auditing
- Three types of events
    - Management events (default)
        - Operations related to managing your AWS resources
    - Data events
        - S3 object-level activity
        - AWS Lambda function executions
    - CloudTrail Insights events
- CloudTrail Insights
    - Automatically detects unusual activity using machine learning
    - Analyses normal events to create a baseline
CloudWatch
- CloudWatch Metrics
    - Provides metrics for your AWS services
        - Default: every 5 minutes
        - With detailed monitoring, you can get them every 1 minute
        - If you want finer than 1 minute, you need to define Custom Metrics
    - Note: EC2 memory usage is not pushed by default; if you want it, you have to create a Custom Metric
- Custom metrics:
- Define your own metrics
        - Resolution (you can define how often the metrics get pushed)
            - Standard: 1 minute
            - High resolution: 1/5/10/30 seconds
                - CloudWatch Alarms on high-resolution metrics can only use 10, 30, or 60 second periods
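A minimal custom-metric sketch with boto3 (the namespace and metric name are hypothetical):

```python
import boto3

cw = boto3.client("cloudwatch")

cw.put_metric_data(
    Namespace="MyApp",  # hypothetical namespace
    MetricData=[{
        "MetricName": "MemoryUsedPercent",  # e.g. the EC2 RAM metric CloudWatch lacks by default
        "Value": 72.5,
        "Unit": "Percent",
        "StorageResolution": 1,  # 1 = high-resolution (sub-minute) metric
    }],
)
```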
- CloudWatch Alarms
    - Trigger notifications based on metrics
    - Can specify an action for each alarm state
    - Test an alarm in the CLI using `set-alarm-state`
- CloudWatch Events
    - Intercepts events from AWS services
    - Creates a JSON payload and sends it to the target
    - Can schedule cron jobs
- CloudWatch Logs
    - Log groups:
        - A name to represent the application
    - Log streams:
        - The log files
    - Logs can have an expiration policy
        - But they never expire by default
    - Can have encryption at the log group level
        - Using AWS KMS
        - If you wanna use a CMK, you need to specify it in the CLI:
            - `associate-kms-key` if the log group exists
            - `create-log-group` if it doesn't exist yet
    - Can export the logs to S3, but it's not real time; if we want real time, we can use CloudWatch Logs Subscriptions
    - CloudWatch Logs Subscription
        - A filter applied on top of CloudWatch Logs
        - Good for multi-account logs
    - CloudWatch Logs Agent
        - By default, no logs come from EC2; we need the CloudWatch Logs agent on EC2 to get logs
        - Can only monitor CPU, disk, and high-level network
        - If you need RAM and more detailed measurements, you need the CloudWatch Unified Agent
    - CloudWatch Unified Agent
        - Updated version of the CloudWatch Logs Agent
        - Collects RAM, processes, netstat
        - Integrates with SSM Parameter Store
    - CloudWatch Logs Insights
        - To query CloudWatch Logs
EventBridge
- Newer CloudWatch Events
- Default Event Bus:
- generated by AWS (similar to CloudWatch Events)
- Partner Event Bus:
- Receive event from third party
- Custom Event Bus:
- Your own thingy
- Ability to replay archived events
- Resource-based policy management access
X-Ray
- Debug, visualise analysis of the applications
- Troubleshoot bottle necks
- X-Ray Sampling:
- Control the amount of data to send to X-Ray
    - Default rule:
        - Reservoir: 1
            - At least 1 request per second is sent to X-Ray, as long as the service is serving requests
        - Rate: 5%
            - 5% of additional requests beyond the reservoir are sampled as well
- Integration
    - ECS, a few options:
        - Run an X-Ray daemon container on each EC2 instance
        - X-Ray as a sidecar:
            - X-Ray daemon container alongside each application container (1 EC2 instance can have multiple containers)
            - If running Fargate, we have to use the sidecar because we don't have access to the underlying EC2
    - Elastic Beanstalk
        - Enable in `.ebextensions/xray-daemon.config`
        - Not provided for Multicontainer Docker
- APIs
    - `GetSamplingRules`: retrieve all sampling rules
    - For writing exclusively:
        - `PutTraceSegments`: upload segments to X-Ray
        - `PutTelemetryRecords`: used by the daemon
    - For reading exclusively:
        - `GetTraceGraph`
        - `GetServiceGraph`
        - `GetTraceSummaries`
ACM
- Let you provision and manage SSL (Secure Sockets Layer) certs
- Free of charge for public TLS cert
CloudMap
- Create a map of backend services for visualisation
- Register application locations and health status check
Datasync
- Synchronise data in file storage system (not database)
- From on-premises to cloud (need DataSync agent)
- From one AWS to AWS (no need agent)
AWS FIS (Fault Injection Simulator)
- Run fault injection experiments (chaos engineering)
AWS SES (Simple Email Service)
- Send email
Exponential Backoff
- Keep retrying; each retry waits exponentially longer (double the previous wait)
- Should only retry on 5xx (server) errors and 429 (throttling)
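A minimal sketch of the mechanics (the error type is a hypothetical stand-in for a 429/5xx response):

```python
import random
import time

class ThrottlingError(Exception):
    """Hypothetical stand-in for a 429 or 5xx error from an AWS call."""

def call_with_backoff(fn, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, 8s... plus jitter so retries don't stampede together
            time.sleep(2 ** attempt + random.random())
```

Note that the AWS SDKs already implement retries with exponential backoff for you; the sketch just shows the idea.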
S3
- Athena
- Serverless Query on S3 Object
    - Supports compressed or columnar formats for cost savings
- Bucket Key:
- Optimisation to reduce number of calls to AWS KMS for encryption
- 4 encryption methods:
    - SSE-S3: encryption using keys handled by AWS
    - SSE-KMS: use a KMS-managed key to encrypt objects
        - Can be optimised using the S3 Bucket Key
    - SSE-C: customer-provided key
        - The only one that mandates you to use HTTPS
        - You use your own key, but AWS does the encryption
    - Client-side encryption
        - You encrypt it yourself
- You can force SSL with an explicit `DENY` if:
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
- Force KMS encryption with a `DENY` if:
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
- Strong consistency: as of 2020, S3 reads after writes are strongly consistent
NACL (Network ACL - Access Control List)
- Network firewall that controls traffic to and from subnets
- Can have `ALLOW` and `DENY` rules
- Works at the subnet level
- Stateless: have to specify both inbound and outbound rules
Security group
- Work at instance level
- Can only have `ALLOW` rules
- Stateful: return traffic is automatically allowed
VPC Flow Logs
- Capture information about
- VPC
- Subnet
- Elastic Network Interface
VPC Peering
- Connect 2 VPC together
- Not transitive: have to connect A to B, B to C, and A to C separately
VPC Endpoints
- Allow AWS services to talk to eachother via private network
- Endpoint Gateway: S3 and DynamoDB
- Endpoint Interface: others
Site-to-Site VPN
- On-premise VPN to AWS
- Does not allow you to connect to VPC endpoint
Direct Connect (DX)
- Physical connection
- Does not allow you to connect to VPC endpoint
Subnet
- is within a VPC
- Public: accessible from internet
- Private: not accessible from internet
Internet Gateway
- Provide Internet to a VPC
NAT Gateway or NAT Instance
- NAT Gateway is serverless
- NAT Instance is created from an AMI on EC2
- Provide connection to internet gateway for instances in private subnet
Data Limitations
- Read/write throughput (DynamoDB)
    - RCU: 1 read = 4 KB
        - Eventually consistent reads take 1/2 RCU
    - WCU: 1 write = 1 KB
        - Transactional writes take 2 WCUs
- Kinesis Data Stream
    - Provisioned:
        - Write = 1 MB/s or 1000 records/s per shard
            - Shared across all producers
        - Read = 2 MB/s per shard
            - Classic consumer: 2 MB/s shared across all consumers
            - Enhanced consumer: 2 MB/s for each consumer
    - On-demand capacity mode:
        - Write = 4 MB/s or 4000 records/s
        - Read = 8 MB/s
- RCU:
- Key rotation:
- KMS:
- AWS Managed: rotation every 1 year
- CMK:
- created: must enable auto rotation every 1 year
- imported: manually
- SSM parameter store: No rotations
    - Secrets Manager: can force rotation every X days
- KMS:
- GP2: 3 IOPS per GB provisioned
    - For 16,000 IOPS you need 16,000 / 3 ≈ 5,334 GB
- IO2: 50 IOPS per GB provisioned
    - For 100,000 IOPS you need 100,000 / 50 = 2,000 GB
SAA Stuff
- Placement Groups
    - Ways to control EC2 placement
- Strategies
- Cluster
            - All EC2 instances go into a single rack within an AZ. If the rack fails, you die
- Good for application that needs extremely low latency
- Spread
- Spread out, 1 rack has 1 instance.
- Maximum 7 instances per AZ
- Good for critical application
        - Partition
            - Spreads out across partitions (racks); 1 rack can have 1+ instances
            - Maximum 7 partitions per AZ
            - Good for Hadoop, Kafka, ...
- Cluster
- Hibernation
    - Stop: data on disk (EBS) is kept
    - Terminate: data on disk (EBS) is lost
    - Hibernate
        - Machine state (RAM) is written to the root EBS volume
        - Root EBS must be encrypted
        - Cannot hibernate for more than 60 days
        - Instance RAM must be smaller than 150 GB
        - Root volume must be EBS
- EBS
- encryption is regional
- the volume itself is locked to an az
- HDD cannot be boot volume
- Geoproximity Routing: location-based routing but with a bias (1 to 99 to expand traffic, -1 to -99 to shrink it)
- Minimum days before transitioning from S3 Standard to S3 Standard-IA or S3 One Zone-IA: 30 days
- Minimum storage charge duration for S3 Standard-IA and S3 One Zone-IA: 30 days
- ETL -> Glue
- Amazon Rekognition: find objects, people, and text in images and videos; facial analysis
- Amazon Transcribe: speech to text
- Amazon Polly: text to speech
- Amazon Translate: translate, duh
- Amazon Lex: same technology as Alexa; converts speech to text and understands natural language. To build chatbots and stuff ‒ like Dialogflow
- Amazon Connect: receive calls, create contact flows; cloud-based virtual contact center. 80% cheaper than traditional contact center solutions
- Amazon Comprehend: Natural Language Processing. Analyse customer interactions (emails)
- Amazon SageMaker: managed Jupyter notebooks to build and train ML models
- Amazon Kendra: document search service (text, PDF, HTML, PowerPoint, Word, ...) to extract answers from within documents
- Amazon Personalize: real-time personalised recommendations
- Amazon Textract: automatically extract text and stuff from documents
- Site-site VPN
- Customer side: Customer Gateway
- VPC side: Virtual Private Gateway
- AWS Transit Gateway connects your Amazon Virtual Private Clouds (VPCs) and on-premises networks through a central hub.