Short writeup about native and common AWS monitoring solutions: CloudWatch, X-Ray, and CloudTrail
- AWS CloudWatch
- AWS X-Ray
- AWS CloudTrail
AWS CloudWatch
AWS CloudWatch Metrics
- CloudWatch provides metrics for almost all the services in AWS
- “Metric” is a variable to monitor (CPUUtilization, NetworkIn, …)
- Metrics belong to “namespaces”
- “Dimension” is an attribute of a metric (instance id, environment, etc…)
- Up to 30 dimensions per metric
- Metrics have “timestamps”
- Can create CloudWatch dashboards of metrics
AWS CloudWatch EC2 Detailed monitoring
- By default, EC2 instance metrics are published “every 5 mins”
- With detailed monitoring (for a cost), you get data “every 1 min”
- Use detailed monitoring if you want your ASG to scale more promptly!
- Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)
AWS CloudWatch Custom Metrics
- Possibility to define and send your own custom metrics to CloudWatch
- Ability to use dimensions (attributes) to segment metrics
- Instance.id
- Environment.name
- Metric resolution
- Standard: 1 minute
- High Resolution: down to 1 second (StorageResolution API parameter), which leads to higher cost
- Use API call “PutMetricData”
- Use exponential backoff in case of throttling errors when calling the API
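The custom metric points above can be sketched in plain Python. This is a minimal sketch, not a definitive implementation: `build_metric_datum`, `call_with_backoff`, and `ThrottledError` are hypothetical names introduced here, and the dictionary only mirrors the shape of one entry in the `MetricData` list of a `PutMetricData` request.

```python
import random
import time
from datetime import datetime, timezone


def build_metric_datum(name, value, dimensions, unit="None", high_resolution=False):
    """Build one entry for the MetricData list of a PutMetricData call."""
    return {
        "MetricName": name,
        # Dimensions segment the metric, e.g. by instance id or environment
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": unit,
        # StorageResolution: 1 = high resolution (1 second), 60 = standard (1 minute)
        "StorageResolution": 1 if high_resolution else 60,
    }


class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error a real client would raise."""


def call_with_backoff(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(); on throttling, wait an exponentially growing delay and retry."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay))
```

With boto3 installed and credentials configured, `send` could plausibly wrap `boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])`, where the namespace `MyApp` is an assumption made for illustration.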
AWS CloudWatch Alarms
- Alarms are used to trigger notifications for any metric
- Alarms can go to Auto Scaling, EC2 Actions, SNS notifications
- Various options (sampling, %, max, min, etc…)
- Alarm states: OK, INSUFFICIENT_DATA, ALARM
- Period:
- Length of time in seconds to evaluate the metric
- High resolution custom metrics: periods of 10 secs or 30 secs are also available (otherwise the period is a multiple of 60 secs)
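The three alarm states can be illustrated with a deliberately simplified model. The function `evaluate_alarm` below is hypothetical. It assumes a "greater than threshold" comparison and that every evaluated period must breach, whereas the real service offers more options (other comparison operators, "M out of N" datapoints, missing-data handling).

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return 'OK', 'ALARM' or 'INSUFFICIENT_DATA' for the most recent periods.

    datapoints: one aggregated value per period, oldest first.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough periods yet to decide either way
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"
```

For example, with a threshold of 80 and two evaluation periods, the series `[50, 85, 90]` ends in two breaching periods and so evaluates to `ALARM`.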
AWS CloudWatch Logs
- Applications can send logs to CloudWatch using the SDK
- CloudWatch can collect logs from:
- Elastic Beanstalk: collection of logs from application
- ECS: collection from containers
- AWS Lambda: collection from function logs
- VPC Flow Logs: VPC specific logs
- API Gateway
- CloudTrail based on filter
- CloudWatch log agents: for example on EC2 machines
- Route53: Log DNS queries
- CloudWatch logs can go to:
- Batch exporter to S3 for archival
- Stream to ElasticSearch cluster for further analytics
CloudWatch Logs for EC2
- By default, no logs from your EC2 machine will go to CloudWatch
- You need to run a CloudWatch agent on EC2 to push the log files you want
CloudWatch Logs Agent & Unified Agent
- Both are for virtual servers (EC2 instances, on-premises servers)
- CloudWatch Logs Agent
- Old version of the agent
- Can only send to CloudWatch Logs
- CloudWatch Unified Agent
- Collect additional system-level metrics such as RAM, processes, etc
- Collect logs to send to CloudWatch Logs
- Centralized configuration using SSM Parameter Store
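As a rough illustration of what a Unified Agent configuration might contain, the sketch below builds a small config document in Python and serializes it to JSON. The file path and log group name are hypothetical placeholders. The section names (`metrics_collected`, `logs_collected`, `collect_list`) follow the agent's configuration file layout as I understand it, so check the official schema before relying on this.

```python
import json

# Minimal agent configuration: memory metric plus one log file to ship
agent_config = {
    "metrics": {
        "metrics_collected": {
            # RAM usage is not pushed by default, so collect it here
            "mem": {"measurement": ["mem_used_percent"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",  # hypothetical path
                        "log_group_name": "myapp-logs",         # hypothetical name
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

# This JSON string is what would be stored, e.g. in SSM Parameter Store
config_json = json.dumps(agent_config, indent=2)
```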
CloudWatch Logs Metric Filter
- CloudWatch Logs can use filter expressions
- For example, find a specific IP inside of a log
- Or count occurrences of “ERROR” in your logs
- Metric filters can then be used to trigger alarms
- Filters do not retroactively filter data. Filters only publish the metric data points for events that happen after the filter was created.
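The non-retroactive behavior can be modeled with a few lines of Python. This is only a sketch of the idea, not how the service is implemented: `count_matches` is a hypothetical helper, and log events are represented as simple `(timestamp, message)` tuples.

```python
from datetime import datetime, timezone


def count_matches(log_events, term, filter_created_at):
    """Count events containing `term` that arrived at or after filter creation."""
    return sum(
        1
        for timestamp, message in log_events
        if timestamp >= filter_created_at and term in message
    )


# Hypothetical sample data: one ERROR before the filter existed, one after
events = [
    (datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc), "ERROR disk full"),
    (datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "INFO started"),
    (datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc), "ERROR timeout"),
]
created = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
error_count = count_matches(events, "ERROR", created)
```

Only the 13:00 event contributes to the metric here, because the 10:00 error happened before the filter was created.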
AWS CloudWatch Events
- Schedule: Cron jobs
- Event Pattern: Event rules to react to a service doing something
- Example: CodePipeline state changes!
- Triggers to Lambda functions, SQS/SNS/Kinesis Messages
- CloudWatch Event creates a small JSON document to give information about the change
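To make the event pattern idea concrete, here is a simplified matcher in Python. The `matches` function is hypothetical and supports only exact-value lists and nested objects, whereas real event patterns offer more operators (prefix, numeric ranges, and so on). The pipeline name in the sample event is made up.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf pattern: a list of accepted values
            return False
    return True


# Pattern reacting to failed CodePipeline executions
pattern = {
    "source": ["aws.codepipeline"],
    "detail-type": ["CodePipeline Pipeline Execution State Change"],
    "detail": {"state": ["FAILED"]},
}

# The small JSON document describing the change (values are illustrative)
event = {
    "version": "0",
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}
```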
Amazon EventBridge
- EventBridge is the next evolution of CloudWatch Events
- Default event bus: generated by AWS services (CloudWatch Events)
- Partner event bus: receive events from SaaS service or applications (Zendesk, DataDog, Segment, Auth0, …)
- Custom event buses: for your own applications
- Event buses can be accessed by other AWS accounts
- Rules: how to process the events (similar to CloudWatch Events)
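For custom event buses, an application publishes its own events. The sketch below builds one entry in the shape that a `PutEvents` request expects. The helper `build_event_entry`, the bus name, and the source are all hypothetical names chosen for illustration.

```python
import json


def build_event_entry(bus_name, source, detail_type, detail):
    """Build one entry for the Entries list of a PutEvents call."""
    return {
        "EventBusName": bus_name,
        "Source": source,
        "DetailType": detail_type,
        # Detail is passed as a JSON string, not as a nested object
        "Detail": json.dumps(detail),
    }


entry = build_event_entry(
    bus_name="orders-bus",        # hypothetical custom bus
    source="com.example.orders",  # hypothetical application source
    detail_type="OrderPlaced",
    detail={"orderId": "1234", "total": 99.5},
)
```

With boto3 available, such an entry could presumably be sent with `boto3.client("events").put_events(Entries=[entry])`.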
Amazon EventBridge Schema Registry
- EventBridge can analyze events in your bus and infer the schema
- The Schema Registry allows you to generate code for your application that will know in advance how data is structured in the event bus
- Schema can be versioned
Amazon EventBridge vs CloudWatch Events
- Amazon EventBridge builds upon and extends CloudWatch Events
- It uses the same service API and endpoint, and the same underlying service infrastructure
- EventBridge allows extension to add event buses for your custom applications and your third-party SaaS apps
- EventBridge has the Schema Registry capability
- EventBridge has a different name to mark the new capabilities
- Over time, the CloudWatch Events name will be replaced with EventBridge
AWS X-Ray
- Debugging in Production, the good old way:
- Test locally
- Add log statements everywhere
- Re-deploy in production
- Log formats differ across applications using CloudWatch and analytics is hard
- Debugging: monolith “easy”, distributed services “hard”
- No common views of your entire architecture!
AWS X-Ray advantages
- Troubleshooting performance (bottlenecks)
- Understand dependencies in a microservice architecture
- Pinpoint service issues
- Review request behavior
- Find errors and exceptions
- Are we meeting time SLA?
- Where am I throttled?
- Identify users that are impacted
AWS X-Ray leverages “Tracing”
- Tracing is an end-to-end solution to follow a “request” across multiple hops > SPANS
- Each component dealing with the request adds its own “trace”
- A trace is made of segments (+ sub-segments)
- Annotations can be added to traces to provide extra information
- Ability to trace:
- Every request
- A sample of requests (as a % for example, or a rate per minute)
- X-Ray Security
- IAM for authorization
- KMS for encryption at rest
How to enable AWS X-Ray?
- Your code must import the AWS X-Ray SDK
- Very little modification needed
- The application SDK will then capture:
- Calls to AWS services
- HTTP / HTTPS requests
- Database calls (MySQL, PostgreSQL, DynamoDB)
- Queue calls (SQS)
- Install the X-Ray daemon or enable X-Ray AWS Integration
- X-Ray daemon works as a low-level UDP packet interceptor (Linux, Windows, Mac)
- AWS Lambda / other AWS services already run the X-Ray daemon for you
- Each application must have the IAM rights to write data to X-Ray
AWS X-Ray Troubleshooting
- If X-Ray is not working on EC2
- Ensure the EC2 IAM Role has the proper permissions
- Ensure the EC2 instance is running the X-Ray Daemon
- To enable on AWS Lambda:
- Ensure it has an IAM execution role with proper policy (AWSXrayWriteOnlyAccess)
- Ensure that X-Ray is imported in the code
X-Ray Instrumentation in your code
- Instrumentation means measuring a product’s performance, diagnosing errors, and writing trace information
X-Ray Concepts
- Segments: Each application / service will send them
- Sub-segments: If you need more details in your segment
- Trace: segments collected together to form an end-to-end trace
- Sampling: decrease the amount of requests sent to X-Ray, reduce cost
- Annotations: Key-value pairs used to index traces and use with filters
- Metadata: Key-value pairs, not indexed, not used for searching
- The X-Ray daemon / agent has a config to send traces cross account
- Make sure the IAM permissions are correct: the agent will assume the role
- This allows you to have a central account for all your application tracing
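The concepts above (segments, sub-segments, annotations, metadata) map onto a JSON "segment document". The sketch below hand-builds one to show the shape. In practice the X-Ray SDK does this for you, and the helper names here (`new_trace_id`, `build_segment`, `daemon_payload`) are hypothetical.

```python
import json
import os
import time


def new_trace_id(now=None):
    """Trace ID format: 1-<8 hex digits of epoch seconds>-<24 random hex digits>."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_segment(name, start, end, annotations=None, metadata=None, subsegments=None):
    """Build a segment document for one service handling one request."""
    segment = {
        "name": name,
        "id": os.urandom(8).hex(),  # 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
    }
    if annotations:
        segment["annotations"] = annotations  # indexed, usable in filters
    if metadata:
        segment["metadata"] = metadata        # not indexed, not searchable
    if subsegments:
        segment["subsegments"] = subsegments  # finer-grained timing details
    return segment


def daemon_payload(segment):
    """Serialize a segment the way the daemon expects it: header, newline, document."""
    header = json.dumps({"format": "json", "version": 1})
    return (header + "\n" + json.dumps(segment)).encode("utf-8")
```

The resulting bytes would be sent to the daemon over UDP (by default on port 2000 of the local host, as far as I know), for example with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(...)`.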
X-Ray Sampling Rules
- With sampling rules, you control the amount of data that you record
- You can modify sampling rules without changing your code
- By default, the X-Ray SDK records the first request “each second”, and “five percent” of any additional requests
- One request per second is the “reservoir”, which ensures that at least one trace is recorded each second as long as the service is serving requests
- Five percent is the “rate”, at which additional requests beyond the reservoir size are sampled
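The reservoir-plus-rate behavior described above can be simulated in a few lines. This is a sketch of the idea, not the SDK's actual implementation: the class `DefaultSampler` and its injectable `rng` parameter are hypothetical, chosen so the behavior can be checked deterministically.

```python
import random


class DefaultSampler:
    """Sample `reservoir` requests per second, then a fixed `rate` of the rest."""

    def __init__(self, reservoir=1, rate=0.05, rng=random.random):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng
        self._second = None  # the wall-clock second currently being counted
        self._taken = 0      # reservoir slots used in that second

    def should_sample(self, now):
        second = int(now)
        if second != self._second:
            # A new second started: refill the reservoir
            self._second = second
            self._taken = 0
        if self._taken < self.reservoir:
            self._taken += 1
            return True
        # Beyond the reservoir, sample at the configured rate
        return self.rng() < self.rate
```

With the defaults, a service receiving 100 requests in one second would record the first one for certain and, on average, about five of the remaining 99.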
AWS CloudTrail
- Provides governance, compliance and audit for your AWS account
- CloudTrail is enabled by default
- Get a history of events / API calls made within your AWS account by:
- Console
- SDK
- CLI
- AWS Services
- Can put logs from CloudTrail into CloudWatch Logs
- If a resource is deleted in AWS, look into CloudTrail first.
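As an illustration of the "look into CloudTrail first" advice, the sketch below scans a list of CloudTrail-style event records for delete calls that mention a given resource. The helper `find_delete_events` and the sample records are hypothetical. The matching is deliberately crude: it checks only event names starting with "Delete", which misses calls such as instance terminations.

```python
def find_delete_events(records, resource_name):
    """Return (time, event name, caller ARN) for delete calls mentioning the resource."""
    hits = []
    for record in records:
        if not record["eventName"].startswith("Delete"):
            continue
        # Crude check: look for the resource name anywhere in the request parameters
        if resource_name in str(record.get("requestParameters", {})):
            caller = record.get("userIdentity", {}).get("arn", "unknown")
            hits.append((record["eventTime"], record["eventName"], caller))
    return hits


# Hypothetical sample records using common CloudTrail record fields
records = [
    {
        "eventTime": "2024-01-01T10:00:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "CreateBucket",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
        "requestParameters": {"bucketName": "reports"},
    },
    {
        "eventTime": "2024-01-02T09:30:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "DeleteBucket",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"},
        "requestParameters": {"bucketName": "reports"},
    },
]
deleted_by = find_delete_events(records, "reports")
```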