In this topic, I will explain how to monitor Docker containers running on an ECS cluster. Although AWS CloudWatch is the preferred tool for monitoring and collecting container metrics, some scenarios call for an alternative solution.
cAdvisor is an open-source project for understanding the resource usage of running containers.
Metrics collected by cAdvisor can be analyzed in its own web UI or exported to various storage drivers. Here I will explain how to use cAdvisor to collect metrics from ECS and ship them to Prometheus for further use.
Prometheus is a widely used open-source tool for monitoring and alerting systems. It can collect metrics from targets and trigger alerts based on conditions and rules evaluation.
But we have CloudWatch?
Yes, CloudWatch may be the easiest solution for collecting metrics from ECS. But I was already using Prometheus to store metrics from various other systems and to alert on them. So I needed a way to export the ECS metrics to Prometheus and leverage my well-tested and trusted monitoring and alerting ecosystem.
Running cAdvisor
cAdvisor can be run either as a Docker container or standalone. Here it is run as an ECS task, since there is already an ECS cluster for scheduling and running Docker containers.
Create an ECS cluster and a task definition to start with.
The ECS cluster creation guide is available here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create_cluster.html
Follow this AWS guide to create the task definition: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-task-definition.html
A sample task definition is provided below for reference.
{
"ipcMode": null,
"executionRoleArn": "arn:aws:iam::123456789012:role/TaskExecutionRole",
"containerDefinitions": [
{
"dnsSearchDomains": null,
"environmentFiles": null,
"logConfiguration": null,
"entryPoint": null,
"portMappings": [
{
"hostPort": 8080,
"protocol": "tcp",
"containerPort": 8080
}
],
"command": null,
"linuxParameters": null,
"cpu": 0,
"environment": [],
"resourceRequirements": null,
"ulimits": null,
"dnsServers": null,
"mountPoints": [
{
"readOnly": true,
"containerPath": "/rootfs",
"sourceVolume": "root"
},
{
"readOnly": false,
"containerPath": "/var/run",
"sourceVolume": "var_run"
},
{
"readOnly": true,
"containerPath": "/sys",
"sourceVolume": "sys"
},
{
"readOnly": true,
"containerPath": "/var/lib/docker",
"sourceVolume": "var_lib_docker"
}
],
"workingDirectory": null,
"secrets": null,
"dockerSecurityOptions": null,
"memory": 256,
"memoryReservation": null,
"volumesFrom": [],
"stopTimeout": null,
"image": "google/cadvisor",
"startTimeout": null,
"firelensConfiguration": null,
"dependsOn": null,
"disableNetworking": null,
"interactive": null,
"healthCheck": null,
"essential": true,
"links": null,
"hostname": null,
"extraHosts": null,
"pseudoTerminal": null,
"user": null,
"readonlyRootFilesystem": null,
"dockerLabels": {
"PROMETHEUS_EXPORTER_PORT": "8080",
"PROMETHEUS_EXPORTER_JOB_NAME": "prometheus-ecs-discovery"
},
"systemControls": null,
"privileged": null,
"name": "cadvisor"
}
],
"placementConstraints": [],
"memory": "256",
"taskRoleArn": "arn:aws:iam::123456789012:role/DefaultTaskRole",
"compatibilities": [
"EC2"
],
"taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/cAdvisor:1",
"family": "cAdvisor",
"requiresAttributes": [
{
"targetId": null,
"targetType": null,
"value": null,
"name": "com.amazonaws.ecs.capability.task-iam-role"
},
{
"targetId": null,
"targetType": null,
"value": null,
"name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
},
{
"targetId": null,
"targetType": null,
"value": null,
"name": "ecs.capability.task-eni"
}
],
"pidMode": null,
"requiresCompatibilities": [
"EC2"
],
"networkMode": "awsvpc",
"cpu": "512",
"revision": 4,
"status": "ACTIVE",
"inferenceAccelerators": null,
"proxyConfiguration": null,
"volumes": [
{
"fsxWindowsFileServerVolumeConfiguration": null,
"efsVolumeConfiguration": null,
"name": "root",
"host": {
"sourcePath": "/"
},
"dockerVolumeConfiguration": null
},
{
"fsxWindowsFileServerVolumeConfiguration": null,
"efsVolumeConfiguration": null,
"name": "var_run",
"host": {
"sourcePath": "/var/run"
},
"dockerVolumeConfiguration": null
},
{
"fsxWindowsFileServerVolumeConfiguration": null,
"efsVolumeConfiguration": null,
"name": "sys",
"host": {
"sourcePath": "/sys"
},
"dockerVolumeConfiguration": null
},
{
"fsxWindowsFileServerVolumeConfiguration": null,
"efsVolumeConfiguration": null,
"name": "var_lib_docker",
"host": {
"sourcePath": "/var/lib/docker/"
},
"dockerVolumeConfiguration": null
}
]
}
Create a new service using this task definition.
Follow the AWS guide on service creation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service.html
It is important to choose the DAEMON service type, as cAdvisor needs to run on every EC2 instance in the ECS cluster.
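For readers who prefer scripting this step over the console, the following is a minimal sketch of the same service creation using boto3. The cluster name, subnet ID and security group ID are hypothetical placeholders, and the helper names are my own; only the task definition family (cAdvisor) and the awsvpc network mode come from the sample above.

```python
from typing import Any, Dict, List


def build_daemon_service_request(
    cluster: str,
    task_definition: str,
    subnets: List[str],
    security_groups: List[str],
    service_name: str = "cadvisor",
) -> Dict[str, Any]:
    """Build the keyword arguments for the ECS CreateService call."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "launchType": "EC2",
        # DAEMON runs one task on each active container instance,
        # so no desiredCount is passed.
        "schedulingStrategy": "DAEMON",
        # The sample task definition uses awsvpc networking, which
        # needs subnets and security groups for the task ENI.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
            }
        },
    }


def create_daemon_service(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to ECS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("ecs").create_service(**request)


request = build_daemon_service_request(
    cluster="my-ecs-cluster",                  # hypothetical cluster name
    task_definition="cAdvisor",                # family from the sample above
    subnets=["subnet-0123456789abcdef0"],      # hypothetical subnet ID
    security_groups=["sg-0123456789abcdef0"],  # hypothetical security group
)
print(request["schedulingStrategy"])
```

Calling create_daemon_service(request) would submit the request; it is kept separate so the arguments can be inspected before anything is created in the account.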
Optionally, create an Application Load Balancer for the cAdvisor service, which listens on port 8080, and attach this ALB to the service. This step is only needed if you want to access cAdvisor directly.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html.
Once the cAdvisor task is started, the Web UI can be accessed using the ALB DNS name.
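Besides the web UI, cAdvisor serves its metrics in the Prometheus text format on the /metrics path, which is handy for a quick sanity check. The sketch below extracts the container metric names from such a response. The sample text is made up for illustration (only the metric names are real cAdvisor metrics), and the function name is my own.

```python
import re
from typing import List

# Made-up sample of what a /metrics response can look like.
SAMPLE_METRICS = """\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed in seconds.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="/",cpu="total"} 1234.5
container_memory_usage_bytes{id="/"} 104857600
go_goroutines 42
"""


def container_metric_names(exposition_text: str) -> List[str]:
    """Return the sorted, unique metric names that start with container_."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or whitespace.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if name.startswith("container_"):
            names.add(name)
    return sorted(names)


print(container_metric_names(SAMPLE_METRICS))
```

To run it against a live task, the text could be fetched with urllib.request.urlopen from the ALB DNS name followed by /metrics, assuming the ALB listener forwards to port 8080.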
Shipping metrics to Prometheus
To collect the metrics exposed by cAdvisor, the cAdvisor endpoints must be added to Prometheus as scrape targets.
As cAdvisor is running as a container, the IP addresses of the cAdvisor endpoints are assigned dynamically and change whenever a task is restarted. This requires Prometheus to discover and register the targets dynamically.
Prometheus Amazon ECS discovery (https://github.com/teralytics/prometheus-ecs-discovery) discovers these dynamic endpoints and writes the list of cAdvisor endpoints to a file. Prometheus can then use the file_sd_configs option to read its targets from that file. Sample Prometheus config is provided below:
- job_name: ecs
  honor_timestamps: true
  metrics_path: /metrics
  scheme: http
  file_sd_configs:
    - files:
        - /var/lib/prometheus/discovery/ecs_file_sd.yml
      refresh_interval: 1m
Prometheus Amazon ECS discovery relies on the PROMETHEUS_EXPORTER_PORT Docker label by default to find the container port where cAdvisor is listening. The label name can be customized by passing the -config.port-label option to Prometheus Amazon ECS discovery.
To read the ECS details, the tool needs AWS credentials, which can be supplied as environment variables. Alternatively, an AWS role ARN can be passed using the -config.role-arn option.
Full configuration options can be found at https://github.com/teralytics/prometheus-ecs-discovery/blob/master/README.md
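To make the hand-off between the discovery tool and Prometheus more concrete, here is a small sketch of the file-based service discovery format: a list of target groups, each with a targets list and optional labels. This is not the tool's own code or its exact output; the IP addresses, the task_name label and the function name are hypothetical. The sketch emits JSON, which Prometheus also accepts for file_sd files ending in .json, whereas the config above points at a .yml file.

```python
import json
from typing import Dict, List


def build_file_sd_document(endpoints: List[str], labels: Dict[str, str]) -> str:
    """Render one file_sd target group as a JSON document."""
    groups = [
        {
            # Each target is a host:port pair that Prometheus will scrape.
            "targets": sorted(endpoints),
            # Labels are attached to every series scraped from the group.
            "labels": labels,
        }
    ]
    return json.dumps(groups, indent=2)


document = build_file_sd_document(
    endpoints=["10.0.2.27:8080", "10.0.1.15:8080"],  # hypothetical task IPs
    labels={"task_name": "cadvisor"},                # hypothetical label
)
print(document)
```

Because Prometheus re-reads the file on change and at every refresh_interval, a restarted task only needs a fresh entry in this file to be scraped again.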
Once Prometheus registers these endpoints, they can be found on the target page on Prometheus. The metrics exported by cAdvisor are prefixed with “container_” by default.
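As a quick check that the metrics have arrived, they can be read back through the Prometheus HTTP API. The sketch below builds an instant query URL for the /api/v1/query endpoint and parses the response. The Prometheus address (localhost:9090) and the sample payload values are assumptions for illustration; container_memory_usage_bytes is one of the cAdvisor metrics.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def parse_instant_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each result holds its labels and a [timestamp, "value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def query_prometheus(base_url: str, promql: str) -> List[Tuple[Dict[str, str], float]]:
    """Run the query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))


# Made-up response in the shape the API returns for an instant vector.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "container_memory_usage_bytes", "id": "/"},
                "value": [1700000000.0, "104857600"],
            }
        ],
    },
}

for labels, value in parse_instant_vector(sample_payload):
    print(labels["id"], value)
```

Against a running server, query_prometheus("http://localhost:9090", "container_memory_usage_bytes") would return the same structure with live values, assuming Prometheus is reachable at that address.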
Jino John
DevOps Engineer