ECS Container monitoring using cAdvisor

28 Nov, 2020


In this post, I explain how to monitor Docker containers running on an ECS cluster. AWS CloudWatch is the usual choice for collecting container metrics, but some scenarios call for an alternative.

cAdvisor is an open-source project for understanding the resource usage of running containers.

Metrics collected by cAdvisor can be analyzed in its own web UI or exported to various storage drivers. Here I explain how to use cAdvisor to collect metrics from ECS and ship them to Prometheus for further use.

Prometheus is a widely used open-source monitoring and alerting tool. It collects metrics from targets and triggers alerts based on rule evaluation.

But we have CloudWatch?

Yes, CloudWatch may be the easiest way to collect ECS metrics. But I was already using Prometheus to store metrics from various other systems and to alert on them. So I needed a way to export the ECS metrics to Prometheus and reuse my well-tested, trusted monitoring and alerting ecosystem.

Running cAdvisor

cAdvisor can run either as a Docker container or standalone. Here it runs as an ECS task, since there is already an ECS cluster for scheduling and running Docker containers.

Create an ECS cluster and a task definition to start with.
The ECS cluster creation guide is available here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create_cluster.html

Follow this AWS guide to create the task definition: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-task-definition.html

A sample task definition is provided below for reference.

{
  "ipcMode": null,
  "executionRoleArn": "arn:aws:iam::123456789012:role/TaskExecutionRole",
  "containerDefinitions": [
    {
      "dnsSearchDomains": null,
      "environmentFiles": null,
      "logConfiguration": null,
      "entryPoint": null,
      "portMappings": [
        {
          "hostPort": 8080,
          "protocol": "tcp",
          "containerPort": 8080
        }
      ],
      "command": null,
      "linuxParameters": null,
      "cpu": 0,
      "environment": [],
      "resourceRequirements": null,
      "ulimits": null,
      "dnsServers": null,
      "mountPoints": [
        {
          "readOnly": true,
          "containerPath": "/rootfs",
          "sourceVolume": "root"
        },
        {
          "readOnly": false,
          "containerPath": "/var/run",
          "sourceVolume": "var_run"
        },
        {
          "readOnly": true,
          "containerPath": "/sys",
          "sourceVolume": "sys"
        },
        {
          "readOnly": true,
          "containerPath": "/var/lib/docker",
          "sourceVolume": "var_lib_docker"
        }
      ],
      "workingDirectory": null,
      "secrets": null,
      "dockerSecurityOptions": null,
      "memory": 256,
      "memoryReservation": null,
      "volumesFrom": [],
      "stopTimeout": null,
      "image": "google/cadvisor",
      "startTimeout": null,
      "firelensConfiguration": null,
      "dependsOn": null,
      "disableNetworking": null,
      "interactive": null,
      "healthCheck": null,
      "essential": true,
      "links": null,
      "hostname": null,
      "extraHosts": null,
      "pseudoTerminal": null,
      "user": null,
      "readonlyRootFilesystem": null,
      "dockerLabels": {
        "PROMETHEUS_EXPORTER_PORT": "8080",
        "PROMETHEUS_EXPORTER_JOB_NAME": "prometheus-ecs-discovery"
      },
      "systemControls": null,
      "privileged": null,
      "name": "cadvisor"
    }
  ],
  "placementConstraints": [],
  "memory": "256",
  "taskRoleArn": "arn:aws:iam::123456789012:role/DefaultTaskRole",
  "compatibilities": [
    "EC2"
  ],
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/cAdvisor:1",
  "family": "cAdvisor",
  "requiresAttributes": [
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.task-iam-role"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
    },
    {
      "targetId": null,
      "targetType": null,
      "value": null,
      "name": "ecs.capability.task-eni"
    }
  ],
  "pidMode": null,
  "requiresCompatibilities": [
    "EC2"
  ],
  "networkMode": "awsvpc",
  "cpu": "512",
  "revision": 4,
  "status": "ACTIVE",
  "inferenceAccelerators": null,
  "proxyConfiguration": null,
  "volumes": [
    {
      "fsxWindowsFileServerVolumeConfiguration": null,
      "efsVolumeConfiguration": null,
      "name": "root",
      "host": {
        "sourcePath": "/"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "fsxWindowsFileServerVolumeConfiguration": null,
      "efsVolumeConfiguration": null,
      "name": "var_run",
      "host": {
        "sourcePath": "/var/run"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "fsxWindowsFileServerVolumeConfiguration": null,
      "efsVolumeConfiguration": null,
      "name": "sys",
      "host": {
        "sourcePath": "/sys"
      },
      "dockerVolumeConfiguration": null
    },
    {
      "fsxWindowsFileServerVolumeConfiguration": null,
      "efsVolumeConfiguration": null,
      "name": "var_lib_docker",
      "host": {
        "sourcePath": "/var/lib/docker/"
      },
      "dockerVolumeConfiguration": null
    }
  ]
}
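
The sample above appears to be copied from a described task definition, so it carries output-only fields (taskDefinitionArn, revision, status, requiresAttributes, compatibilities) and many null values. If you want to register it through the API rather than the console, those usually have to be removed first. The following is a minimal Python sketch under that assumption; the helper names and the trimmed excerpt are illustrative only, and the boto3 call is mentioned in a comment rather than executed.

```python
import json

# Fields that appear when a task definition is described but that are
# assumed not to be accepted as registration input.
READ_ONLY_FIELDS = {
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
}


def drop_nulls(value):
    """Recursively remove dict keys and list items that are None (JSON null)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value if v is not None]
    return value


def to_register_input(described):
    """Turn a described task definition into a dict suitable for registration."""
    cleaned = drop_nulls(described)
    return {k: v for k, v in cleaned.items() if k not in READ_ONLY_FIELDS}


# Trimmed excerpt of the sample task definition, for illustration only.
excerpt = {
    "family": "cAdvisor",
    "networkMode": "awsvpc",
    "revision": 4,
    "status": "ACTIVE",
    "pidMode": None,
    "containerDefinitions": [
        {"name": "cadvisor", "image": "google/cadvisor", "command": None, "memory": 256}
    ],
}
params = to_register_input(excerpt)
print(json.dumps(params, indent=2))
# With boto3 installed and credentials configured, the full cleaned document
# could then be passed as: boto3.client("ecs").register_task_definition(**params)
```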

Create a new service using this task definition.

Follow the AWS guide on service creation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service.html

It is important to choose the DAEMON service type, as cAdvisor needs to run on every EC2 instance in the ECS cluster.
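
For readers who script this step instead of using the console, here is a rough sketch of the request parameters for a daemon service. It assumes boto3's ECS `create_service` call; the cluster name, subnet ID and security group ID are placeholders, and the service name `cadvisor` is my own choice. Because the sample task definition uses the awsvpc network mode, a network configuration is included, and no desired count is set because the daemon strategy places one task per instance.

```python
def daemon_service_params(cluster, subnets, security_groups):
    """Build create_service arguments for a cAdvisor daemon service."""
    return {
        "cluster": cluster,
        "serviceName": "cadvisor",      # hypothetical service name
        "taskDefinition": "cAdvisor",   # family from the sample task definition
        "launchType": "EC2",
        "schedulingStrategy": "DAEMON",  # one task on every container instance
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
            }
        },
    }


def create_daemon_service(params):
    """Send the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above has no dependencies

    return boto3.client("ecs").create_service(**params)


# Placeholder identifiers; replace them with real values before calling
# create_daemon_service(params).
params = daemon_service_params("my-ecs-cluster", ["subnet-0example"], ["sg-0example"])
print(params["schedulingStrategy"])
```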

Create an Application Load Balancer to reach the cAdvisor service, which listens on port 8080, and attach the ALB to the service. This step is optional and only needed if you want to access cAdvisor directly.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-load-balancing.html

Once the cAdvisor task is started, the Web UI can be accessed using the ALB DNS name.

Shipping metrics to Prometheus

To collect the metrics exposed by cAdvisor, the cAdvisor endpoints have to be added to Prometheus as scrape targets.

As cAdvisor runs as a container, the IP addresses of its endpoints are assigned dynamically and change when a task is restarted. Prometheus therefore has to discover and register the targets dynamically.

Prometheus Amazon ECS discovery (https://github.com/teralytics/prometheus-ecs-discovery) discovers these dynamic endpoints and writes the list of cAdvisor endpoints to a file. Prometheus can then use the file_sd_configs option to read targets from that file. A sample Prometheus config is provided below:

- job_name: ecs
  honor_timestamps: true
  metrics_path: /metrics
  scheme: http
  file_sd_configs:
  - files:
    - /var/lib/prometheus/discovery/ecs_file_sd.yml
    refresh_interval: 1m

Prometheus Amazon ECS discovery relies on the PROMETHEUS_EXPORTER_PORT Docker label by default to find the container port where cAdvisor is listening. The label name can be customized by passing the -config.port-label option to Prometheus Amazon ECS discovery.
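
To make the role of the label concrete, here is a simplified Python illustration of how a task IP and the container's Docker labels could be combined into a file_sd target group. This is not the discovery tool's actual code: the function name and the IP address are made up, and mapping PROMETHEUS_EXPORTER_JOB_NAME to the `job` label is an assumption based on the labels in the sample task definition. The entry is printed as JSON, a format that Prometheus file-based discovery also accepts.

```python
import json

PORT_LABEL = "PROMETHEUS_EXPORTER_PORT"
JOB_LABEL = "PROMETHEUS_EXPORTER_JOB_NAME"


def file_sd_entry(task_ip, docker_labels, port_label=PORT_LABEL):
    """Return a file_sd target group, or None if the port label is missing."""
    port = docker_labels.get(port_label)
    if port is None:
        # Assumed behaviour: a container without the port label is skipped.
        return None
    entry = {"targets": [f"{task_ip}:{port}"], "labels": {}}
    job = docker_labels.get(JOB_LABEL)
    if job:
        entry["labels"]["job"] = job
    return entry


# Labels taken from the sample task definition in this post.
labels = {
    "PROMETHEUS_EXPORTER_PORT": "8080",
    "PROMETHEUS_EXPORTER_JOB_NAME": "prometheus-ecs-discovery",
}
# 10.0.1.23 is a made-up task IP used only for illustration.
print(json.dumps([file_sd_entry("10.0.1.23", labels)], indent=2))
```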

To read the ECS details, AWS credentials can be supplied as environment variables. Alternatively, an AWS role ARN can be passed using the -config.role-arn option.
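
The two ways of supplying credentials can be sketched as a small Python helper that assembles the command line and environment for the discovery tool. Treat it as a sketch under assumptions: the binary name `prometheus-ecs-discovery` on the PATH, the `-config.write-to` flag and the `AWS_REGION` variable are taken from my reading of the project's README and should be checked against its help output, and the role ARN shown is hypothetical. Nothing is executed here; the command is only printed.

```python
import os
import shlex


def discovery_command(write_to, role_arn=None):
    """Build the argument list for the discovery tool."""
    # -config.write-to is an assumed flag name; verify it with -help.
    argv = ["prometheus-ecs-discovery", "-config.write-to", write_to]
    if role_arn:
        argv += ["-config.role-arn", role_arn]
    return argv


def discovery_env(region, access_key=None, secret_key=None):
    """Copy the current environment and add AWS settings to the copy."""
    env = dict(os.environ)
    env["AWS_REGION"] = region
    if access_key and secret_key:
        env["AWS_ACCESS_KEY_ID"] = access_key
        env["AWS_SECRET_ACCESS_KEY"] = secret_key
    return env


# The target file path matches the Prometheus config shown earlier;
# the role ARN is a hypothetical example.
argv = discovery_command(
    "/var/lib/prometheus/discovery/ecs_file_sd.yml",
    role_arn="arn:aws:iam::123456789012:role/EcsDiscoveryRole",
)
print(shlex.join(argv))
# The pair could then be handed to subprocess.run(argv, env=discovery_env("us-east-1")).
```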

Full configuration options can be found at https://github.com/teralytics/prometheus-ecs-discovery/blob/master/README.md

Once Prometheus registers these endpoints, they appear on the Targets page in Prometheus. The metrics exported by cAdvisor are prefixed with “container_” by default.
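
As a quick sanity check outside Prometheus, the exposition text served by cAdvisor can be inspected directly. The Python sketch below assumes a cAdvisor endpoint reachable over HTTP with the metrics path /metrics used in the config above; `fetch_metrics` is defined but not called here, and the sample text is a hand-written excerpt with made-up values, not real output.

```python
import urllib.request


def fetch_metrics(base_url, timeout=5):
    """Download the raw exposition text from a cAdvisor endpoint."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def container_metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names starting with 'container_'."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith("container_"):
            names.add(name)
    return sorted(names)


# Hand-written excerpt in the Prometheus text format; the values are made up.
sample = """\
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed in seconds.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="/",cpu="total"} 1234.5
container_memory_usage_bytes{id="/"} 1048576
cadvisor_version_info{cadvisorVersion="v0.0.0"} 1
"""
print(container_metric_names(sample))
```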

Jino John

DevOps Engineer

Jino has 19 years of IT experience working with both small and large companies. He started his career as a Linux engineer and has looked after IT systems in various Financial Technology companies. Jino has extensive experience in designing and implementing AWS solutions and is an AWS Certified DevOps Engineer – Professional.
