Elastic Container Service (ECS): My default choice for containers on AWS


Amazon Elastic Container Service is the default AWS service I reach for whenever I need to run containers. Whether it's a batch processing pipeline that fans out across hundreds of Fargate tasks or a FastAPI backend sitting behind an Application Load Balancer, ECS handles the orchestration without the operational complexity of Kubernetes. The control plane is free, the AWS integration is deep, and as of early 2026, the deployment capabilities rival anything in the container ecosystem.

I recently presented on ECS and decided to collect what I have learned from building real projects in one place. This blog post is the companion to that presentation - a deep dive into what ECS offers, how I use it, and how you can start building with it today.


Why Containers?

Before we talk about ECS specifically, let's talk about why containers matter. Four core principles make containers compelling:

  • Consistency - The same image runs identically on your laptop, in CI, and in production. No more "works on my machine."
  • Isolation - Each container gets its own filesystem, networking, and process space. Multiple services on the same host without conflicts.
  • Efficiency - Containers share the host OS kernel. Startup in seconds, not minutes. Far less overhead than virtual machines.
  • Portability - A Docker image runs on ECS, EKS, Lambda, or your own servers. Your business logic stays runtime-agnostic.

In my Aurora DSQL Kabob Store project, I made this a deliberate design decision - keeping business logic runtime-agnostic so the same FastAPI application could deploy on Fargate, EC2, EKS, or Lambda with minimal adapter code.


What is Amazon ECS?

ECS is a fully managed container orchestration service. You define what to run and how, and ECS handles placement, scaling, availability, and integration with the rest of AWS.

[Figure: ECS Overview]

Four things make ECS stand out:

  1. No control plane cost - Unlike EKS (~$75/month per cluster), the ECS orchestration layer is completely free. You only pay for the compute your containers use.
  2. Deep AWS integration - IAM roles per task, CloudWatch Container Insights, native ALB target groups, Secrets Manager injection, and tight integration with every major AWS service.
  3. Flexible compute - Choose between Fargate (serverless), EC2 (self-managed), or the new Managed Instances (AWS-managed EC2).
  4. Deployment sophistication - Rolling updates, native blue/green, canary, and linear deployments all built in.

Core Concepts

Five building blocks make up ECS:

Cluster - A logical grouping of tasks and services. Think of it as your namespace. A cluster can span Fargate, EC2, and Managed Instances simultaneously.

Task Definition - The blueprint. A JSON document that specifies container images, CPU, memory, networking mode, volumes, IAM roles, and logging configuration. Versioned - each registration creates a new revision (e.g., my-app:3).

Task - A running instance of a task definition. One or more containers working together. On Fargate, each task gets its own elastic network interface and private IP.

Service - Maintains a desired count of tasks. Handles replacement of failed tasks, load balancer registration, auto scaling, and deployments.

Container Instance - An EC2 instance running the ECS agent, registered to a cluster. Only relevant if you're using the EC2 launch type.


Compute Options

ECS gives you four ways to provide compute for your containers. Choosing the right one depends on your workload characteristics, cost sensitivity, and operational preferences.

[Figure: ECS Compute Options]

Fargate (Serverless) - The Default Choice

Fargate is what I recommend for most workloads. With it, there are no EC2 instances to manage. You specify CPU and memory at the task level, and AWS handles everything underneath.

| CPU (vCPU) | Memory Options | Notes |
|---|---|---|
| 0.25 | 512 MiB, 1-2 GB | Linux only |
| 0.5 | 1-4 GB | Linux only |
| 1 | 2-8 GB | Linux and Windows |
| 2 | 4-16 GB | Linux and Windows |
| 4 | 8-30 GB | Linux and Windows |
| 8 | 16-60 GB | Linux only |
| 16 | 32-120 GB | Linux only |
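
Invalid CPU/memory pairs are rejected when you register the task definition, so it can help to check them earlier. Here is a minimal Python sketch of that check. The values are in the units task definitions use (CPU units where 1024 = 1 vCPU, memory in MiB). The step sizes inside each range (1 GB, then 4 GB for 8 vCPU and 8 GB for 16 vCPU) are my reading of the AWS documentation rather than something stated in the table, and `is_valid_fargate_size` is a hypothetical helper name.

```python
# Valid Fargate task sizes: CPU units (1024 = 1 vCPU) -> allowed memory in MiB.
# Range step sizes are an assumption based on the AWS documentation.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """Return True if the CPU/memory pair is a supported Fargate task size."""
    return memory_mib in FARGATE_SIZES.get(cpu_units, [])


print(is_valid_fargate_size(256, 512))    # smallest size in the table
print(is_valid_fargate_size(256, 4096))   # too much memory for 0.25 vCPU
```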

Fargate supports both x86_64 and ARM64 (Graviton) architectures. Graviton gives you roughly 20% better price-performance for most workloads. Pricing is per-second based on vCPU and memory consumed.

Fargate Spot offers up to 70% savings for fault-tolerant workloads. When AWS reclaims capacity, tasks receive a SIGTERM with a 2-minute warning. I use this for batch processing jobs where interruption just means retrying one file.
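
To make use of that warning, the worker has to actually handle SIGTERM. The sketch below shows one way to do it in Python: a handler sets a flag, and the processing loop checks the flag between files so it never abandons a file halfway. The names `process_files` and `process_one` are hypothetical, and this is not the code from my batch project, just an illustration of the idea.

```python
import os
import signal
import threading

# Set when the platform asks the task to stop (for example, a Spot reclaim).
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request; the main loop decides when to exit."""
    stop_requested.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def process_files(files, process_one):
    """Process files one at a time, stopping cleanly between files on SIGTERM."""
    completed = []
    for name in files:
        if stop_requested.is_set():
            break  # remaining files are left for a retry
        process_one(name)
        completed.append(name)
    return completed
```

Anything not in the returned list can be retried by a later task, which matches the "interruption just means retrying one file" model.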

EC2 Launch Type

This offers full control over the underlying instances. You choose the AMI, instance type, and manage patching and scaling yourself. Choose EC2 when you need GPUs, custom AMIs, specific instance families, or when sustained high utilization makes reserved instances cheaper than Fargate.

The trade-off is clear: more control, more operational burden.

ECS Managed Instances

Launched in September 2025, Managed Instances bridge the gap between Fargate simplicity and EC2 flexibility. AWS handles provisioning, auto-scaling, Bottlerocket OS patching (14-day cycles), and host replacement. You control instance type selection via attribute-based selection - say "I need 4 GPUs" and ECS picks the right instance.

The "start before stop" principle for host replacement is particularly nice - new capacity comes up before old goes down, maintaining availability throughout.

This is the answer for GPU workloads and ML inference where Fargate isn't an option but you don't want to manage EC2 fleets.

Capacity Providers

Capacity providers are the recommended way to configure compute. The strategy uses two parameters:

  • Base - Minimum tasks guaranteed on a specific provider (only one provider can have a base)
  • Weight - Relative proportion of tasks after the base is filled

Example: base 2 on FARGATE, weight 4 on FARGATE_SPOT, weight 1 on FARGATE. Your first 2 tasks are guaranteed to use on-demand Fargate. After that, 4 out of every 5 new tasks go to Spot. Cost optimization with a reliability floor.
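
The arithmetic behind that example can be sketched in a few lines of Python. This is an approximation for reasoning about a strategy, not the actual ECS placement algorithm: the way leftover tasks are handed out after rounding is my own assumption, and `split_tasks` is a hypothetical helper.

```python
def split_tasks(desired_count, strategy):
    """Approximate how tasks spread across capacity providers.

    strategy is a list of dicts with 'provider', 'weight', and optional 'base'.
    """
    counts = {item["provider"]: 0 for item in strategy}
    remaining = desired_count

    # Step 1: satisfy the base first (only one provider may set a base).
    for item in strategy:
        base = min(item.get("base", 0), remaining)
        counts[item["provider"]] += base
        remaining -= base

    # Step 2: split what is left in proportion to the weights.
    total_weight = sum(item["weight"] for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts
    exact = {
        item["provider"]: remaining * item["weight"] / total_weight
        for item in strategy
    }
    for provider, share in exact.items():
        counts[provider] += int(share)

    # Hand out tasks lost to rounding, largest fractional part first.
    # This tie-breaking rule is an assumption, not documented ECS behavior.
    leftover = remaining - sum(int(share) for share in exact.values())
    by_fraction = sorted(exact, key=lambda p: exact[p] - int(exact[p]), reverse=True)
    for provider in by_fraction[:leftover]:
        counts[provider] += 1
    return counts


strategy = [
    {"provider": "FARGATE", "base": 2, "weight": 1},
    {"provider": "FARGATE_SPOT", "weight": 4},
]
print(split_tasks(12, strategy))  # {'FARGATE': 4, 'FARGATE_SPOT': 8}
```

With 12 tasks, 2 go to the on-demand base and the remaining 10 split 1:4, so on-demand ends up with 4 and Spot with 8.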


Task Definitions - The Blueprint

The task definition is where you define everything about your containers. Here are the critical parameters:

[Figure: ECS Task Definition Anatomy]

Container Definitions

Each task definition contains one or more container definitions. Key parameters include:

  • image - Docker image from ECR (Elastic Container Registry), Docker Hub, or any private registry
  • essential - If an essential container stops, the entire task stops. Your main app is essential; your log router sidecar might not be
  • portMappings - Container ports, with named ports for Service Connect
  • healthCheck - CMD-SHELL command with configurable interval, timeout, retries, and start period
  • dependsOn - Container startup ordering with conditions: START, COMPLETE, SUCCESS, HEALTHY
  • restartPolicy - Container-level restarts without killing the entire task. Configurable attempt period (60-1800 seconds) and ignored exit codes

Task Role vs Execution Role

This distinction trips people up:

| | Task Role | Execution Role |
|---|---|---|
| Purpose | Permissions for your application code | Permissions for the ECS agent |
| Used by | Your containers calling AWS APIs | ECS pulling images, pushing logs, fetching secrets |
| Example | S3 read/write, DynamoDB access | ecr:GetAuthorizationToken, logs:CreateLogStream, secretsmanager:GetSecretValue |

Two different roles, two different purposes. The task role follows least privilege for your application. The execution role is about infrastructure plumbing.

Secrets Injection

ECS natively injects secrets as environment variables from Secrets Manager or SSM Parameter Store:

{
  "secrets": [
    {
      "name": "DB_PASSWORD",
      "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret"
    },
    {
      "name": "API_KEY",
      "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/my-param"
    }
  ]
}

Secrets Manager supports specific JSON keys (arn:...secret:my-secret:username::) and version staging. Never bake secrets into container images or task definitions.
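
The trailing colons in that JSON-key form are easy to get wrong, so here is a small Python sketch that builds the valueFrom string. It assumes the suffix layout is json-key, then version-stage, then version-id, each separated by a colon, which is how I understand the format. The helper name and the example ARN are placeholders.

```python
def secret_value_from(secret_arn, json_key="", version_stage="", version_id=""):
    """Build the valueFrom string for a Secrets Manager secret.

    With no optional parts the plain ARN is returned. Otherwise all three
    suffix fields are emitted, leaving unused ones empty.
    """
    if not (json_key or version_stage or version_id):
        return secret_arn
    return f"{secret_arn}:{json_key}:{version_stage}:{version_id}"


# Placeholder ARN for illustration only.
arn = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret"
print(secret_value_from(arn, json_key="username"))
```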


Networking

[Figure: ECS Networking with awsvpc Mode]

Networking Modes

| Mode | Description | Use With |
|---|---|---|
| awsvpc | Each task gets its own ENI and private IP. Per-task security groups. Required for Fargate. | Fargate, EC2, Managed Instances |
| bridge | Docker's virtual network. Dynamic port mapping with ALB. | EC2 only |
| host | Containers use host's network directly. No port isolation. | EC2 only |
| none | No external networking. | EC2 only |

Recommendation: Use awsvpc unless you have a specific reason not to. It's the only mode that works everywhere and gives you per-task security groups.

VPC Architecture

For production workloads, run ECS tasks in private subnets. Use VPC endpoints for ECR, S3, and CloudWatch to avoid NAT gateway data transfer costs. This is the biggest hidden cost in ECS architectures - NAT gateways charging for every image pull and log push.

Service Connect

Service Connect is the recommended way to handle service-to-service communication. It automatically injects an Envoy proxy as a sidecar, providing:

  • Service discovery via Cloud Map namespaces
  • Client-side load balancing with retries and outlier detection
  • Free application-level traffic metrics in CloudWatch (request count, latency, error rates)
  • Support for HTTP, HTTP2, gRPC, and TCP
  • Per-request Envoy access logs (October 2025)

[Figure: ECS Service Connect]

Service Connect replaces AWS App Mesh, which will be discontinued in September 2026.


Deployment Strategies

ECS has the most sophisticated deployment options of any container orchestrator on AWS. As of March 2026, four strategies are available natively.

[Figure: ECS Deployment Strategies]

Rolling Update (Default)

Gradually replaces old tasks with new ones. Controlled by:

  • minimumHealthyPercent (default 100%) - Minimum tasks that must remain running
  • maximumPercent (default 200%) - Maximum tasks allowed during deployment

For zero-downtime with a desired count of 4: min 100%, max 200% means ECS starts 4 new tasks, waits for them to be healthy, then stops the 4 old tasks.
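
To see how the two percentages translate into task counts for other service sizes, here is a small Python sketch. It assumes the lower bound rounds up and the upper bound rounds down, which is my understanding of how ECS applies them. `rolling_update_bounds` is a hypothetical helper, not an ECS API.

```python
def rolling_update_bounds(desired_count, minimum_healthy_percent=100, maximum_percent=200):
    """Translate the deployment percentages into concrete task counts."""
    # Integer math: ceiling for the lower bound, floor for the upper bound.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    max_running = desired_count * maximum_percent // 100
    return {
        "min_running": min_running,
        "max_running": max_running,
        # New tasks ECS may start before stopping any old ones.
        "extra_tasks_allowed": max_running - desired_count,
        # Old tasks ECS may stop before any new one is healthy.
        "old_tasks_stoppable_first": desired_count - min_running,
    }


print(rolling_update_bounds(4))           # the zero-downtime example above
print(rolling_update_bounds(4, 50, 100))  # no extra capacity, stop-then-start
```

With 50%/100%, ECS cannot start any extra tasks, so it has to stop two old tasks first. That is cheaper but runs at reduced capacity during the deployment.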

Blue/Green

Built into ECS without CodeDeploy dependency. Provisions 100% new capacity ("green"), validates, then shifts all production traffic at once. Key features:

  • Six Lambda lifecycle hooks: pre-scale-up, post-scale-up, test traffic shift, production traffic shift, post-test, post-production
  • Configurable bake time for instant rollback window
  • Works with ALB, NLB, and Service Connect

Canary (October 2025)

Two-stage deployment: shift a small percentage of traffic first (configurable from 0.1% to 99.9%), validate with real production traffic, then shift the rest. Ideal for critical user-facing services where you want to limit blast radius.

Linear (October 2025)

Gradual traffic shift in equal increments as small as 3%, with configurable bake time between each step. The most conservative approach - allows monitoring at each increment.
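
The difference between the two traffic-shifting strategies is easiest to see as a list of cumulative percentages. This Python sketch only models the schedule; it does not call ECS. The function names are hypothetical and the bounds are the ones quoted above.

```python
import math

MIN_LINEAR_STEP = 3.0  # smallest linear increment mentioned above


def linear_schedule(step_percent):
    """Cumulative traffic percentages on the new version after each step."""
    if not MIN_LINEAR_STEP <= step_percent <= 100:
        raise ValueError("step_percent must be between 3 and 100")
    steps = math.ceil(100 / step_percent)
    # The final step is capped at 100 when the increments do not divide evenly.
    return [min(100.0, round(i * step_percent, 1)) for i in range(1, steps + 1)]


def canary_schedule(canary_percent):
    """Two stages: the canary slice, then everything."""
    if not 0.1 <= canary_percent <= 99.9:
        raise ValueError("canary_percent must be between 0.1 and 99.9")
    return [canary_percent, 100.0]


print(linear_schedule(25))       # [25.0, 50.0, 75.0, 100.0]
print(len(linear_schedule(3)))   # 34
print(canary_schedule(5))        # [5, 100.0]
```

A 3% linear deployment takes 34 steps, each followed by its bake time, so the step size directly controls how long a deployment takes.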

Deployment Circuit Breaker

The safety net across all strategies. If tasks keep failing to start or pass health checks, ECS automatically stops the deployment and optionally rolls back to the last successful version. Alongside the circuit breaker, you can attach CloudWatch alarms to a deployment to detect application-level failures, not just infrastructure failures.


Service Auto Scaling

ECS uses Application Auto Scaling with four policy types:

  1. Target Tracking - Set a target metric value (e.g., CPU at 50%). Simplest to configure - works like a thermostat.
  2. Step Scaling - Define explicit threshold/action pairs. React differently at different severity levels.
  3. Scheduled Scaling - Time-based. Scale up for business hours, down at night. Supports scaling to zero (set minimum capacity to 0).
  4. Predictive Scaling - ML-based. Analyzes historical patterns and proactively scales before demand hits. Doesn't trigger scale-ins on its own - pair with target tracking.

Important behavior: scale-in is automatically paused during deployments to protect availability.
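
To build intuition for the thermostat analogy, here is a simplified Python model of target tracking. It assumes the metric scales inversely with task count, so the new count is proportional to how far the metric is from the target. The real Application Auto Scaling logic also involves alarms, cooldowns, and more conservative scale-in, so treat this as a mental model rather than the actual algorithm.

```python
import math


def estimate_desired_count(current_count, metric_value, target_value,
                           min_capacity, max_capacity):
    """Rough proportional estimate of the task count target tracking aims for."""
    proposed = math.ceil(current_count * metric_value / target_value)
    # The result always stays within the registered min/max capacity.
    return max(min_capacity, min(max_capacity, proposed))


# 4 tasks averaging 75% CPU with a 50% target.
print(estimate_desired_count(4, 75, 50, min_capacity=1, max_capacity=10))  # 6
```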


Storage Options

| Storage | Persistence | Shared | Use Case |
|---|---|---|---|
| Ephemeral | Task lifetime | Within task | Temp files, caches. Default 20 GiB, up to 200 GiB on Fargate |
| EFS | Persistent | Across tasks | Shared config, models, content. Multi-AZ, IAM auth |
| EBS | Configurable | Single task | High-IOPS data processing. One volume per task |
| Bind Mounts | Task lifetime | Within task | Container-to-container data sharing |

EFS is the most versatile - persistent, shared across tasks, with support for IAM authorization and transit encryption. EBS is for high-performance block storage when EFS throughput is insufficient. One gotcha with EFS: creating a new file system can take minutes, but this is typically a one-time step for a given application.


Security

Security in ECS follows the principle of least privilege at the task level:

  • Task roles - Each task definition gets its own IAM role. Your batch processor gets S3 and SQS access. Your API gets DynamoDB access. Not a shared instance profile.
  • Secrets injection - Secrets Manager and SSM Parameter Store values injected as environment variables at startup.
  • Network isolation - awsvpc mode gives each task its own security group. Run tasks in private subnets.
  • Read-only root filesystem - Run containers with readonlyRootFilesystem: true for hardening.
  • Image scanning - ECR enhanced scanning with Amazon Inspector continuously scans for OS and language package vulnerabilities. As of 2026, it supports minimal base images like scratch and distroless, and shows which images are running in your clusters.

Observability

CloudWatch Container Insights (Enhanced)

Container Insights provides granular metrics at the cluster, service, task, and container level. The honeycomb visualization gives you cluster health at a glance, with alarm state and utilization side by side. It also shows deployment tracking alongside infrastructure anomalies and supports cross-account monitoring for unified views. You can enable it per-cluster or account-wide.

Logging

The awslogs driver sends container logs directly to CloudWatch Logs. As of June 2025, the default mode switched from blocking to non-blocking - if the log buffer fills up, excess logs are dropped rather than blocking your application. This prioritizes availability over logging completeness.

For advanced log routing, FireLens with Fluent Bit as a sidecar routes logs to any destination - CloudWatch, S3, Elasticsearch, Datadog, Splunk. Different containers can route to different destinations.

Tracing

Deploy the AWS Distro for OpenTelemetry (ADOT) collector as a sidecar. It receives OTLP traces on port 4317 (gRPC) or 4318 (HTTP) and exports to X-Ray automatically. This replaces the legacy X-Ray daemon approach.

ECS Exec

ECS Exec lets you shell into a running container directly - the equivalent of docker exec but for tasks running on Fargate or EC2. It uses AWS Systems Manager (SSM) under the hood, so there's no need to open inbound ports or SSH. I use this all the time - it's one of the most useful ECS features IMO.

To enable it, set enableExecuteCommand: true on your service or run task call. Then:

aws ecs execute-command \
  --cluster my-cluster \
  --task abc123 \
  --container my-app \
  --interactive \
  --command "/bin/sh"

This is invaluable for debugging - inspecting environment variables, checking network connectivity, verifying mounted volumes, or tailing logs inside the container. A few things to keep in mind:

  • It must be enabled before the task launches - you can't retroactively enable it on already-running tasks. For services, enabling it requires a new deployment
  • The task role needs SSM permissions (ssmmessages:CreateControlChannel, ssmmessages:CreateDataChannel, ssmmessages:OpenControlChannel, ssmmessages:OpenDataChannel)
  • The container image needs a shell (/bin/sh or /bin/bash) - scratch and distroless images won't work
  • Sessions can be logged to CloudWatch Logs or S3 for audit
  • Works with both Fargate and EC2 launch types

For quick diagnostics, the amazon-ecs-exec-checker script validates that your task, role, and agent are configured correctly.


Real-World Architecture: Batch Processing with Fargate

My Serverless Data Processor project demonstrates the batch processing pattern.

[Figure: Batch Processing with Step Functions and Fargate]

The architecture: S3 upload triggers a Lambda that extracts files. Step Functions distributed map fans out processing across Fargate tasks - each file gets its own container. The containers use the waitForTaskToken pattern - Step Functions passes a callback token as an environment variable, the Rust container processes the data, then calls send_task_success to signal completion.

Key details:

  • Fargate at minimum specs: 0.25 vCPU, 512 MB RAM
  • OpenTelemetry sidecar for CloudWatch metrics
  • Container images in ECR with multi-stage Docker builds
  • Written in Rust for the worker containers
  • Infrastructure managed with Terraform

Here is the Step Functions integration that launches Fargate tasks:

{
  "Type": "Task",
  "Resource": "arn:aws:states:::ecs:runTask.waitForTaskToken",
  "Parameters": {
    "LaunchType": "FARGATE",
    "Cluster": "${ecs_cluster}",
    "TaskDefinition": "${task_def_name}",
    "NetworkConfiguration": {
      "AwsvpcConfiguration": {
        "Subnets": ["${fargate_subnet}"],
        "SecurityGroups": ["${vpc_default_sg}"]
      }
    },
    "Overrides": {
      "ContainerOverrides": [{
        "Name": "store_data_processor_daily",
        "Environment": [
          {"Name": "TASK_TOKEN", "Value.$": "$$.Task.Token"},
          {"Name": "S3_BUCKET", "Value.$": "$.BatchInput.source_bucket_name"},
          {"Name": "S3_KEY", "Value.$": "$.Items[0].Key"}
        ]
      }]
    }
  }
}

This pattern works for any fan-out workload - ETL, media processing, report generation, ML batch inference. Each task is independent, starts in seconds, processes its data, and terminates. You pay only for the compute time used.
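
For readers who want to see the container side of the callback, here is a Python sketch of the same flow. My actual worker is written in Rust, so this is an illustration and not the project's code. It reads the three environment variables set in the overrides above, runs a caller-supplied `process` function (hypothetical), and reports back through the Step Functions API. Reporting failures with send_task_failure is my addition so that the state machine does not wait until its timeout.

```python
import json
import os


def run_worker(sfn_client, process, env=os.environ):
    """Process one S3 object and report the outcome to Step Functions.

    sfn_client is a boto3 Step Functions client (or anything with the same
    send_task_success / send_task_failure methods).
    """
    token = env["TASK_TOKEN"]
    try:
        result = process(env["S3_BUCKET"], env["S3_KEY"])
    except Exception as exc:
        # Tell the state machine right away instead of letting it time out.
        sfn_client.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)
        )
        return False
    # The output must be a JSON string.
    sfn_client.send_task_success(taskToken=token, output=json.dumps(result))
    return True


def main():
    """Container entrypoint; boto3 is imported here so the sketch loads without it."""
    import boto3

    def process(bucket, key):
        # Placeholder for the real work on s3://bucket/key.
        return {"bucket": bucket, "key": key, "status": "processed"}

    run_worker(boto3.client("stepfunctions"), process)
```

The task role needs states:SendTaskSuccess and states:SendTaskFailure permissions for these calls to work.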

I also used this same Fargate + callback token pattern in my Serverless Pizza Ordering project, where the Fargate container simulated pizza preparation and delivery - chosen over Lambda because the "AI" insisted some pizzas could take more than 15 minutes.


Real-World Architecture: API Backend on Fargate

My Aurora DSQL Kabob Store project uses ECS Fargate as an always-on API backend.

[Figure: FastAPI Backend on ECS Fargate]

React frontend behind CloudFront, Application Load Balancer routing to FastAPI containers on Fargate, connecting to Aurora DSQL for multi-region active-active writes.

The key design decision: keep business logic runtime-agnostic. The same FastAPI application uses direct psycopg2 queries (not ORM) so it can deploy across Fargate, ECS on EC2, Lambda, or EKS with minimal adapter code.

Fargate costs about 20-30% more than equivalent EC2 on-demand for sustained workloads, but the operational simplicity during development is worth it. In practice, real migrations from Fargate to EC2 often yield smaller savings than expected - Tines reported only ~5% compute cost savings after migrating, though they saw 30% faster job processing and 10% lower P95 latency from having dedicated hardware. The biggest cost was actually the VPC infrastructure - NAT gateways at ~$2-3/day - not ECS itself. I run terraform destroy whenever I'm not actively developing.


Terraform Examples

Here is the core Terraform for an ECS Fargate setup, taken from my Serverless Data Processor project:

Cluster

resource "aws_ecs_cluster" "ecs_cluster" {
  name = "${var.project_name}-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

Task Definition

resource "aws_ecs_task_definition" "fargate_processor_task" {
  family                   = var.task_definition_name
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.fargate_cpu
  memory                   = var.fargate_memory

  container_definitions = templatefile(
    "${path.module}/container-definitions.json.tpl",
    {
      app_image           = var.app_image
      fargate_cpu         = var.fargate_cpu
      fargate_memory      = var.fargate_memory
      aws_region          = var.aws_region
      project_name        = var.project_name
      task_container_name = var.task_container_name
    }
  )
}

Container Definition Template

[
  {
    "cpu": ${fargate_cpu},
    "essential": true,
    "image": "${app_image}",
    "memory": ${fargate_memory},
    "name": "${task_container_name}",
    "networkMode": "awsvpc",
    "environment": [
      {"name": "S3_BUCKET", "value": "my-bucket"},
      {"name": "S3_KEY", "value": "data/input.json"}
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/${project_name}",
        "awslogs-region": "${aws_region}",
        "awslogs-stream-prefix": "${project_name}-log-stream"
      }
    }
  },
  {
    "image": "public.ecr.aws/aws-observability/aws-otel-collector:latest",
    "name": "aws-otel-collector",
    "essential": false,
    "command": ["--config=/etc/ecs/ecs-cloudwatch.yaml"],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/${project_name}-otel",
        "awslogs-region": "${aws_region}",
        "awslogs-stream-prefix": "otel"
      }
    }
  }
]

ECS Service with ALB (for always-on workloads)

resource "aws_ecs_service" "api_service" {
  name            = "${var.project_name}-service"
  cluster         = aws_ecs_cluster.ecs_cluster.id
  task_definition = aws_ecs_task_definition.api_task.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = var.container_name
    container_port   = var.container_port
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  depends_on = [aws_lb_listener.api]
}

Three resources for the core ECS setup: cluster, task definition, and service. The container definition template handles the application specifics. The full Terraform for both projects is in the GitHub repos linked at the end.


Recent Features (2025-2026)

ECS has had a remarkable year of feature launches:

  • AI Developer Tools (December 2025) - ECS MCP Server for AI-assisted development and operations. Natural language commands for cluster management.
  • ECS Express Mode (November 2025) - Deploy a production-ready containerized web app with just three inputs: a container image, a task execution role, and an infrastructure role. Provisions Fargate, ALB with SSL, auto scaling, monitoring, and a unique URL. Up to 25 services can share one ALB. No additional charge beyond the underlying resources.
  • Canary and Linear Deployments (October 2025) - Fine-grained traffic shifting. Canary from 0.1% to 99.9%, linear in increments as small as 3%.
  • Service Connect Envoy Access Logs (October 2025) - Per-request telemetry for HTTP, HTTP2, gRPC, and TCP.
  • ECS Managed Instances (September 2025) - AWS-managed EC2 with Bottlerocket OS. Attribute-based instance selection for GPUs and specialized hardware.
  • Native Blue/Green Deployments (July 2025) - Built into ECS without CodeDeploy. Six Lambda lifecycle hooks for testing and approval at each phase. Configurable bake time for instant rollback.
  • Non-Blocking Log Driver Default (June 2025) - Prioritizes task availability over logging completeness.

ECS vs EKS - When to Use What

This is the most common question I get. Both solve the same fundamental problem - running containers reliably at scale.

| Criteria | ECS | EKS |
|---|---|---|
| Control plane cost | Free | ~$75/month per cluster |
| Learning curve | AWS-native concepts | Kubernetes concepts |
| AWS integration | Deep, native | Good, via add-ons |
| Multi-cloud | AWS only | Portable K8s manifests |
| Ecosystem | AWS tooling | Helm, ArgoCD, Istio, operators |
| Managed compute | Fargate, Managed Instances | Fargate, managed node groups |

Choose ECS when your team is AWS-focused, you want operational simplicity, you value the free control plane, and your workloads are straightforward services and batch jobs.

Choose EKS when your team knows Kubernetes, you need multi-cloud portability, you want the Kubernetes ecosystem (Helm, ArgoCD, custom operators), or you're running complex stateful workloads.

Most organizations pick based on team expertise and existing tooling, not technical limitations.


ECS vs Lambda - Containers vs Functions

Another comparison that comes up frequently:

| Criteria | ECS/Fargate | Lambda |
|---|---|---|
| Max duration | Unlimited | 15 minutes |
| Max memory | 120 GB | 10 GB |
| Startup | Seconds (image pull) | Milliseconds (warm) to seconds (cold) |
| Pricing | Per-second (vCPU + memory) | Per-invocation + duration |
| Scaling | Service auto scaling | Automatic per-request |
| Best for | Long-running, resource-heavy, always-on | Event-driven, short-lived, bursty |

In my projects I use both. Lambda for event handling - S3 triggers, API endpoints, file extraction. Fargate for heavy processing - data transformation, ML inference, container workloads that need full runtime control.


Best Practices

After building several production systems with ECS, here are the practices I've found most valuable:

  1. Use Fargate unless you need GPUs or specific instance types. The operational simplicity is worth the cost premium for most workloads.
  2. Use awsvpc networking mode everywhere. It's the only mode that works on all compute types and gives you per-task security groups.
  3. Enable the deployment circuit breaker with rollback. This catches failed deployments before they impact all traffic.
  4. Use capacity provider strategies to mix Spot and on-demand. A base of on-demand with weighted Spot gives you cost savings with a reliability floor.
  5. Inject secrets via Secrets Manager. Never bake them into images or pass them as plain environment variables.
  6. Enable Container Insights. The per-task metrics and honeycomb visualization are invaluable for debugging.
  7. Use Service Connect for service-to-service communication. Free traffic metrics and managed Envoy proxies with no code changes.
  8. Use VPC endpoints for ECR, S3, and CloudWatch. NAT gateway data transfer costs are the biggest hidden expense in ECS architectures.
  9. Use multi-stage Docker builds. Keep images small. A Rust binary in a scratch image is a few megabytes. For a Python app, use a slim base image with only production dependencies.
  10. Define health checks in the task definition. Don't rely solely on ALB health checks - container-level health checks catch issues faster.

Pricing - You Pay for Compute, Not Orchestration

The most important thing to know: ECS orchestration is free. You only pay for the compute resources your containers consume.

  • Fargate - Per-second billing for vCPU ($0.04048/hour) and memory ($0.004445/GB/hour). Spot is up to 70% less. Compute Savings Plans can reduce costs by up to 49% (3-year all-upfront) or ~20% (1-year no-upfront).
  • EC2 - Standard instance pricing. Use Savings Plans or Reserved Instances for sustained workloads.
  • Managed Instances - EC2 instance pricing plus a management fee for automated provisioning, patching, and host replacement.

Hidden costs to watch:

  • NAT gateways - $0.045/GB for data processed. Use VPC endpoints.
  • ALB - Fixed hourly cost plus per-LCU. Up to 25 ECS Express Mode services can share one ALB.
  • ECR storage - $0.10/GB/month. Use lifecycle policies to clean up old images.
  • Ephemeral storage - Fargate charges $0.000111/GB/hour above the default 20 GiB.

Cost optimization strategies:

  • Right-sizing - The single biggest lever. Reducing from 1 vCPU/2GB to 0.5 vCPU/1GB roughly halves the Fargate task cost, since both billed dimensions are cut in half.
  • Scheduled shutdowns - Running dev/staging environments only during business hours (8 hours/day, 5 days/week) can reduce costs by over 75%.
  • Savings Plans - Commit to consistent usage for 1-3 years. Even no-upfront 1-year plans save ~20% on Fargate.

For my projects, the ECS cost has been minimal. The batch processor runs tasks for seconds at minimum specs. The Kabob Store's main cost was VPC infrastructure ($2-3/day), not ECS.
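
The right-sizing and scheduled-shutdown numbers are easy to check yourself. This Python sketch uses the on-demand rates quoted above. Rates vary by region and architecture, so treat the constants as examples rather than current prices. The function names are hypothetical.

```python
# On-demand rates quoted above (USD); they vary by region and architecture.
VCPU_HOURLY = 0.04048
MEMORY_GB_HOURLY = 0.004445


def task_hourly_cost(vcpu, memory_gb):
    """Hourly Fargate cost for one task, excluding extra ephemeral storage."""
    return vcpu * VCPU_HOURLY + memory_gb * MEMORY_GB_HOURLY


def scheduled_savings(hours_per_day, days_per_week):
    """Fraction of cost avoided by running only part of the week."""
    return 1 - (hours_per_day * days_per_week) / (24 * 7)


large = task_hourly_cost(1, 2)
small = task_hourly_cost(0.5, 1)
print(f"1 vCPU / 2 GB:   ${large:.4f} per hour")
print(f"0.5 vCPU / 1 GB: ${small:.4f} per hour")
print(f"Right-sizing saves {1 - small / large:.0%}")
print(f"8h x 5d schedule saves {scheduled_savings(8, 5):.1%}")
```

Running a dev environment 40 hours out of a 168-hour week is where the "over 75%" figure comes from.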


Things to Know

A few operational details worth keeping in mind:

  • Task placement - Fargate handles placement automatically. For EC2, use the binpack placement strategy to consolidate workloads on fewer instances and reduce waste.
  • Task recycling - Fargate tasks on platform version 1.4.0+ are recycled after 14 days of continuous running. Your service will gradually replace old tasks.
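  • ENI limits - In awsvpc mode on EC2, each task needs an ENI. Enable ENI trunking (the awsvpcTrunking account setting) to increase task density per instance. If you manage account settings through CloudFormation, this may require a custom resource.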
  • Image pull time - Large images slow task startup. Keep images lean. Set ECS_IMAGE_PULL_BEHAVIOR=prefer-cached on EC2 instances to use cached images when available.
  • Spot instance draining - For EC2 Spot instances, set ECS_ENABLE_SPOT_INSTANCE_DRAINING=true on the ECS agent for graceful task termination.
  • Service quotas - Default Fargate vCPU quota is 6 on new accounts (up to 4,000 in production). Request increases proactively.
  • Force new deployment - If you update a secret or parameter store value, the running tasks won't pick it up automatically. Force a new deployment to refresh.
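
Forcing a new deployment after rotating a secret is a single API call. Here is a minimal Python sketch using boto3's update_service with forceNewDeployment. The cluster and service names are placeholders, and the client is passed in so the function can be exercised without AWS credentials.

```python
def force_new_deployment(ecs_client, cluster, service):
    """Start a fresh deployment so new tasks pick up updated secrets.

    ecs_client is a boto3 ECS client, e.g. boto3.client("ecs").
    """
    response = ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,  # same task definition, new tasks
    )
    return response["service"]["serviceName"]


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   force_new_deployment(boto3.client("ecs"), "my-cluster", "my-service")
```

The task definition is unchanged, so this rolls the service using whatever deployment strategy it already has configured.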

Wrapping Up

ECS is the container orchestration service I use most on AWS. The free control plane, deep AWS integration, and flexible compute options make it the right choice for most container workloads that don't require Kubernetes-specific tooling.

The recent feature launches have been particularly impressive - native blue/green without CodeDeploy, canary and linear deployments, Managed Instances for GPU workloads, and Express Mode for rapid prototyping. Combined with Fargate's serverless simplicity and Service Connect's built-in service mesh, ECS has matured into a comprehensive platform for running containers at any scale.

I've used it for batch data processing with Step Functions fan-out, pizza ordering with long-running container workflows, and multi-region API backends with Aurora DSQL. In every case, ECS handled the orchestration cleanly while I focused on the application logic.

If you're running containers on AWS and haven't looked at ECS recently, the current feature set is worth a fresh evaluation. Start with a Fargate service behind an ALB, enable Container Insights, and go from there.


Resources


Connect with me on X, Bluesky, LinkedIn, GitHub, Medium, Dev.to, or the AWS Community. Check out more of my projects at darryl-ruggles.cloud and join the Believe In Serverless community.
