I really like using AWS SSM Session Manager for managing EC2 instances whenever possible, and recently hit a case where the requirement was to use Ansible for configuration management of EC2 instances, but without opening SSH access to them. That sounded like a good fit for Session Manager, but it took some research to figure out how to make it work with Ansible.
I ended up using two approaches - one with a static inventory for a single instance, and another with dynamic discovery for a fleet of instances. Both are based on the community.aws.aws_ssm connection plugin, which uses SSM Session Manager under the hood to connect to the target instances without SSH. The main difference is how the inventory is built - either hardcoded with instance IDs or dynamically discovered via EC2 API queries.
Prerequisites
Whether static or dynamic, the Ansible control machine (your laptop, CI runner, etc.) needs:
- AWS CLI installed and configured with permissions to call `ssm:StartSession` against the target instances (and `ec2:DescribeInstances` if you want to use dynamic discovery);
- AWS Session Manager plugin for the AWS CLI installed locally (`session-manager-plugin --version` to verify);
- The target EC2 instance must have the SSM agent running and the `AmazonSSMManagedInstanceCore` managed policy attached to its instance profile;
- An S3 bucket that the SSM agent on the instance can write to. This is the part that surprised me a bit: the `community.aws.aws_ssm` connection plugin uses an S3 bucket as a "transport" to copy files between the control node and the target instance, since SSM Session Manager itself doesn't support file transfer natively. The bucket name is passed via `ansible_aws_ssm_bucket_name`, and the instance profile must allow `s3:GetObject` / `s3:PutObject` against it;
- Ansible collections: `community.aws` (for the SSM connection plugin) and `amazon.aws` (for the dynamic EC2 lookup). Both are pulled from Ansible Galaxy via `requirements.yml`:

```yaml
collections:
  - name: community.aws
  - name: amazon.aws
```

- Python dependencies on the control node: `boto3` and `botocore` (ideally in a virtualenv; I prefer to wrap the playbook execution in a small shell script that bootstraps a virtualenv and installs these dependencies automatically, so the user doesn't have to worry about it).
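On the IAM side, the S3 transfer piece is the easiest one to get wrong. A minimal sketch of the extra policy statement for the instance profile (in addition to `AmazonSSMManagedInstanceCore`; the bucket name is a placeholder) might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSsmAnsibleFileTransfer",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    }
  ]
}
```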
With that in place, the SSM connection plugin works, so let’s move on to the inventory.
Static inventory
The simplest case is a single, long-lived EC2 instance - for example, a dev instance used for DB connection tunneling. There's no autoscaling or replacement going on, so the instance ID should be stable.
In this case, the inventory.yml is a plain YAML file with all the SSM connection details inlined:
```yaml
all:
  hosts:
    ssm_host:
      ansible_host: <instance_id>
      ansible_connection: community.aws.aws_ssm
      ansible_aws_ssm_region: us-east-2          # the SSM connection plugin needs the region to know which SSM endpoint to talk to
      ansible_aws_ssm_bucket_name: <bucket_name> # the plugin uses this bucket for file transfer, so it must be set
      ansible_become: true                       # behavioral vars need the ansible_ prefix in inventory
      ansible_become_user: ec2-user              # AL2023 default
      ansible_python_interpreter: /usr/bin/python3 # set explicitly to avoid Ansible's interpreter auto-discovery, which can be flaky over SSM
```
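Before running a full playbook, a quick way to verify the whole chain (plugin, SSM agent, IAM, S3 bucket) is an ad-hoc ping:

```shell
ansible -i inventory.yml ssm_host -m ansible.builtin.ping
```

If this returns `pong`, both command execution and file transfer over SSM work, since the ping module itself is shipped to the instance via the S3 bucket.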
To run a playbook against this inventory:
```shell
ansible-playbook -i inventory.yml backup.yml
```
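The playbook itself doesn't need anything SSM-specific; backup.yml here stands for any ordinary playbook. A minimal sketch (task content and paths hypothetical) could look like:

```yaml
- name: Back up the database
  hosts: ssm_host
  tasks:
    - name: Dump the database on the instance       # hypothetical backup script path
      ansible.builtin.command: /usr/local/bin/db-backup.sh

    - name: Fetch the dump back to the control node # this transfer goes through the S3 bucket
      ansible.builtin.fetch:
        src: /tmp/db-backup.sql
        dest: ./backups/
```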
That's it: the connection should work. However, this approach doesn't hold up for scalable or ephemeral setups, so let's move on to dynamic discovery.
Dynamic inventory: discovering instances by tag
The static approach falls apart as soon as the instance gets recreated frequently (via Terraform taint / replace, autoscaling, or simply rebuilds during environment provisioning). Hardcoding the instance ID means the inventory file goes stale every time something changes upstream, and there's no good way to keep it in sync.
A better approach is to discover the targets at playbook runtime. The amazon.aws.ec2_instance_info module queries the EC2 API and returns the list of instances matching the filters you provide - and we can feed that list into Ansible’s in-memory inventory via add_host.
Example 1: a single named instance
For the same case mentioned before (a single instance used for DB tunneling), we can tag the instance with a unique Name tag (e.g. <project>-<env>-db-tunnel) and query it dynamically in the playbook:
```yaml
- name: Validate required variables
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Fail if project is not provided
      fail:
        msg: "Required variable 'project' is not set. Pass it via -e 'project=<PROJECT>'"
      when: project is not defined

    - name: Fail if env is not provided or invalid
      fail:
        msg: "Required variable 'env' must be one of: dev, qa, test. Pass it via -e 'env=<ENV>'"
      when: env is not defined or env not in ['dev', 'qa', 'test']

- name: Fetch running EC2 instances dynamically
  hosts: localhost
  gather_facts: false
  vars:
    aws_region: us-east-2
    ec2_name_tag: "{{ project }}-{{ env }}-db-tunnel"
  tasks:
    - name: Query running db-tunnel EC2 instance by Name tag
      amazon.aws.ec2_instance_info:
        region: "{{ aws_region }}"
        filters:
          instance-state-name: running # fetch only running instances to avoid trying to connect to stopped ones
          "tag:Name": "{{ ec2_name_tag }}"
      register: ec2_info

    - name: Extract target instance IDs
      set_fact:
        target_ids: "{{ ec2_info.instances | map(attribute='instance_id') | list }}"

    - name: Fail if no instances found
      fail:
        msg: "No running EC2 instances found for Name={{ ec2_name_tag }}"
      when: target_ids | length == 0

- name: Dynamically add hosts
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Create inventory group from discovered instances
      add_host:
        name: "{{ item }}"
        groups: db_tunnel
        ansible_host: "{{ item }}"
        ansible_connection: community.aws.aws_ssm
        ansible_aws_ssm_region: us-east-2
        ansible_aws_ssm_bucket_name: <bucket_name>
        ansible_python_interpreter: /usr/bin/python3
      loop: "{{ target_ids }}"

- name: Bootstrap on db-tunnel instance
  hosts: db_tunnel
  become: true
  roles:
    - role: <example_role> # replace with actual role(s) to run on the target instance
```
So we're getting the following flow:
- Check and fail fast if the user forgot to set environment details;
- Query instances via `amazon.aws.ec2_instance_info` to get the `running` instances with a matching name. Filtering by `running` is important: trying to connect to a `stopped` instance fails in a non-obvious way;
- Add each discovered instance ID to a `db_tunnel` group, with all the SSM connection parameters set (same as in the static example, just templated);
- Run the actual configuration: `hosts: db_tunnel` targets the populated group, and the playbook runs against the discovered instances over SSM.
To invoke it:
```shell
ansible-playbook playbook.yml -e "project=<PROJECT> env=<ENV>"
```
This works the same way as the static inventory example, but it tolerates instance recreation and can be reused across multiple environments and projects without manually updating the inventory file.
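When the query unexpectedly comes back empty, it helps to sanity-check the discovery step outside Ansible with an equivalent AWS CLI query (configured credentials assumed; placeholders as in the playbook):

```shell
aws ec2 describe-instances \
  --region us-east-2 \
  --filters "Name=instance-state-name,Values=running" \
            "Name=tag:Name,Values=<project>-<env>-db-tunnel" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text
```

If this prints an instance ID but the playbook still finds nothing, the problem is on the Ansible side (credentials or region mismatch) rather than in the tags.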
Example 2: multiple instances by Environment tag
If we want to target multiple instances in the same environment, we can use the Environment tag instead of the Name tag. I have a setup for an ML training environment where I provision 1-N GPU EC2 instances (all tagged Environment=training) and need to install the same software stack on all of them, regardless of how many there are at any given moment.
The playbook is a slight variation of the previous one - the filter is tag:Environment instead of tag:Name, and the result is a list rather than a single instance:
```yaml
- name: Fetch running EC2 instances dynamically
  hosts: localhost
  gather_facts: false
  vars:
    aws_region: us-east-2
    ec2_environment_tag: training
  tasks:
    - name: Query running EC2 instances by Environment tag
      amazon.aws.ec2_instance_info:
        region: "{{ aws_region }}"
        filters:
          instance-state-name: running
          "tag:Environment": "{{ ec2_environment_tag }}"
      register: ec2_info

    - name: Extract target instance IDs
      set_fact:
        target_ids: "{{ ec2_info.instances | map(attribute='instance_id') | list }}"

    - name: Fail if no instances found
      fail:
        msg: "No running EC2 instances found for Environment={{ ec2_environment_tag }}"
      when: target_ids | length == 0

- name: Dynamically add hosts
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Create inventory group from list
      add_host:
        name: "{{ item }}"
        groups: dynamic_targets
        ansible_host: "{{ item }}"
        ansible_connection: community.aws.aws_ssm
        ansible_aws_ssm_region: us-east-2
        ansible_aws_ssm_bucket_name: <bucket_name>
        ansible_python_interpreter: /usr/bin/python3
      loop: "{{ target_ids }}"

- name: Run tasks on dynamic hosts
  hosts: dynamic_targets
  become: true
  roles:
    - role: <example_role> # replace with actual role(s) to run on the target instances
    - role: geerlingguy.docker
      vars:
        docker_users:
          - ubuntu
```
If `target_ids` comes back with three instance IDs, `add_host` runs three times, `dynamic_targets` ends up with three members, and Ansible runs the roles across all of them.
Summary
To make the playbook reproducible across machines (mine, my teammates', CI), I usually wrap the whole thing in a small shell script that creates a Python virtualenv, installs ansible + boto3 + botocore, runs ansible-galaxy install -r requirements.yml, and then executes ansible-playbook with whatever args the caller passed. That way, the only things the user needs are Python and the AWS CLI with configured credentials; the rest is bootstrapped automatically. I won't paste the full script here since it's pretty mundane, but the idea is that for simple cases like these, ./run-ansible.sh -e "key=value" should be the entire interface for everyone running the playbook. Combined with SSM, this gets close to my ideal config-management UX: a single command to run the playbook, with no manual inventory management and no SSH key handling.
