I really like using AWS SSM Session Manager for managing EC2 instances whenever possible, and recently hit a case where the requirement was to use Ansible for configuration management of EC2 instances, but without opening SSH access to them. That sounded like a good fit for Session Manager, but it took some research to figure out how to make it work with Ansible.
I ended up using two approaches - one with a static inventory for a single instance, and another with dynamic discovery for a fleet of instances. Both are based on the community.aws.aws_ssm connection plugin, which uses SSM Session Manager under the hood to connect to the target instances without SSH. The main difference is how the inventory is built - either hardcoded with instance IDs or dynamically discovered via EC2 API queries.
Prerequisites
Whether static or dynamic, the Ansible control machine (your laptop, CI runner, etc.) needs:
- AWS CLI installed and configured with permissions to call `ssm:StartSession` against the target instances (and `ec2:DescribeInstances` if you want to use dynamic discovery);
- AWS Session Manager plugin for the AWS CLI installed locally (`session-manager-plugin --version` to verify);
- The target EC2 instance must have the SSM agent running and the `AmazonSSMManagedInstanceCore` managed policy attached to its instance profile;
- An S3 bucket that the SSM agent on the instance can write to. This is the part that surprised me a bit: the `community.aws.aws_ssm` connection plugin uses an S3 bucket as a "transport" to copy files between the control node and the target instance, since SSM Session Manager itself doesn't support file transfer natively. The bucket name is passed via `ansible_aws_ssm_bucket_name`, and the instance profile must allow `s3:GetObject` / `s3:PutObject` against it;
- Ansible collections: `community.aws` (for the SSM connection plugin) and `amazon.aws` (for the dynamic EC2 lookup). Both are pulled from Ansible Galaxy via `requirements.yml`:

```yaml
collections:
  - name: community.aws
  - name: amazon.aws
```

- Python dependencies on the control node: `boto3` and `botocore` (ideally in a virtualenv; I prefer to wrap the playbook execution in a small shell script that bootstraps a virtualenv and installs these dependencies automatically, so the user doesn't have to worry about it).
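On the IAM side, the S3 transfer piece is the easiest one to get wrong. A minimal sketch of the extra policy statement for the instance profile (in addition to `AmazonSSMManagedInstanceCore`; the bucket name is a placeholder) might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSsmAnsibleFileTransfer",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    }
  ]
}
```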
With that in place, the SSM connection plugin works, so let’s move on to the inventory.
Static inventory
The simplest case is a single, long-lived EC2 instance - for example, a dev instance used for DB connection tunneling. There's no autoscaling or replacement going on, so the instance ID should be stable.
In this case, the inventory.yml is a plain YAML file with all the SSM connection details inlined:
```yaml
all:
  hosts:
    ssm_host:
      ansible_host: <instance_id>
      ansible_connection: community.aws.aws_ssm
      ansible_aws_ssm_region: us-east-2          # the SSM connection plugin needs the region to know which SSM endpoint to talk to
      ansible_aws_ssm_bucket_name: <bucket_name> # the plugin uses this bucket for file transfer, so it must be set
      ansible_become: true                       # behavioral vars need the ansible_ prefix in inventory
      ansible_become_user: ec2-user              # AL2023 default
      ansible_python_interpreter: /usr/bin/python3 # set explicitly to avoid Ansible's interpreter auto-discovery, which can be flaky over SSM
```
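Before running a full playbook, a quick way to verify the whole chain (plugin, SSM agent, IAM, S3 bucket) is an ad-hoc ping:

```shell
ansible -i inventory.yml ssm_host -m ansible.builtin.ping
```

If this returns `pong`, both command execution and file transfer over SSM work, since the ping module itself is shipped to the instance via the S3 bucket.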
To run a playbook against this inventory:
```shell
ansible-playbook -i inventory.yml backup.yml
```
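The playbook itself doesn't need anything SSM-specific; backup.yml here stands for any ordinary playbook. A minimal sketch (task content and paths hypothetical) could look like:

```yaml
- name: Back up the database
  hosts: ssm_host
  tasks:
    - name: Dump the database on the instance       # hypothetical backup script path
      ansible.builtin.command: /usr/local/bin/db-backup.sh

    - name: Fetch the dump back to the control node # this transfer goes through the S3 bucket
      ansible.builtin.fetch:
        src: /tmp/db-backup.sql
        dest: ./backups/
```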
That's it: the connection should work. However, this approach doesn't hold up for scalable or ephemeral setups, so let's move on to dynamic discovery.
Dynamic inventory: discovering instances by tag
The static approach falls apart as soon as the instance gets recreated frequently (via Terraform taint / replace, autoscaling, or simply rebuilds during environment provisioning). Hardcoding the instance ID means the inventory file goes stale every time something changes upstream, and there's no good way to keep it in sync.
A better approach is to discover the targets at playbook runtime. The amazon.aws.ec2_instance_info module queries the EC2 API and returns the list of instances matching the filters you provide - and we can feed that list into Ansible’s in-memory inventory via add_host.
Example 1: a single named instance
For the same case mentioned before (a single instance used for DB tunneling), we can tag the instance with a unique Name tag (e.g. <project>-<env>-db-tunnel) and query it dynamically in the playbook:
```yaml
- name: Validate required variables
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Fail if project is not provided
      fail:
        msg: "Required variable 'project' is not set. Pass it via -e 'project=<PROJECT>'"
      when: project is not defined

    - name: Fail if env is not provided or invalid
      fail:
        msg: "Required variable 'env' must be one of: dev, qa, test. Pass it via -e 'env=<ENV>'"
      when: env is not defined or env not in ['dev', 'qa', 'test']

- name: Fetch running EC2 instances dynamically
  hosts: localhost
  gather_facts: false
  vars:
    aws_region: us-east-2
    ec2_name_tag: "{{ project }}-{{ env }}-db-tunnel"
  tasks:
    - name: Query running db-tunnel EC2 instance by Name tag
      amazon.aws.ec2_instance_info:
        region: "{{ aws_region }}"
        filters:
          instance-state-name: running # fetch only running instances to avoid trying to connect to stopped ones
          "tag:Name": "{{ ec2_name_tag }}"
      register: ec2_info

    - name: Extract target instance IDs
      set_fact:
        target_ids: "{{ ec2_info.instances | map(attribute='instance_id') | list }}"

    - name: Fail if no instances found
      fail:
        msg: "No running EC2 instances found for Name={{ ec2_name_tag }}"
      when: target_ids | length == 0

- name: Dynamically add hosts
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Create inventory group from discovered instances
      add_host:
        name: "{{ item }}"
        groups: db_tunnel
        ansible_host: "{{ item }}"
        ansible_connection: community.aws.aws_ssm
        ansible_aws_ssm_region: us-east-2
        ansible_aws_ssm_bucket_name: <bucket_name>
        ansible_python_interpreter: /usr/bin/python3
      loop: "{{ target_ids }}"

- name: Bootstrap on db-tunnel instance
  hosts: db_tunnel
  become: true
  roles:
    - role: <example_role> # replace with actual role(s) to run on the target instance
```
So we're getting the following flow:
- Check and fail fast if the user forgot to set environment details;
- Query instances via `amazon.aws.ec2_instance_info` to get the `running` instances with a matching name. Filtering by `running` is important: trying to connect to a `stopped` instance fails in a non-obvious way;
- Add each discovered instance ID to a `db_tunnel` group, with all the SSM connection parameters set (same as in the static example, just templated);
- Run the actual configuration: `hosts: db_tunnel` targets the populated group, and the playbook runs against the discovered instances over SSM.
To invoke it:
```shell
ansible-playbook playbook.yml -e "project=<PROJECT> env=<ENV>"
```
This works the same way as the static inventory example, but it tolerates instance recreation and can be reused across multiple environments and projects without manually updating the inventory file.
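When the query unexpectedly comes back empty, it helps to sanity-check the discovery step outside Ansible with an equivalent AWS CLI query (configured credentials assumed; placeholders as in the playbook):

```shell
aws ec2 describe-instances \
  --region us-east-2 \
  --filters "Name=instance-state-name,Values=running" \
            "Name=tag:Name,Values=<project>-<env>-db-tunnel" \
  --query "Reservations[].Instances[].InstanceId" \
  --output text
```

If this prints an instance ID but the playbook still finds nothing, the problem is on the Ansible side (credentials or region mismatch) rather than in the tags.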
Example 2: multiple instances by Environment tag
If we want to target multiple instances in the same environment, we can use the Environment tag instead of the Name tag. I have a setup for an ML training environment where I provision 1-N GPU EC2 instances (all tagged Environment=training) and need to install the same software stack on all of them, regardless of how many there are at any given moment.
The playbook is a slight variation of the previous one - the filter is tag:Environment instead of tag:Name, and the result is a list rather than a single instance:
```yaml
- name: Fetch running EC2 instances dynamically
  hosts: localhost
  gather_facts: false
  vars:
    aws_region: us-east-2
    ec2_environment_tag: training
  tasks:
    - name: Query running EC2 instances by Environment tag
      amazon.aws.ec2_instance_info:
        region: "{{ aws_region }}"
        filters:
          instance-state-name: running
          "tag:Environment": "{{ ec2_environment_tag }}"
      register: ec2_info

    - name: Extract target instance IDs
      set_fact:
        target_ids: "{{ ec2_info.instances | map(attribute='instance_id') | list }}"

    - name: Fail if no instances found
      fail:
        msg: "No running EC2 instances found for Environment={{ ec2_environment_tag }}"
      when: target_ids | length == 0

- name: Dynamically add hosts
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Create inventory group from list
      add_host:
        name: "{{ item }}"
        groups: dynamic_targets
        ansible_host: "{{ item }}"
        ansible_connection: community.aws.aws_ssm
        ansible_aws_ssm_region: us-east-2
        ansible_aws_ssm_bucket_name: <bucket_name>
        ansible_python_interpreter: /usr/bin/python3
      loop: "{{ target_ids }}"

- name: Run tasks on dynamic hosts
  hosts: dynamic_targets
  become: true
  roles:
    - role: <example_role> # replace with actual role(s) to run on the target instances
    - role: geerlingguy.docker
      vars:
        docker_users:
          - ubuntu
```
If `target_ids` comes back with three instance IDs, `add_host` runs three times, `dynamic_targets` ends up with three members, and Ansible runs the roles across all of them.
Summary
To make the playbook reproducible across machines (mine, my teammates', CI), I usually wrap the whole thing in a small shell script that creates a Python virtualenv, installs ansible + boto3 + botocore, runs ansible-galaxy install -r requirements.yml, and then executes ansible-playbook with whatever args the caller passed. That way, the only things the user needs are Python and the AWS CLI with configured credentials; the rest is bootstrapped automatically. I won't paste the full script here since it's pretty mundane, but the idea is that for simple cases like these, ./run-ansible.sh -e "key=value" should be the entire interface for everyone running the playbook. Combined with SSM, this gets close to my ideal config-management UX: a single command to run the playbook, with no manual inventory management and no SSH key handling.
