
Use Ansible, Don’t Clone VMs

Recently I got to be an admin over a few hosts. These hosts were all CentOS-based and in terrible shape: missing repos, different package versions, weird software installed on them. It turned out the administrators had cloned the VMs and, as time passed, updated some of them while leaving others behind.

Here’s where Ansible comes into play. Ansible is a configuration management tool that lets you describe the desired state of one or more systems. That means you write some files describing how you want your systems to look and where they are, then fire up Ansible.

To achieve that, you need to do two things:

  1. Define an inventory. This is just a file where you describe your servers and the connection details for them. A nice feature is that you can group them.

  2. Define playbooks. A playbook is, in essence, a simple YAML script that runs a bunch of steps (called tasks in Ansible parlance, each task invoking a module).

Here’s what an inventory looks like:

localhost ansible_connection=local ansible_python_interpreter=/usr/bin/python3

[production]
germanium ansible_host=germaniumhq.com ansible_ssh_user=raptor ansible_ssh_port=22 ansible_ssh_private_key_file=/home/raptor/.ssh/id_rsa ansible_python_interpreter=/usr/bin/python3

[infra]
hyperion ansible_host=192.168.0.115 ansible_ssh_user=raptor ansible_sudo_pass=...... ansible_python_interpreter=/usr/bin/python3
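
To check that Ansible can actually reach everything you defined, a quick smoke test (assuming the inventory is saved as a file named inventory) is the ad-hoc ping module:

ansible -i inventory all -m ping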

And here’s a simple playbook:

- name: Update the packages on the system.
  hosts: production
  gather_facts: false
  become: true
  tasks:
    - name: Update packages.
      apt:
        update_cache: yes
        upgrade: dist

    - name: Ensure mysql is started.
      docker_container:
        name: mysql
        state: started

    - name: Ensure nginx is started.
      docker_container:
        name: nginx
        state: started
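
Since the hosts from the story were CentOS-based, the same update step would use the yum module instead of apt. A minimal sketch, assuming the CentOS machines sit in the infra group:

- name: Update the packages on the system (CentOS).
  hosts: infra
  gather_facts: false
  become: true
  tasks:
    - name: Update all packages.
      yum:
        name: '*'
        state: latest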

You can do far more fantastic stuff, such as reusing playbook sections through a concept of shared libraries (called roles in Ansible lingo), and my favorite: reusing roles written by more competent people via ansible-galaxy.
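
For instance, a playbook can pull in a locally defined role alongside a community one. The sketch below assumes a hypothetical local role named common exists under ./roles, and uses geerlingguy.docker, a popular community role, installed first with ansible-galaxy:

ansible-galaxy install geerlingguy.docker

- name: Configure the web hosts.
  hosts: production
  become: true
  roles:
    - common              # hypothetical local role in ./roles/common
    - geerlingguy.docker  # community role installed from ansible-galaxy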

Whenever you want to run the playbook, you’d run:

ansible-playbook -i path/to/inventory/file playbook-file.yml
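
Two flags worth knowing: --check does a dry run that reports what would change without touching anything, and --limit restricts the run to a subset of the inventory:

ansible-playbook -i path/to/inventory/file playbook-file.yml --check
ansible-playbook -i path/to/inventory/file playbook-file.yml --limit production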

Ansible then connects via SSH to the hosts configured in the playbook and starts bringing them to the described state. All this is controlled from your laptop, with no need for running agents on the hosts and other such nonsense.

So why all this long intro? Because Ansible has the nice property of describing what the systems should look like, what should run on them, and how we got from a blank, freshly installed OS to the actual running instance.

Cloning VMs? Not so much.

Whenever you clone an instance, you clone all the installed programs without knowing what was installed or why. Did the admin download some files for a test? Those now get copied everywhere as well. Some crazy cron job? That now runs on the new host too.

As time advances, instances diverge more and more, and there is no traceability of what’s going on. In the end, you’re left with a piling mess of servers. Only the original admin and God knew what was running on them; without the original admin, only God knows what’s there.

Conclusion

Ansible playbooks are a living document of what’s on those servers. Do you need to install new software on all the nodes? Change a playbook and rerun it; all hosts are updated. New hosts? Add them to the inventory and rerun it; the old hosts still match the playbooks, and the new ones get the playbooks applied to them. Make sure you commit the playbooks plus the inventory (without credentials, of course) in git, and you’ve just turbo-boosted your Ops.
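
For the credentials part, ansible-vault can encrypt a variables file so the repository stays safe to commit. A minimal sketch, assuming the secrets live in a hypothetical group_vars/production/vault.yml:

ansible-vault encrypt group_vars/production/vault.yml
ansible-playbook -i path/to/inventory/file playbook-file.yml --ask-vault-pass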