I’m continuing my journey toward a fully immutable home lab. I’ve lived with the changes I’ve been experimenting with for about a year now, and so far it’s been great. There have definitely been a couple of hiccups along the way, but I spend far less time troubleshooting the one-off issues that occasionally crop up and far more time enjoying the joys of automation at home.

This post will cover what Terraform is and how to use it to manage Proxmox VMs automatically.

Prerequisites

Before we get started, there are a few things we will need to kick off this journey. Each blog post builds upon the last, so if there is a tool mentioned that you don’t recognize, check out the #ImmutableInfrastructureJourney tag on this blog.

Since the last post I’ve upgraded Proxmox to version 8.0.3 so everything mentioned here should work with that version. You can find instructions to install Proxmox at Proxmox’s getting started page.

You will also need to download Terraform from HashiCorp’s Terraform download page.

For posterity, everything I am doing will be done within Windows WSL but should work on any system supported by Terraform. The Proxmox node I’ll be working on is installed on an Intel NUC with 32GB of memory, although this can all be done on a smaller or larger machine as well.

Creating a new VM with Terraform

Now, why would one want to do this in the first place? Virtual machine templates already do a ton of heavy lifting so that you no longer need to run through the Ubuntu install from scratch. You can even bake packages, configuration, and services into one of these golden images to your heart’s content, so why manage all of this with yet another tool? I believe the power of Terraform doesn’t truly show until you start managing multiple components of a home lab in concert with each other. For example, we’ve all needed to set up a DNS record for a newly deployed service, or create a credential that we then have to hand to a config deep within a VM to ensure it spins up correctly. Even a task such as opening a port on a home router is often tied to some infrastructure in a home lab. Terraform can manage all of these resources together, and more, in a simple, infrastructure-as-code way, all while building on what we learned setting up a cloud-ready image in the last post.

Once you have a cloud-ready image, either by downloading an existing image from Ubuntu’s site or building one from scratch using the tutorial in the previous post, you are ready to start composing your entire home lab as code with Terraform, rather than managing the resources on a single machine by hand.

Start a Terraform project

Our first step is to set up a Terraform project with the providers we will need.

What is a Terraform Provider?

Terraform can manage very little on its own; you can pretty much only use the built-in functions and nothing more. To start managing anything real, you must pull in Terraform providers, which teach Terraform how to interact with the APIs of various services.

In a new directory, create a file called main.tf and let’s pull in the Proxmox provider, telmate/proxmox, by defining it as a required provider.

terraform {
  required_providers {
    proxmox = {
      source = "telmate/proxmox"
      version = "2.9.14"
    }
  }
}

From here we need to configure the provider so it knows how to connect to our Proxmox server. Terraform providers can usually be configured in one of two ways: either in HCL, the configuration language that Terraform uses, within a provider block, or via environment variables that providers look for when you run Terraform.

I’ll show the HCL with the matching environment variables as comments alongside. I highly recommend storing these in environment variables, since many of these values are secrets and should never be committed to a repo, although I recognize that getting something working quickly often takes priority over making it work securely.

Below your terraform block within the main.tf file we created, we will add in the following provider block.

provider "proxmox" {
  pm_api_url      = "https://proxmox-server:8006/api2/json" // $PM_API_URL
  pm_user         = "terraform@pve"                         // $PM_USER
  pm_password     = "changeMe!123"                          // $PM_PASS

  pm_tls_insecure = true  // Required for self-signed certificates
}
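If you go the environment-variable route, export the values in your shell before running Terraform. The values below are placeholders from the provider block above; swap in your own server and credentials:

```shell
# Environment variables read by the telmate/proxmox provider.
# These are placeholder values; substitute your own.
export PM_API_URL="https://proxmox-server:8006/api2/json"
export PM_USER="terraform@pve"
export PM_PASS="changeMe!123"
```

With these set, the provider block in main.tf can shrink down to an empty `provider "proxmox" {}`.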

If you don’t have a user dedicated to managing Proxmox with Terraform, the Proxmox provider’s docs include instructions on how to create one with the correct permissions for managing Proxmox.
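As a rough sketch, creating such a user on the Proxmox host itself looks something like the following. The role name is my own choice and the privilege list is abridged; consult the provider docs for the full set it requires.

```
# Run on the Proxmox host as root. Role name and privileges are illustrative.
pveum role add TerraformProv -privs "VM.Allocate VM.Clone VM.Config.CPU VM.Config.Memory VM.Config.Disk VM.Config.Network VM.PowerMgmt Datastore.AllocateSpace Datastore.Audit Sys.Audit"
pveum user add terraform@pve --password <choose-a-password>
pveum aclmod / -user terraform@pve -role TerraformProv
```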

Run a simple terraform init, which tells Terraform to download all of the required providers from the Terraform Registry. That is all we need to do to set up Terraform.

$ terraform init
Initializing the backend...

Initializing provider plugins...
- Finding telmate/proxmox versions matching "2.9.14"...
- Installing telmate/proxmox v2.9.14...
- Installed telmate/proxmox v2.9.14 (self-signed, key ID A9EBBE091B35AFCE)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Create a VM Resource with Terraform

Now that we have configured Terraform to talk to Proxmox, let’s start managing resources!

What is a Terraform Resource?

In Terraform, any single unit of infrastructure that can be managed is a resource. Resources can be created, updated, or deleted depending on what you change within a resource’s parameters. Resources are defined in HCL as resource blocks with the type the resource is supposed to be and a unique name identifying it. Within the block are the parameters of the resource. Terraform providers let Terraform know which parameters of a resource are safe to update in place and which require a resource to be destroyed and created from scratch.

For example, a Proxmox VM has a description attached to it. This description is a human helper for identifying a VM’s function and has no real impact on what is contained in the VM. Updating this parameter does not cause the resource to be “tainted”, or marked for deletion and recreation. Other parameters, such as the base image, cannot be updated cleanly; in that case the resource is marked as tainted and is replaced the next time Terraform runs.

Let’s create a file specifically for a new Proxmox virtual machine using a cloud-ready image. I’ll name this new file proxmox.tf and add a resource for a new VM named “lab”.

resource "proxmox_vm_qemu" "lab" {
}

In this stanza we need to tell Proxmox which image to clone when creating this VM. My cloud-ready image is named ubuntu-ci-base, so we will use that as our clone name. We also need to let the provider know the name of the Proxmox node we would like to spin this VM up on. This matters most in clustered Proxmox setups, but even if you don’t have a cluster, the API still needs to know.

resource "proxmox_vm_qemu" "lab" {
  clone       = "ubuntu-ci-base"
  target_node = "nuc"
}

And now we can give this VM a name, 4 vCPUs, and 2GB of memory.

resource "proxmox_vm_qemu" "lab" {
  clone       = "ubuntu-ci-base"
  target_node = "nuc"

  name    = "node-1"
  sockets = 1
  cores   = 4
  memory  = 2048
}

Next, let’s add a 100GB disk for our machine to use. We also need to attach a network interface so the VM can reach the internet.

resource "proxmox_vm_qemu" "lab" {
  clone       = "ubuntu-ci-base"
  target_node = "nuc"

  name    = "node-1"
  sockets = 1
  cores   = 4
  memory  = 2048

  disk {
    type    = "scsi"
    storage = "local-lvm"
    size    = "100G"
  }

  network {
    bridge = "vmbr0"
    model  = "virtio"
  }
}
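As an aside, if you find yourself tweaking the sizing often, the numbers can be pulled out into Terraform variables. This is just a sketch of the same resource parameterized; the variable names here are my own:

```hcl
variable "vm_cores" {
  type    = number
  default = 4
}

variable "vm_memory" {
  type    = number
  default = 2048
}

resource "proxmox_vm_qemu" "lab" {
  clone       = "ubuntu-ci-base"
  target_node = "nuc"

  name    = "node-1"
  sockets = 1
  cores   = var.vm_cores
  memory  = var.vm_memory

  # ... disk and network blocks unchanged ...
}
```

You can then override a value at plan time with something like terraform plan -var="vm_memory=4096".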

And there we have it: a basic VM that we can fully manage via Terraform! Now we run terraform plan to make sure that everything looks correct.

$ terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # proxmox_vm_qemu.lab will be created
  + resource "proxmox_vm_qemu" "lab" {
    [...]
  }

  Plan: 1 to add, 0 to change, 0 to destroy.

Perfect! Looks like all of the resources we defined are accounted for in this plan. Let’s go ahead and apply it with terraform apply.

$ terraform apply

[... Terraform plan output ...]

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

proxmox_vm_qemu.lab: Creating...
proxmox_vm_qemu.lab: Still creating... [10s elapsed]
[...]
proxmox_vm_qemu.lab: Still creating... [1m40s elapsed]
proxmox_vm_qemu.lab: Creation complete after 1m41s [id=nuc/qemu/110]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Now if we log into Proxmox we can see that Terraform created a VM and started it, all defined entirely within the code we just wrote. If you try modifying anything within the stanza, notice that depending on what is modified, Terraform will either update the VM in place or destroy and recreate it.

Things like CPU, memory, disk, and even network settings can be updated by the provider in place; it is smart enough to know that it first needs to stop the VM, modify the hardware settings, and then start the VM back up. When updating the base image, however, there is no way to cleanly migrate from one base image to another, so Terraform instead deletes the old resource and creates a new one in its place.
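You can see the distinction in the plan output itself. Bumping memory from 2048 to 4096, for instance, should show up as an in-place update (~), while changing the clone image forces a replacement (-/+). The output below is abridged from what I’d expect, not a verbatim capture:

```
$ terraform plan

  # proxmox_vm_qemu.lab will be updated in-place
  ~ resource "proxmox_vm_qemu" "lab" {
      ~ memory = 2048 -> 4096
        [...]
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```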

Terraform State

So how the heck does Terraform keep track of all of this stuff? Enter Terraform state, found in the terraform.tfstate file that was created after running terraform apply. This state file records what Terraform believes is running out in the infrastructure it is managing. It contains things like which specific VM instance maps to which resource in your code, along with all of the attributes associated with each resource.
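To give a feel for it, a heavily trimmed terraform.tfstate for our VM looks roughly like this; the exact fields vary by Terraform and provider version:

```
{
  "version": 4,
  "resources": [
    {
      "type": "proxmox_vm_qemu",
      "name": "lab",
      "instances": [
        {
          "attributes": {
            "name": "node-1",
            "cores": 4,
            "memory": 2048,
            "target_node": "nuc"
          }
        }
      ]
    }
  ]
}
```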

This file gives Terraform a way to know if someone or something modified the VM outside of Terraform; the next time you run it, Terraform will notice the change and revert it to what is defined in your code. This is the beauty of infrastructure as code: it forces changes to be codified, where they can be peer reviewed by others.

To check in or not to check in terraform.tfstate

It may be tempting to check the terraform.tfstate file into git. I would advise against this for a few reasons. First, this file contains the state of every resource Terraform is managing, not just the properties you define on a resource. Some of these resources may have private values that would be bad to leak. Second, storing this file in git implies a workflow where multiple users may make changes on their own branches. The state file is supposed to reflect the current state of your infrastructure, so if a person on feature branch branch-a runs terraform apply from their branch and then someone on the main branch runs terraform apply, main will contain changes it has never seen, which could cause mayhem as Terraform attempts to remedy the resources it thinks have been tainted. And lastly, even if you kept everyone from running Terraform from their own branches, the default state backend has no locking mechanism to prevent multiple instances of Terraform from running at the same time, potentially generating duplicate and conflicting state files, both accurate and true but describing different infrastructure.

If in doubt, store the terraform.tfstate file in a single centralized location, preferably using a state backend that supports locking, like S3 or Terraform Cloud. This ensures that one and only one instance of Terraform can operate on the state file at any given time.
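For example, an S3 backend with DynamoDB-based locking can be configured in the terraform block. The bucket and table names below are placeholders of my own:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-homelab-tfstate"        # placeholder bucket name
    key            = "homelab/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"           # placeholder lock table
    encrypt        = true
  }
}
```

After adding or changing a backend, rerun terraform init so Terraform can migrate the existing state over.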

Clean up

Let’s say you are finished with the VM you created with Terraform. Instead of deleting all of the code you wrote and running terraform apply again, simply tell Terraform to destroy all of the defined resources with terraform destroy.

proxmox_vm_qemu.lab: Refreshing state... [id=nuc/qemu/110]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # proxmox_vm_qemu.lab will be destroyed
  - resource "proxmox_vm_qemu" "lab" {
  [...]
  }

Plan: 0 to add, 0 to change, 1 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

proxmox_vm_qemu.lab: Destroying... [id=nuc/qemu/110]
proxmox_vm_qemu.lab: Destruction complete after 4s

Destroy complete! Resources: 1 destroyed.

And that’s a crash course on using Terraform to start to manage VMs in Proxmox.

In the next post we will talk about how to add in our own customizations to an instance that Terraform manages by using the full power of seeding our cloud ready images with configs using cloud-init.