Managing a Cluster Build Environment with Terraform - 2

In the previous post we could already create the base environment in bulk with Terraform, but getting all the way to actual cluster deployment still takes some extra work. So I later combined Terraform with my own adapted version of rong. With a pre-built qcow2 image, a Kubernetes cluster with any number of nodes can be brought up quickly.

Prerequisites

The pre-built qcow2 image must have the cloud-init and qemu-guest-agent packages installed. After installation, cloud-init has to be enabled manually; later, when Terraform creates the VM instances, we can inject information into them through cloud-init.

# systemctl enable cloud-init
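
If the base image is Debian or Ubuntu based, the two packages can be installed inside it like this (a sketch; adjust for the image's package manager):

# apt-get install -y cloud-init qemu-guest-agent
# systemctl enable qemu-guest-agent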

On Debian 9.0 mkisofs is also needed. Since mkisofs has been replaced by genisoimage, run the following:

# apt-get install -y genisoimage
# ln -s /usr/bin/genisoimage /usr/bin/mkisofs

Terraform needs the following plugins; on Debian 9.0, terraform-provider-libvirt has to be compiled manually:

# ls ~/.terraform.d/plugins/
terraform-provider-ansible  terraform-provider-libvirt  terraform-provider-template_v2.1.2_x4

The cloud-init file

The contents of cloud_init.cfg (referenced from main.tf below) are as follows:

#cloud-config
# https://cloudinit.readthedocs.io/en/latest/topics/modules.html
hostname: ${HOSTNAME}
users:
  - name: xxxxx
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    home: /home/xxxxx
    shell: /bin/bash
    ssh-authorized-keys:
      - ssh-rsa xxxxxxxxxxxxxxxxxxxx
ssh_pwauth: True
disable_root: false
chpasswd:
  list: |
     xxxxx:linux
  expire: False

The only variable actually used is hostname: ${HOSTNAME}; the remaining entries create a user named xxxxx and set its password. These steps come in handy later when the operating system needs deeper customization.
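
After a VM boots from this image, a quick way to verify that cloud-init picked up the injected data (assuming cloud-init 17.1 or newer inside the guest) is:

$ cloud-init status --wait
$ hostnamectl --static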

main.tf definition

main.tf is the core file that orchestrates the whole underlying infrastructure. Its contents are:

################################################################################
#  vars definition
################################################################################
variable "VM_COUNT" {
  default = 10
  type = number
}

variable "VM_USER" {
  default = "developer"
  type = string
}

variable "VM_HOSTNAME" {
  default = "newnode"
  type = string
}

variable "VM_IMG_URL" {
  default = "http://1xx.xxx.xxx.xxx/xxxx180403cloudinit.img"
  type = string
}

variable "VM_IMG_FORMAT" {
  default = "qcow2"
  type = string
}

# https://www.ipaddressguide.com/cidr
variable "VM_CIDR_RANGE" {
  default = "10.10.10.0/24"
  type = string
}

variable "LIBVIRT_POOL_DIR" {
  default = "./.local/.docker-libvirt"
  type = string
}

#variable libvirt_host {
#  type = string
#  description = "IP address of host running libvirt"
#}
#
#variable instance_name {
#  type = string
#  description = "name of VM instance"
#}

variable pool_name {
  type = string
  default = "default"
  description = "name of pool to store disk and iso image"
}

#variable source_path {
#  type = string
#  description = "path to qcow2 base image, can be remote url or local disk path"
#}

variable disk_format {
  type = string
  default = "qcow2"
}

variable default_password {
  type = string
  default = "passw0rd"
  description = "default password to login to VM when running, it's recommended to disable this manually"
}

variable memory_size {
  type = string
  default = "5120"
  description = "memory size of VM"
}

variable num_cpu {
  default = 2
  description = "number of vCPU which VM has"
}

variable num_network_interface {
  default = 1
  description = "number of network interfaces which VM has"
}

variable private_network_bridge {
  type = string
  default = "virbr0"
  description = "existing network bridge on host that VM needs to connect to private network"
}

variable public_network_bridge {
  type = string
  default = "virbr1"
  description = "existing network bridge on host that VM needs to connect to public network"
}

#variable user_data {
#  type = string
#}

variable autostart {
  default = "true"
  type = string
}

################################################################################
# PROVIDERS
################################################################################

# instance the provider
provider "libvirt" {
  uri = "qemu:///system"
}

# If you want to use a remote libvirt provider:
#provider "libvirt" {
#  uri = "qemu+tcp://${var.libvirt_host}/system"
#}

################################################################################
# DATA TEMPLATES
################################################################################

# https://www.terraform.io/docs/providers/template/d/file.html

# https://www.terraform.io/docs/providers/template/d/cloudinit_config.html
data "template_file" "user_data" {
  count = var.VM_COUNT
  template = file("${path.module}/cloud_init.cfg")
  vars = {
    HOSTNAME = "${var.VM_HOSTNAME}-${count.index + 1}"
  }
}

#data "template_file" "network_config" {
#  template = file("${path.module}/network_config.cfg")
#}


################################################################################
# ANSIBLE ITEMS
################################################################################
resource "ansible_group" "kube-deploy" {
  inventory_group_name = "kube-deploy"
}

resource "ansible_group" "kube-master" {
  inventory_group_name = "kube-master"
}

resource "ansible_group" "kube-node" {
  inventory_group_name = "kube-node"
}

resource "ansible_group" "etcd" {
  inventory_group_name = "etcd"
}

resource "ansible_group" "k8s-cluster" {
  inventory_group_name = "k8s-cluster"
  children = ["kube-master", "kube-node"]
}

# If VM_COUNT >= 3, then we have 3 etcds, 3 kube-masters, and VM_COUNT kube-nodes.

# The first node should be kube-deploy/kube-master/kube-node/etcd. 
resource "ansible_host" "deploynode" {
    groups = ["kube-master", "etcd", "kube-node", "kube-deploy"]
    inventory_hostname = "${var.VM_HOSTNAME}-1"
    vars = {
        ansible_user = "root"
        ansible_ssh_private_key_file = "./deploy.key"
        ansible_host = element(libvirt_domain.vm.*.network_interface.0.addresses.0, 0)
        ip = element(libvirt_domain.vm.*.network_interface.0.addresses.0, 0)
    }
    #provisioner "local-exec" {
    #  command = "sleep 40 && ansible-playbook -i  /etc/ansible/terraform.py cluster.yml --extra-vars @rong-vars.yml"
    #}
}

# Create up to 2 more (kube-master, etcd, kube-node) nodes: node 2 and node 3
resource "ansible_host" "master" {
    count = var.VM_COUNT >= 3 ? 2 : var.VM_COUNT - 1
    groups = var.VM_COUNT >= 3 ? ["kube-master", "etcd", "kube-node"] : ["kube-master", "kube-node"]
    #inventory_hostname = format("%s-%d", "node", count.index + 2)
    inventory_hostname = format("%s-%d", var.VM_HOSTNAME, count.index + 2)
    vars = {
        ansible_user = "root"
        ansible_ssh_private_key_file = "./deploy.key"
        ansible_host = element(libvirt_domain.vm.*.network_interface.0.addresses.0, count.index+1)
        ip = element(libvirt_domain.vm.*.network_interface.0.addresses.0, count.index+1)
    }
}

# others should be kube-nodes
resource "ansible_host" "worker" {
    count = var.VM_COUNT > 3 ? var.VM_COUNT - 3 : 0
    groups = ["kube-node"]
    #inventory_hostname = "node${count.index + 4}"
    #inventory_hostname = format("%s-%d", "node", count.index + 4)
    inventory_hostname = format("%s-%d", var.VM_HOSTNAME, count.index + 4)
    vars = {
        ansible_user = "root"
        ansible_ssh_private_key_file = "./deploy.key"
        ansible_host = element(libvirt_domain.vm.*.network_interface.0.addresses.0, count.index+3)
        ip = element(libvirt_domain.vm.*.network_interface.0.addresses.0, count.index+3)
    }
}

################################################################################
# RESOURCES
################################################################################
resource "libvirt_pool" "vm" {
  name = "${var.VM_HOSTNAME}_pool"
  type = "dir"
  path = abspath("${var.LIBVIRT_POOL_DIR}")
}

# Fetch the operating system disk image from the given URL; it is used as the base image.
resource "libvirt_volume" "vm_disk_image" {
  name   = "${var.VM_HOSTNAME}_disk_image.${var.VM_IMG_FORMAT}"
  # Or you could specify a pool explicitly, e.g. `pool = "transfer"`
  pool   = libvirt_pool.vm.name
  source = var.VM_IMG_URL
  format = var.VM_IMG_FORMAT
}

# It will use the disk image fetched at `libvirt_volume.vm_disk_image` as the
#  base one to build the worker VM.
resource "libvirt_volume" "vm_worker" {
  count  = var.VM_COUNT
  name   = "worker_${var.VM_HOSTNAME}-${count.index + 1}.${var.VM_IMG_FORMAT}"
  base_volume_id = libvirt_volume.vm_disk_image.id
  pool   = libvirt_volume.vm_disk_image.pool
}

#*# Create a public network for the VMs
#*# https://www.ipaddressguide.com/cidrv
#*resource "libvirt_network" "vm_public_network" {
#*   name = "${var.VM_HOSTNAME}_network"
#*   autostart = true
#*   mode = "nat"
#*   domain = "${var.VM_HOSTNAME}.local"
#*
#*   # TODO: FIX CIDR ADDRESSES RANGE?
#*   # With `wait_for_lease` enabled, we get an error in the end of the VMs
#*   #  creation:
#*   #   - 'Requested operation is not valid: the address family of a host entry IP must match the address family of the dhcp element's parent'
#*   # But the VMs will be running and accessible via ssh.
#*   addresses = ["${var.VM_CIDR_RANGE}"]
#*
#*   dhcp {
#*    enabled = true
#*   }
#*   dns {
#*    enabled = true
#*   }
#*}

# For more info about the parameters, check this out:
# https://github.com/dmacvicar/terraform-provider-libvirt/blob/master/website/docs/r/cloudinit.html.markdown
# Use CloudInit to add our ssh-key to the instance
# you can add also meta_data field
resource "libvirt_cloudinit_disk" "cloudinit" {
  count = var.VM_COUNT
  name           = "${var.VM_HOSTNAME}-${count.index + 1}_cloudinit.iso"
  #user_data      = data.template_file.user_data.rendered 
  user_data      = data.template_file.user_data[count.index].rendered
  pool           = libvirt_pool.vm.name
}



resource "libvirt_domain" "vm" {
  count  = var.VM_COUNT
  name   = "${var.VM_HOSTNAME}-${count.index + 1}"
  #memory      = "${var.memory_size}"
  memory      = var.memory_size
  #vcpu        = "${var.num_cpu}"
  vcpu        = var.num_cpu
  #autostart   = "${var.autostart}"
  autostart   = var.autostart

  # TODO: FIX qemu-ga?
  # qemu-ga needs to be installed and working inside the VM, and currently is
  #  not working. Maybe it needs some configuration.
  qemu_agent = true
  #cloudinit = "${libvirt_cloudinit_disk.cloudinit.id}"
  cloudinit = element(libvirt_cloudinit_disk.cloudinit.*.id, count.index)


  # Attach the network interface to the default network (192.168.122.0/24).
  # Alternatively, a new network could be created as a resource and attached instead.
  network_interface {
    network_name   = "default"
    hostname   = "${var.VM_HOSTNAME}-${count.index + 1}"
    wait_for_lease = true
  }

  #* Attached to our created network.
  #*network_interface {
  #*  #hostname = "${var.VM_HOSTNAME}-${count.index + 1}"
  #*  network_id = "${libvirt_network.vm_public_network.id}"
  #*  #network_name = "${libvirt_network.vm_public_network.name}"

  #*  #addresses = ["${cidrhost(libvirt_network.vm_public_network.addresses, count.index + 1)}"]
  #*  addresses = ["${cidrhost(var.VM_CIDR_RANGE, count.index + 1)}"]

  #*  # TODO: Fix wait for lease?
  #*  # qemu-ga must be running inside the VM. See notes above in `qemu_agent`.
  #*  wait_for_lease = true
  #*}

  graphics {
    type = "vnc"
    listen_type = "address"
    autoport = true
  }

  # IMPORTANT
  # Ubuntu can hang if an isa-serial console is not present at boot time.
  # If you find the CPU at 100% and the VM never becomes available, this is why.
  #
  # This is a known bug on cloud images, since they expect a console
  # we need to pass it:
  # https://bugs.launchpad.net/cloud-images/+bug/1573095
  console {
    type        = "pty"
    target_port = "0"
    target_type = "serial"
  }

  console {
    type        = "pty"
    target_type = "virtio"
    target_port = "1"
  }

  disk {
    volume_id = element(libvirt_volume.vm_worker.*.id, count.index)
  }

}
################################################################################
# TERRAFORM CONFIG
################################################################################

terraform {
  required_version = ">= 0.12"
}

################################################################################
# TERRAFORM OUTPUT
################################################################################
#
output "ip" {
  value = libvirt_domain.vm.*.network_interface.0.addresses.0
}

The local-exec provisioner below can be added to the deploy node resource (it is shown commented out above) to kick off the Ansible playbook automatically once the VMs are up:

    provisioner "local-exec" {
      command = "sleep 40 && ansible-playbook -i  /etc/ansible/terraform.py cluster.yml --extra-vars @rong-vars.yml"
    }
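
With the provider and plugins in place, a full run might look like this; the VM_COUNT and VM_HOSTNAME values are just illustrative overrides of the defaults defined above:

$ terraform init
$ terraform plan -var 'VM_COUNT=5' -var 'VM_HOSTNAME=node'
$ terraform apply -auto-approve -var 'VM_COUNT=5' -var 'VM_HOSTNAME=node'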


Building terraform-provider-libvirt

Build terraform-provider-libvirt for Debian 9.0.

Steps:

Get system info:

root@debian:~# cat /etc/issue
Debian GNU/Linux 9 \n \l

root@debian:~# cat /etc/debian_version 
9.0

wget the terraform binary and move it to /usr/bin, then start building the plugin:

# vim /etc/apt/sources.list
deb http://mirrors.163.com/debian/ stretch main non-free contrib
deb http://mirrors.163.com/debian/ stretch-updates main non-free contrib
deb http://mirrors.163.com/debian/ stretch-backports main non-free contrib
deb http://mirrors.163.com/debian-security/ stretch/updates main non-free contrib
# apt-get update -y
# apt-get install libvirt-dev git build-essential golang=2:1.11~1~bpo9+1 golang-doc=2:1.11~1~bpo9+1 golang-go=2:1.11~1~bpo9+1 golang-src=2:1.11~1~bpo9+1
# mkdir /root/go
# vim /root/.bashrc
export GOPATH=/root/go
export PATH=$PATH:$GOPATH/bin
# export CGO_ENABLED="1"
# mkdir -p $GOPATH/src/github.com/dmacvicar; cd $GOPATH/src/github.com/dmacvicar
# git clone https://github.com/dmacvicar/terraform-provider-libvirt.git
# cd $GOPATH/src/github.com/dmacvicar/terraform-provider-libvirt
# make install

After building, go to /root/go/bin and examine the built plugin:

root@debian:~/go/bin# ./terraform-provider-libvirt --version
./terraform-provider-libvirt e9ff32f1ec5825dcf05481cb7ef6a3b645696a4f-dirty
Compiled against library: libvirt 3.0.0
Using library: libvirt 3.0.0

The plugin is now compiled and ready to use on Debian 9.0.
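
To make the freshly built plugin available to Terraform, copy it into the plugins directory (this mirrors the plugin layout shown in the prerequisites):

# mkdir -p ~/.terraform.d/plugins
# cp /root/go/bin/terraform-provider-libvirt ~/.terraform.d/plugins/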

Managing a Cluster Build Environment with Terraform

Environment

OS: Ubuntu 18.04.3
libvirtd (libvirt) 4.0.0

Quick setup

Download terraform and move it into the system path:

$ wget https://releases.hashicorp.com/terraform/0.12.17/terraform_0.12.17_linux_amd64.zip
$ unzip terraform_0.12.17_linux_amd64.zip
$ sudo mv terraform /usr/bin
$ terraform version
Terraform v0.12.17

Download terraform-provider-libvirt and finish the initialization (https://github.com/dmacvicar/terraform-provider-libvirt/releases):

$ wget https://github.com/dmacvicar/terraform-provider-libvirt/releases/download/v0.6.0/terraform-provider-libvirt-0.6.0+git.1569597268.1c8597df.Ubuntu_18.04.amd64.tar.gz
$ tar xzvf terraform-provider-libvirt-0.6.0+git.1569597268.1c8597df.Ubuntu_18.04.amd64.tar.gz
$ terraform init
Terraform initialized in an empty directory!

The directory has no Terraform configuration files. You may begin working
with Terraform immediately by creating Terraform configuration files.
$ mkdir -p ~/.terraform.d/plugins
$ cp terraform-provider-libvirt ~/.terraform.d/plugins/

Creating the first environment

Create a working directory:

$ mkdir ~/projects/terraform
$ cd ~/projects/terraform

Create a definition file named libvirt.tf describing the virtual machine to create on the KVM host:

provider "libvirt" {
  uri = "qemu:///system"
}

resource "libvirt_volume" "node1-qcow2" {
  name = "node1-qcow2"
  pool = "default"
  source = "/media/sda/rong_ubuntu_180403.qcow2"
  format = "qcow2"
}

# Define KVM domain to create
resource "libvirt_domain" "node1" {
  name   = "node1"
  memory = "10240"
  vcpu   = 2

  network_interface {
    network_name = "default"
  }

  disk {
    volume_id = libvirt_volume.node1-qcow2.id
  }

  console {
    type = "pty"
    target_type = "serial"
    target_port = "0"
  }

  graphics {
    type = "spice"
    listen_type = "address"
    autoport = true
  }
}

Initialize the Terraform working directory, then generate and review the execution plan, and finally create the defined infrastructure:

$ terraform init
$ terraform plan
$ terraform apply

Destroy:

$ terraform destroy 

Both apply and destroy prompt you to answer yes. To skip the confirmation step, use:

$ terraform apply -auto-approve
$ terraform destroy -auto-approve

cloud-init

For this, refer to the ubuntu example under the examples directory of terraform-provider-libvirt.
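
A minimal sketch of the idea, assuming a cloud_init.cfg user-data template like the one shown earlier and the default storage pool (the resource names here are illustrative):

data "template_file" "user_data" {
  template = file("${path.module}/cloud_init.cfg")
  vars = {
    HOSTNAME = "node1"
  }
}

resource "libvirt_cloudinit_disk" "commoninit" {
  name      = "commoninit.iso"
  pool      = "default"
  user_data = data.template_file.user_data.rendered
}

# Attach it inside the libvirt_domain resource:
#   cloudinit = libvirt_cloudinit_disk.commoninit.id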

Multiple VMs

A reference example is as follows:

provider "libvirt" {
  uri = "qemu:///system"
}

variable "hosts" {
  default = 2
}

variable "hostname_format" {
  type    = string
  default = "node%02d"
}

resource "libvirt_volume" "node-disk" {
  name             = "node-${format(var.hostname_format, count.index + 1)}.qcow2"
  count            = var.hosts
  base_volume_name = "xxxxx180403_vagrant_box_image_0.img"
  pool             = "default"
  format           = "qcow2"
}

resource "libvirt_domain" "node" {
  count  = var.hosts
  name   = format(var.hostname_format, count.index + 1)
  vcpu   = 1
  memory = 2048

  disk {
    volume_id = element(libvirt_volume.node-disk.*.id, count.index)
  }

  network_interface {
    network_name   = "default"
    mac            = "52:54:00:00:00:a${count.index + 1}"
    wait_for_lease = true
  }

  graphics {
    type = "spice"
    listen_type = "address"
    autoport = true
  }
}

terraform {
  required_version = ">= 0.12"
}
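
To print the DHCP-assigned addresses after apply, an output block like the following can be appended (a sketch; the output name is illustrative, and it relies on wait_for_lease succeeding):

output "ips" {
  value = libvirt_domain.node.*.network_interface.0.addresses
}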

Note that this definition file relies on DHCP address binding by MAC address, so we need DHCP rules like the following in the default network:

$ sudo virsh net-dumpxml --network default
<network>
  <name>default</name>
  <uuid>c71715ac-90b5-483a-bb1c-6a40a5af1b56</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:92:5c:47'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <host mac='52:54:00:00:00:a1' name='node01' ip='192.168.122.171'/>
      <host mac='52:54:00:00:00:a2' name='node02' ip='192.168.122.172'/>
      <host mac='52:54:00:00:00:a3' name='node03' ip='192.168.122.173'/>
      <host mac='52:54:00:00:00:a4' name='node04' ip='192.168.122.174'/>
      <host mac='52:54:00:00:00:a5' name='node05' ip='192.168.122.175'/>
      <host mac='52:54:00:00:00:a6' name='node06' ip='192.168.122.176'/>
      <host mac='52:54:00:00:00:a7' name='node07' ip='192.168.122.177'/>
      <host mac='52:54:00:00:00:a8' name='node08' ip='192.168.122.178'/>
      <host mac='52:54:00:00:00:a9' name='node09' ip='192.168.122.179'/>
      <host mac='52:54:00:00:00:aa' name='node10' ip='192.168.122.180'/>
    </dhcp>
  </ip>
</network>

The rules are redefined as follows:

$ sudo virsh net-dumpxml --network default > default.xml
edit default.xml to add the desired <host> entries
$ sudo virsh net-define ./default.xml
re-check the rules
$ sudo virsh net-dumpxml --network default
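
Alternatively, individual reservations can be added without hand-editing the XML, using virsh net-update (a sketch; the MAC, name, and IP are the example values from above):

$ sudo virsh net-update default add ip-dhcp-host \
  "<host mac='52:54:00:00:00:a1' name='node01' ip='192.168.122.171'/>" \
  --live --config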

Once this is defined, the virtual machines declared in the tf file will get the expected IP addresses from the default network via DHCP, which helps with the subsequent cluster deployment.

Command to check whether the IPs have been assigned:

$ sudo virsh net-dhcp-leases default
 Expiry Time          MAC address        Protocol  IP address                Hostname        Client ID or DUID
-------------------------------------------------------------------------------------------------------------------
 2019-12-03 15:53:49  52:54:00:00:00:a1  ipv4      192.168.122.171/24        node01          01:52:54:00:00:00:a1
 2019-12-03 15:53:49  52:54:00:00:00:a2  ipv4      192.168.122.172/24        node02          01:52:54:00:00:00:a2

After redefining the network, it has to be restarted manually for the changes to take effect.
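
The restart can be done with virsh; note that net-destroy only stops the network, it does not delete its definition:

$ sudo virsh net-destroy default
$ sudo virsh net-start default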

Create Vagrant Box From QCOW2

Machine Preparation

Create a libvirt virtual machine and install the operating system.

Add user vagrant:

# adduser vagrant
# visudo -f /etc/sudoers.d/vagrant
vagrant ALL=(ALL) NOPASSWD:ALL
# visudo
vagrant ALL=(ALL) NOPASSWD:ALL
Defaults:vagrant	!requiretty
# mkdir -p /home/vagrant/.ssh
# chmod 0700 /home/vagrant/.ssh
# wget --no-check-certificate \
https://raw.github.com/mitchellh/vagrant/master/keys/vagrant.pub \
-O /home/vagrant/.ssh/authorized_keys
# chmod 0600 /home/vagrant/.ssh/authorized_keys
# chown -R vagrant /home/vagrant/.ssh
# vim /home/vagrant/.profile
add
[ -z "$BASH_VERSION" ] && exec /bin/bash -l
# chsh -s /bin/bash vagrant

Change the ethernet card from ens* to eth0:

# vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"
# grub-mkconfig -o /boot/grub/grub.cfg

Change the netplan rules:

# vim /etc/netplan/01-netcfg.yaml 
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: yes
      dhcp-identifier: mac
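
Apply the new netplan configuration (or simply reboot):

# netplan apply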

Finally change the sshd configuration:

# vim /etc/ssh/sshd_config 
AuthorizedKeysFile .ssh/authorized_keys

For Ubuntu 20.04, you also have to install ifupdown manually:

# apt-get install -y ifupdown

Now shut down the machine and continue with packaging.

Packaging

Shrink the qcow2 file:

# qemu-img convert -c -O qcow2 test180403.qcow2 test180403shrunk.qcow2
# mv test180403shrunk.qcow2 box.img
# vim metadata.json
{
"provider"     : "libvirt",
"format"       : "qcow2",
"virtual_size" : 40
}
# vim Vagrantfile
Vagrant.configure("2") do |config|
  config.vm.provider :libvirt do |libvirt|
    libvirt.driver = "kvm"
    libvirt.host = 'localhost'
    libvirt.uri = 'qemu:///system'
  end
  config.vm.define "new" do |custombox|
    custombox.vm.box = "custombox"
    custombox.vm.provider :libvirt do |test|
      test.memory = 1024
      test.cpus = 1
    end
  end
end
# tar cvzf custom_box.box ./metadata.json ./Vagrantfile ./box.img

Testing

Add vagrant box via:

# vagrant box add custom_box.box --name "chuobi"
# vagrant init chuobi
# vagrant up --provider=libvirt
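
If everything works, the box boots and is reachable over SSH; a quick check (assuming the vagrant-libvirt plugin is installed):

# vagrant ssh -c 'hostname'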

Thinking On Dev

  1. Node data reporting: a monitoring client solution.
  2. Data aggregation: which database and architecture to use for storing the data.
  3. Data presentation: which frontend and control interface to use for displaying and working with the data.