Automating Upgrades Across All My Servers with Ansible

Something I had always wanted to do but never got around to; finally, bit by bit, it's done.

0x00 Preface

Right now the services in my homelab are scattered around: some live in LXC containers on Proxmox, others in VMs. On top of that I have a few VPSes out on the internet, each running different services. This creates an unavoidable problem: what about updates? Logging into every machine by hand to update it is not just mind-numbing busywork; whenever a mirror turns flaky because of network trouble, it also eats a lot of time. Eventually it got on my nerves: weekly updates are a chore, and monthly updates pile up until they take forever.

So what to do? After some searching I settled on Ansible. It is open source, easy to deploy, and supported by all the mainstream Linux distributions. It can only handle the routine minor-version upgrades, though; major release upgrades like Ubuntu 24.04 -> Ubuntu 26.04 are beyond it. Then again, those really do call for a human anyway (last Saturday I did exactly that on a few VPSes, and thanks to their poor performance it cost me one to two hours, and one of them nearly blew up).

My final plan: one VM on the LAN acts as the internal controller and can update all the other machines plus the overseas VPSes. Since reaching the overseas VPSes runs into network problems, I also deploy the same setup on a VPS I can access smoothly, dedicated to updating the other overseas VPSes. On top of that, the internal controller should not only manage running guests: within the homelab it should also be able to power on machines that are currently shut down, update them, and shut them down again (everything lives in Proxmox, after all).

0x01 Installing Dependencies on the Control Nodes

My internal control node runs Fedora, so installation is easy. Fedora does not package the dependencies the Proxmox API modules need, however, so I spin up a virtualenv to avoid breaking system-level Python packages.

sudo dnf update
sudo dnf install -y ansible python3 python3-pip openssh-clients
python3 -m venv ~/.venvs/ansible-proxmox
~/.venvs/ansible-proxmox/bin/python -m pip install --upgrade pip
~/.venvs/ansible-proxmox/bin/python -m pip install proxmoxer requests
ansible-galaxy collection install community.proxmox community.general
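
As a quick sanity check before moving on, the following should run cleanly if everything landed where expected (the venv path is the one created above; adjust if yours differs):

ansible --version
ansible-galaxy collection list | grep -E 'community\.(proxmox|general)'
~/.venvs/ansible-proxmox/bin/python -c 'import proxmoxer, requests; print("ok")'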

My overseas controller VPS runs Ubuntu, and to keep things lazy I just used the stock apt packages. The version shipped with 26.04 was still reasonably current when I set this up.

sudo apt update
sudo apt install -y ansible python3-proxmoxer python3-requests python3-apt openssh-client
ansible-galaxy collection install community.proxmox

0x02 Ansible Configuration

Here I will only show how the internal control node is configured; the overseas controller VPS is the same minus the homelab parts.

Directory layout

ansible.cfg sits at the root as the main configuration file; the inventory folder holds my homelab's Proxmox configuration, the overseas VPS configuration, and the related variables.

/srv/infra-patch/
├── ansible.cfg
├── inventory/
│   ├── proxmox.proxmox.yml
│   ├── vps.yml
│   ├── group_vars/
│   │   ├── all/
│   │   │   └── proxmox_api.yml
│   │   ├── tag_linux.yml
│   │   ├── vps.yml
│   │   ├── vps_us.yml
│   │   ├── vps_eu.yml
│   │   └── vps_asia.yml
│   └── host_vars/
│       ├── localhost.yml
│       ├── nextcloud.yml
│       ├── jellyfin.yml
│       └── ...
├── playbooks/
│   ├── patch_running.yml
│   ├── patch_stopped.yml
│   ├── patch_vps.yml
│   └── patch_vps_controllers.yml
└── requirements.yml

Ansible configuration file

[defaults]
inventory = ./inventory
host_key_checking = True
forks = 5
timeout = 30
interpreter_python = auto_silent
stdout_callback = ansible.builtin.default
callback_result_format = yaml
callback_result_indentation = 4
vault_password_file = /home/<your username>/.config/ansible/proxmox.vault_pass

[ssh_connection]
pipelining = True

The vault_password_file is a local read-only file (chmod 600) containing only the Vault password, which Ansible uses to decrypt Vault content. It is neither an SSH key nor a Proxmox token secret; it is the password that decrypts whatever configuration you have encrypted with Vault. If two nodes share the same set of Vault files, this password must be identical on both.
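
For reference, creating that password file might look something like this (the path matches the ansible.cfg above; any random string works as the password):

mkdir -p ~/.config/ansible
(umask 077; openssl rand -base64 32 > ~/.config/ansible/proxmox.vault_pass)
chmod 600 ~/.config/ansible/proxmox.vault_pass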

The inventory folder

Proxmox configuration

Before anything else, we need to create an API token on the Proxmox side. Go to Datacenter -> Permissions -> API Tokens and add a token. In the dialog, pick root@pam as the user, enter ansible as the Token ID, and check Privilege Separation. A Token Secret will be shown; make sure you write it down.

Then go back to Permissions, click Add, and choose API Token Permission. In the dialog, set Path to / and pick the freshly created root@pam!ansible as the API Token. Since only one Role can be added at a time, do this twice: we need both PVEAuditor and PVEVMUser. The latter is for powering VMs/LXC on and off; of course, a custom role with only VM.PowerMgmt would work too.
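
If you prefer the shell on the PVE host over the web UI, the same token and ACL can, as far as I can tell, be created with pveum in one go; treat this as a sketch and double-check against your PVE version:

pveum user token add root@pam ansible --privsep 1
pveum acl modify / --tokens 'root@pam!ansible' --roles PVEAuditor,PVEVMUser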

Next, we need to encrypt this secret for our config files. Remember the vault_password_file from earlier? This is where it comes in.

ansible-vault encrypt_string \
  --vault-password-file /home/<your username>/.config/ansible/proxmox.vault_pass \
  '<MyProxmoxTokenSecret>' \
  --name token_secret

The command prints something like this:

token_secret: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          6236393966...

That is exactly what we need! Now create the corresponding Proxmox config:

# /srv/infra-patch/inventory/proxmox.proxmox.yml
plugin: community.proxmox.proxmox
url: "https://<PVE IP>:8006"
user: "root@pam"
token_id: "ansible"
token_secret: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          <YOUR_ENCRYPT_DATA_HERE>
validate_certs: false

want_facts: true
want_proxmox_nodes_ansible_host: false

keyed_groups:
  - key: proxmox_tags_parsed
    prefix: "tag_"
    separator: ""

compose:
  ansible_host: >-
    (
      proxmox_lxc_interfaces
      | default([])
      | rejectattr('name', 'equalto', 'lo')
      | map(attribute='inet')
      | select('defined')
      | map('regex_replace', '/.*', '')
      | list
      | first
    )
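
Once this file is in place, you can confirm that the dynamic inventory and the tag_ groups actually resolve:

ansible-inventory -i inventory/proxmox.proxmox.yml --graph
ansible-inventory -i inventory/proxmox.proxmox.yml --host nextcloud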

Likewise, we need a variables file for the Proxmox API, which the playbooks later on will use. And likewise, once it is written, remember to encrypt it with ansible-vault encrypt /srv/infra-patch/inventory/group_vars/all/proxmox_api.yml.

# /srv/infra-patch/inventory/group_vars/all/proxmox_api.yml
proxmox_api_host: 10.0.10.228
proxmox_api_user: root@pam
proxmox_api_token_id: ansible
proxmox_api_token_secret: <your real token secret here>
proxmox_validate_certs: false
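
Even after encryption the file stays manageable; ansible-vault can view or edit it in place using the same password file:

ansible-vault encrypt /srv/infra-patch/inventory/group_vars/all/proxmox_api.yml
ansible-vault view /srv/infra-patch/inventory/group_vars/all/proxmox_api.yml
ansible-vault edit /srv/infra-patch/inventory/group_vars/all/proxmox_api.yml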

Per-host configuration

On the Fedora internal controller I also wrote a separate host_vars file for localhost, because starting stopped LXC containers and VMs means calling the Proxmox API modules on the controller itself (which is exactly why we set up the venv earlier).

# /srv/infra-patch/inventory/host_vars/localhost.yml
ansible_connection: local
ansible_python_interpreter: /home/<your username>/.venvs/ansible-proxmox/bin/python

Next: while Ansible can see an LXC container's IP through the Proxmox API, it may fail to discover the IP of a machine that starts out powered off, or of a VM without qemu-guest-agent installed. Since every one of my services has a static IP on the LAN anyway, I create a matching YAML variables file under host_vars for each machine:

# /srv/infra-patch/inventory/host_vars/nextcloud.yml
ansible_host: 10.0.10.107
ansible_user: ansible
ansible_ssh_private_key_file: /home/<your username>/.ssh/id_rsa
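
With the host_vars file in place, a quick ad-hoc ping verifies that Ansible can reach the host:

cd /srv/infra-patch
ansible nextcloud -m ansible.builtin.ping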

As you can see, I authenticate with an SSH key. Since all my machines are preconfigured for SSH-key-only login, I need to run a user-creation script once on every machine:

sudo useradd -m -s /bin/bash -G sudo ansible
sudo mkdir -p /home/ansible/.ssh
sudo chmod 700 /home/ansible/.ssh
# remember to put your own public key into $PUBKEY first
printf '%s\n' "$PUBKEY" | sudo tee /home/ansible/.ssh/authorized_keys >/dev/null
sudo chown -R ansible:ansible /home/ansible/.ssh
sudo chmod 600 /home/ansible/.ssh/authorized_keys
echo 'ansible ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/90-ansible >/dev/null
sudo chmod 440 /etc/sudoers.d/90-ansible
sudo visudo -cf /etc/sudoers.d/90-ansible

Note: on Fedora-family systems, replace the sudo group with wheel.
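
Once the user exists on a host, passwordless sudo can be verified end to end; this should print root:

ansible nextcloud -b -m ansible.builtin.command -a 'whoami'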

Overseas VPS configuration

I ended up writing this as static YAML, since it is unlikely to change. I gave the overseas controller VPS the tag_controller tag, so that regular update runs against the overseas VPSes automatically exclude it; when I want to update that machine from the internal controller, a separate playbook handles it.

# /srv/infra-patch/inventory/vps.yml
all:
  children:
    vps:
      children:
        vps_oa:
          hosts:
            oa-01:
              ansible_host: x.x.x.x

        vps_asia:
          hosts:
            asia-01:
              ansible_host: x.x.x.x
            asia-02:
              ansible_host: x.x.x.x
            asia-03:
              ansible_host: x.x.x.x

        vps_us:
          hosts:
            us-01:
              ansible_host: x.x.x.x

        vps_eu:
          hosts:
            eu-01:
              ansible_host: x.x.x.x

        vps_af:
          hosts:
            af-01:
              ansible_host: x.x.x.x

    tag_linux:
      children:
        tag_autopatch:
          children:
            tag_ubuntu:
              hosts:
                oa-01:
                asia-01:
                asia-02:
                asia-03:
                us-01:
                eu-01:
                af-01:

    tag_controller:
      hosts:
        oa-01:

Likewise, we need a variables file. I kept this simple too, except that for a few VPSes with slow routing when connected directly from my homelab, I add some extra parameters per group. Ansible still works fine over high-latency links; you just have to turn the concurrency down.

# /srv/infra-patch/inventory/group_vars/vps.yml
ansible_user: ansible
ansible_become: true
ansible_ssh_private_key_file: /home/<your username>/.ssh/id_rsa
ansible_connection: ssh
# /srv/infra-patch/inventory/group_vars/vps_af.yml
ansible_ssh_common_args: "-o ServerAliveInterval=30 -o ServerAliveCountMax=6"
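
For particularly flaky links, more OpenSSH options can be layered into the same variable. This is only a sketch of options I would consider, not my actual config:

# /srv/infra-patch/inventory/group_vars/vps_af.yml (extended sketch)
ansible_ssh_common_args: >-
  -o ServerAliveInterval=30
  -o ServerAliveCountMax=6
  -o ConnectTimeout=30
  -o ControlMaster=auto
  -o ControlPersist=300s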

The playbooks folder

In Proxmox, I tagged every machine that needs automatic updates with autopatch.

Patching only running LXC containers and VMs

This part is simple: the logic is to upgrade only machines that carry the autopatch tag and are currently running.

# /srv/infra-patch/playbooks/patch_running.yml
- name: Patch running Linux guests
  hosts: "tag_autopatch:&proxmox_all_running"
  serial: "20%"
  gather_facts: true
  become: true
  max_fail_percentage: 20

  tasks:
    - name: Debian/Ubuntu | refresh apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
        update_cache_retries: 5
        update_cache_retry_max_delay: 12
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Debian/Ubuntu | dist-upgrade
      ansible.builtin.apt:
        upgrade: dist
        autoremove: true
        clean: true
        dpkg_options: "force-confdef,force-confold"
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Debian/Ubuntu | reboot required marker
      ansible.builtin.stat:
        path: /var/run/reboot-required
      when: ansible_facts['os_family'] == 'Debian'
      register: deb_reboot_required

    - name: Fedora/RHEL | upgrade installed packages
      ansible.builtin.dnf:
        name: "*"
        state: latest
        update_only: true
        update_cache: true
      when: ansible_facts['os_family'] == 'RedHat'

    - name: openSUSE | dist-upgrade
      community.general.zypper:
        name: "*"
        state: dist-upgrade
        update_cache: true
      when: ansible_facts['os_family'] == 'Suse'

    - name: Print reboot hint for Debian/Ubuntu
      ansible.builtin.debug:
        msg: "reboot required on {{ inventory_hostname }}"
      when:
        - ansible_facts['os_family'] == 'Debian'
        - deb_reboot_required.stat.exists | default(false)
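
I deliberately stop at printing a hint, but if you wanted the playbook to reboot hosts automatically, a sketch with the stock reboot module might look like the task below (I don't run this myself; rebooting an LXC from inside behaves differently from a full VM, so test carefully):

    # optional: reboot instead of just hinting; relies on the marker task above
    - name: Debian/Ubuntu | reboot if required (optional)
      ansible.builtin.reboot:
        reboot_timeout: 600
      when:
        - ansible_facts['os_family'] == 'Debian'
        - deb_reboot_required.stat.exists | default(false)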

Patching only stopped LXC containers and VMs

This part gets a little convoluted.

First, these machines may not get an IP automatically (LXC can, VMs mostly cannot), and they need to be powered on and off, so part of the work has to run on the control node itself through the Proxmox API.

Second, once these machines boot, a stale SSH host fingerprint can make connections fail outright with Host key verification failed until they time out, so known_hosts has to be handled. The logic therefore becomes:

On localhost, build a group containing only the autopatch-tagged machines that are currently stopped; start them, LXC and VMs separately; refresh the inventory; wait for TCP port 22 to open; once it does, delete the stale host key on the controller and ssh-keyscan the new one into known_hosts; finally run the upgrade, and shut everything back down when done.

# /srv/infra-patch/playbooks/patch_stopped.yml
- name: Build groups for autopatch hosts that were initially stopped
  hosts: localhost
  gather_facts: false
  vars:
    autopatch_hosts: "{{ groups['tag_autopatch'] | default([]) }}"
    stopped_hosts: "{{ groups['proxmox_all_stopped'] | default([]) }}"
    # note: these two groups contain ALL LXC/QEMU guests; the intersect
    # with stopped_hosts below narrows them to the stopped ones
    stopped_lxc: "{{ groups['proxmox_all_lxc'] | default([]) }}"
    stopped_qemu: "{{ groups['proxmox_all_qemu'] | default([]) }}"
  tasks:
    - name: Add initially stopped autopatch LXC hosts
      ansible.builtin.add_host:
        name: "{{ item }}"
        groups:
          - patchable_stopped
          - patchable_stopped_lxc
      loop: "{{ autopatch_hosts | intersect(stopped_hosts) | intersect(stopped_lxc) | unique }}"

    - name: Add initially stopped autopatch QEMU hosts
      ansible.builtin.add_host:
        name: "{{ item }}"
        groups:
          - patchable_stopped
          - patchable_stopped_qemu
      loop: "{{ autopatch_hosts | intersect(stopped_hosts) | intersect(stopped_qemu) | unique }}"

- name: Start initially stopped LXC guests
  hosts: patchable_stopped_lxc
  gather_facts: false
  serial: 1
  tasks:
    - name: Start LXC
      delegate_to: localhost
      community.proxmox.proxmox:
        api_host: "{{ proxmox_api_host }}"
        api_user: "{{ proxmox_api_user }}"
        api_token_id: "{{ proxmox_api_token_id }}"
        api_token_secret: "{{ proxmox_api_token_secret }}"
        validate_certs: "{{ proxmox_validate_certs }}"
        vmid: "{{ proxmox_vmid }}"
        node: "{{ proxmox_node }}"
        state: started

- name: Start initially stopped QEMU guests
  hosts: patchable_stopped_qemu
  gather_facts: false
  serial: 1
  tasks:
    - name: Start QEMU VM
      delegate_to: localhost
      community.proxmox.proxmox_kvm:
        api_host: "{{ proxmox_api_host }}"
        api_user: "{{ proxmox_api_user }}"
        api_token_id: "{{ proxmox_api_token_id }}"
        api_token_secret: "{{ proxmox_api_token_secret }}"
        validate_certs: "{{ proxmox_validate_certs }}"
        vmid: "{{ proxmox_vmid }}"
        node: "{{ proxmox_node }}"
        state: started

- name: Refresh inventory after starting guests
  hosts: localhost
  gather_facts: false
  tasks:
    - ansible.builtin.meta: refresh_inventory

- name: Wait for SSH port and refresh host keys for initially stopped guests
  hosts: patchable_stopped
  gather_facts: false
  serial: 1
  tasks:
    - name: Wait for TCP/22 to open
      delegate_to: localhost
      ansible.builtin.wait_for:
        host: "{{ ansible_host }}"
        port: 22
        timeout: 900
        sleep: 5
        delay: 2

    - name: Remove old host key from controller known_hosts
      delegate_to: localhost
      become: false
      ansible.builtin.command:
        cmd: "ssh-keygen -R {{ ansible_host }}"
      changed_when: false
      failed_when: false

    - name: Add current host key to controller known_hosts
      delegate_to: localhost
      become: false
      ansible.builtin.shell: >
        ssh-keyscan -H {{ ansible_host }} >> {{ lookup('env', 'HOME') }}/.ssh/known_hosts
      args:
        executable: /bin/bash
      changed_when: false

    - name: Wait for SSH login to become ready
      ansible.builtin.wait_for_connection:
        timeout: 300
        sleep: 5

- name: Patch guests that were initially stopped
  hosts: patchable_stopped
  gather_facts: true
  become: true
  serial: 1
  max_fail_percentage: 20

  tasks:
    - name: Debian/Ubuntu | refresh apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
        update_cache_retries: 5
        update_cache_retry_max_delay: 12
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Debian/Ubuntu | dist-upgrade
      ansible.builtin.apt:
        upgrade: dist
        autoremove: true
        clean: true
        dpkg_options: "force-confdef,force-confold"
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Debian/Ubuntu | reboot required marker
      ansible.builtin.stat:
        path: /var/run/reboot-required
      when: ansible_facts['os_family'] == 'Debian'
      register: deb_reboot_required

    - name: Fedora/RHEL | upgrade installed packages
      ansible.builtin.dnf:
        name: "*"
        state: latest
        update_only: true
        update_cache: true
      when: ansible_facts['os_family'] == 'RedHat'

    - name: openSUSE | dist-upgrade
      community.general.zypper:
        name: "*"
        state: dist-upgrade
        update_cache: true
      when: ansible_facts['os_family'] == 'Suse'

    - name: Print reboot hint for Debian/Ubuntu
      ansible.builtin.debug:
        msg: "reboot required on {{ inventory_hostname }}"
      when:
        - ansible_facts['os_family'] == 'Debian'
        - deb_reboot_required.stat.exists | default(false)

- name: Stop LXC that were initially stopped
  hosts: patchable_stopped_lxc
  gather_facts: false
  serial: 1
  tasks:
    - name: Stop LXC
      delegate_to: localhost
      community.proxmox.proxmox:
        api_host: "{{ proxmox_api_host }}"
        api_user: "{{ proxmox_api_user }}"
        api_token_id: "{{ proxmox_api_token_id }}"
        api_token_secret: "{{ proxmox_api_token_secret }}"
        validate_certs: "{{ proxmox_validate_certs }}"
        vmid: "{{ proxmox_vmid }}"
        node: "{{ proxmox_node }}"
        state: stopped

- name: Stop QEMU VMs that were initially stopped
  hosts: patchable_stopped_qemu
  gather_facts: false
  serial: 1
  tasks:
    - name: Stop QEMU VM
      delegate_to: localhost
      community.proxmox.proxmox_kvm:
        api_host: "{{ proxmox_api_host }}"
        api_user: "{{ proxmox_api_user }}"
        api_token_id: "{{ proxmox_api_token_id }}"
        api_token_secret: "{{ proxmox_api_token_secret }}"
        validate_certs: "{{ proxmox_validate_certs }}"
        vmid: "{{ proxmox_vmid }}"
        node: "{{ proxmox_node }}"
        state: stopped

The drawback: you cannot --check this in advance, because waiting for connections and powering machines on and off do not support check mode. If you really want to test it, the best approach is to run a single machine for real with --limit 'localhost:<MachineName>' instead of unleashing it on everything at once.
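
In raw ansible-playbook form, such a single-host run looks like this (bangumi is one of my LXC guests):

cd /srv/infra-patch
ansible-playbook playbooks/patch_stopped.yml -i inventory/proxmox.proxmox.yml --limit 'localhost:bangumi'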

Overseas VPS playbooks

Patching the other overseas VPSes

Here we exclude tag_controller, because we do not want to upgrade the overseas controller VPS itself. Likewise, this is the only playbook we run on the overseas controller VPS.

# /srv/infra-patch/playbooks/patch_vps.yml
- name: Patch overseas VPS
  hosts: "tag_autopatch:&vps:!tag_controller"
  serial: 1
  gather_facts: true
  become: true
  max_fail_percentage: 20

  tasks:
    - name: Debian/Ubuntu | refresh apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
        update_cache_retries: 5
        update_cache_retry_max_delay: 12
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Debian/Ubuntu | dist-upgrade
      ansible.builtin.apt:
        upgrade: dist
        autoremove: true
        clean: true
        dpkg_options: "force-confdef,force-confold"
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Reboot required marker
      ansible.builtin.stat:
        path: /var/run/reboot-required
      when: ansible_facts['os_family'] == 'Debian'
      register: deb_reboot_required

    - name: Print reboot hint
      ansible.builtin.debug:
        msg: "reboot required on {{ inventory_hostname }}"
      when:
        - ansible_facts['os_family'] == 'Debian'
        - deb_reboot_required.stat.exists | default(false)

Patching the overseas controller VPS

To upgrade the overseas controller VPS from the homelab, just select tag_controller:&vps.

# /srv/infra-patch/playbooks/patch_vps_controllers.yml
- name: Patch external controller VPS
  hosts: "tag_controller:&vps"
  serial: 1
  gather_facts: true
  become: true
  max_fail_percentage: 20

  tasks:
    - name: Debian/Ubuntu | refresh apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
        update_cache_retries: 5
        update_cache_retry_max_delay: 12
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Debian/Ubuntu | dist-upgrade
      ansible.builtin.apt:
        upgrade: dist
        autoremove: true
        clean: true
        dpkg_options: "force-confdef,force-confold"
        lock_timeout: 300
      when: ansible_facts['os_family'] == 'Debian'

    - name: Reboot required marker
      ansible.builtin.stat:
        path: /var/run/reboot-required
      when: ansible_facts['os_family'] == 'Debian'
      register: deb_reboot_required

    - name: Show reboot hint
      ansible.builtin.debug:
        msg: "reboot required on {{ inventory_hostname }}"
      when:
        - ansible_facts['os_family'] == 'Debian'
        - deb_reboot_required.stat.exists | default(false)

0x03 Can this be more convenient?

It certainly can, because every time I want to run an upgrade, I have to do this:

cd /srv/infra-patch/
ansible-playbook playbooks/patch_running.yml -i inventory/proxmox.proxmox.yml

The lazier approach, of course, is to write a few scripts and drop them in the home directory.

Of these, only the stopped-guests script defaults to run; the others default to check. In the end it works like this:

# first check that the config is sane and preview what the upgrade would do
~/bin/run-proxmox-running-updates check
# then run it for real
~/bin/run-proxmox-running-updates run
# if a guest is shut down, it is usually an unimportant service,
# or one stopped to save resources, so there is no need to maintain
# every stopped guest; I usually run this one host at a time
~/bin/run-proxmox-stopped-updates run --limit 'localhost:bangumi'
# same again: check the config and preview the upgrade first
~/bin/run-vps-updates check
# then run it for real
~/bin/run-vps-updates run
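
The three scripts below live in ~/bin (assuming it is on your PATH); installing them is the usual routine:

mkdir -p ~/bin
install -m 755 run-vps-updates run-proxmox-running-updates run-proxmox-stopped-updates ~/bin/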

Updating the VPSes

#!/usr/bin/env bash
set -euo pipefail

BASE="/srv/infra-patch"
PLAYBOOK="$BASE/playbooks/patch_vps.yml"
INVENTORY="$BASE/inventory/vps.yml"
LOGDIR="$HOME/logs/ansible"
LOCKFILE="/tmp/run-vps-updates.lock"

mkdir -p "$LOGDIR"

MODE="${1:-check}"
shift || true

case "$MODE" in
  check)
    EXTRA_ARGS=(--check --diff)
    ;;
  run)
    EXTRA_ARGS=()
    ;;
  *)
    echo "Usage: $0 [check|run] [extra ansible-playbook args...]"
    exit 1
    ;;
esac

export ANSIBLE_CONFIG="$BASE/ansible.cfg"
export ANSIBLE_FORCE_COLOR=1
export PY_COLORS=1
unset NO_COLOR

cd "$BASE"

exec flock -n "$LOCKFILE" \
  ansible-playbook "$PLAYBOOK" -i "$INVENTORY" "${EXTRA_ARGS[@]}" "$@" \
  2>&1 | tee -a "$LOGDIR/vps-$(date +%F).log"

Updating running guests

#!/usr/bin/env bash
set -euo pipefail

BASE="/srv/infra-patch"
PLAYBOOK="$BASE/playbooks/patch_running.yml"
INVENTORY="$BASE/inventory/proxmox.proxmox.yml"
LOGDIR="$HOME/logs/ansible"
LOCKFILE="/tmp/run-proxmox-running-updates.lock"

mkdir -p "$LOGDIR"

MODE="${1:-check}"
shift || true

case "$MODE" in
  check)
    EXTRA_ARGS=(--check --diff)
    ;;
  run)
    EXTRA_ARGS=()
    ;;
  *)
    echo "Usage: $0 [check|run] [extra ansible-playbook args...]"
    exit 1
    ;;
esac

export ANSIBLE_CONFIG="$BASE/ansible.cfg"
export ANSIBLE_FORCE_COLOR=1
export PY_COLORS=1
unset NO_COLOR

cd "$BASE"

exec flock -n "$LOCKFILE" \
  ansible-playbook "$PLAYBOOK" -i "$INVENTORY" "${EXTRA_ARGS[@]}" "$@" \
  2>&1 | tee -a "$LOGDIR/proxmox-running-$(date +%F).log"

Updating stopped guests

#!/usr/bin/env bash
set -euo pipefail

BASE="/srv/infra-patch"
PLAYBOOK="$BASE/playbooks/patch_stopped.yml"
INVENTORY="$BASE/inventory/proxmox.proxmox.yml"
LOGDIR="$HOME/logs/ansible"
LOCKFILE="/tmp/run-proxmox-stopped-updates.lock"

mkdir -p "$LOGDIR"

MODE="${1:-run}"
shift || true

case "$MODE" in
  run)
    EXTRA_ARGS=()
    ;;
  *)
    echo "Usage: $0 run [extra ansible-playbook args...]"
    exit 1
    ;;
esac

export ANSIBLE_CONFIG="$BASE/ansible.cfg"
export ANSIBLE_FORCE_COLOR=1
export PY_COLORS=1
unset NO_COLOR

cd "$BASE"

exec flock -n "$LOCKFILE" \
  ansible-playbook "$PLAYBOOK" -i "$INVENTORY" "${EXTRA_ARGS[@]}" "$@" \
  2>&1 | tee -a "$LOGDIR/proxmox-stopped-$(date +%F).log"

And that's it. Just open a mosh session, leave it running in the background to upgrade on its own, and the rest of the time is yours.
