Notes: Proxmox Virtual Environment (PVE) 7.x installation, basic usage and configuration, and common problems

墨茶 · 4 years ago · Original

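Preface

Configuring PVE is tedious, and all sorts of odd problems crop up along the way; I have often made the same mistake several times. To work faster while making fewer errors, I am writing these notes down for my own reference. They may also help anyone who finds this blog through a search, and lower the barrier to entry.

It is only a small contribution to the community.

This article assumes the host IP is 192.168.1.101, the subnet mask is 255.255.255.0 (CIDR /24), and the gateway is 192.168.1.1. Everything below is written on that basis, and all commands must be run as root.

Installation

Only installation on top of Debian 11 is covered here. The base system was installed with just "standard system utilities" and "SSH server" selected, with no desktop environment. For other versions, or for installing directly from the ISO, see the PVE wiki. The steps below follow Install Proxmox VE on Debian 11 Bullseye; they are a localized version of that page, modified to be friendlier to read, and none of the steps listed here can be skipped.

Install the packages that the following steps may need.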
```
apt install -y vim wget
```
Set the hostname of this machine.
```
echo yourhostname > /etc/hostname
```

Add an "/etc/hosts" entry for the host IP address (vim keys that may be useful here: a (or i) -- enter insert mode \ Esc -- leave insert mode \ dd -- delete the current line \ u -- undo \ ^r -- redo \ :wq -- save and quit).

vim /etc/hosts

#If the hostname is yourhostname and the host IP is 192.168.1.101, the content to add (or modify) looks like this:
127.0.0.1 localhost
192.168.1.101 yourhostname yourhostname

#The following lines apply to hosts with IPv6 support:
::1 localhost ip6-localhost ip6-loopback 
ff02::1 ip6-allnodes 
ff02::2 ip6-allrouters

Note: this also means removing the 127.0.1.1 address that is present by default.
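
The edit can be checked before rebooting. The following is only a sketch: `hosts_ip_for` and `hosts_entry_ok` are helper names made up for this note, and the check only confirms that the hostname maps, through the file, to something other than a 127.x address.

```shell
# hosts_ip_for FILE NAME
# Print the first address in a hosts-format FILE whose names include NAME.
# (Hypothetical helper, not part of any PVE tooling.)
hosts_ip_for() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments so they are never matched
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# hosts_entry_ok FILE NAME
# Succeed only if NAME is present and its address is not in 127.x
# (the IPv6 loopback is not checked).
hosts_entry_ok() {
    ip=$(hosts_ip_for "$1" "$2")
    case "$ip" in
        ""|127.*) return 1 ;;
        *) return 0 ;;
    esac
}
```

Typical use would be `hosts_entry_ok /etc/hosts "$(hostname)" && echo "hosts entry looks fine"`.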

Reboot.
```
reboot
```

Use the "hostname" command to verify that the hostname and "/etc/hosts" steps above were done correctly.

hostname --ip-address

#The output of this command should be the host IP address, for example:
192.168.1.101

Add the PVE repository to the APT sources (as a new file under "/etc/apt/sources.list.d/").

echo "deb [arch=amd64] http://download.proxmox.com/debian/pve bullseye pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list

Add the PVE repository key to the trusted keys.

wget https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg -O /etc/apt/trusted.gpg.d/proxmox-release-bullseye.gpg

Verify the GPG key.

sha512sum /etc/apt/trusted.gpg.d/proxmox-release-bullseye.gpg

#The output must match the following string exactly:
7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa  /etc/apt/trusted.gpg.d/proxmox-release-bullseye.gpg
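
Comparing 128 hex characters by eye is error-prone, so the comparison can be left to the shell. A minimal sketch, assuming GNU `sha512sum` is installed; `verify_sha512` is a made-up helper name, and the expected value has to be pasted in from the line above.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeed only if the SHA-512 digest of FILE equals EXPECTED_HEX.
verify_sha512() {
    actual=$(sha512sum "$1" 2>/dev/null | awk '{ print $1 }')
    # An empty digest means the file could not be read.
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}
```

For example: `verify_sha512 /etc/apt/trusted.gpg.d/proxmox-release-bullseye.gpg "<expected hash>" && echo "key OK"`.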

Update the repository index and the system.
```
apt update && apt full-upgrade
```
Install the PVE packages. On a low-spec machine this can take quite a long time.
```
apt install proxmox-ve postfix open-iscsi
```
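If there is a mail server in your network, configure postfix as a "satellite system". The existing mail server then acts as the relay host and routes the mail sent by PVE to its final recipients. If you do not know what to enter here, choose "local only" and leave the system name unchanged. If the prompt "Configuring grub-pc" appears during installation, choose "keep the local version currently installed".
Reboot.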
```
reboot
```

Remove the Debian kernel.

apt remove linux-image-amd64 'linux-image-5.10*'
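
Before running the removal, it may be worth confirming that the host really booted into the Proxmox kernel after the previous reboot. A small sketch, assuming PVE 7 kernel release strings end in `-pve` (for example `5.15.30-2-pve`); `is_pve_kernel` is a hypothetical helper name.

```shell
# is_pve_kernel RELEASE
# Succeed if the kernel release string looks like a Proxmox kernel.
# Assumption: PVE 7 kernels carry a "-pve" suffix.
is_pve_kernel() {
    case "$1" in
        *-pve) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `is_pve_kernel "$(uname -r)" || echo "still running a non-PVE kernel, do not remove it yet"`.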

Update grub2.
```
update-grub
```
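Manage PVE through the web interface (https://your-IP-address:8006/). It must be accessed over https; select PAM authentication and log in with the root account credentials.
After logging in, create a Linux bridge named vmbr0, move the configuration of the first network interface onto it, clear the configuration on the original interface, and click Apply Configuration.
Reboot.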
```
reboot
```

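Basic usage and configuration

Creating virtual machines

Debian 11

To be written.

Kali 2022.1

To be written.

Windows Server 2019

To be written.

Windows Server 2022

To be written.

Creating CTs (LXC containers)

To be written.

Network configuration

IPv4 NAT

IPv4 NAT is mainly for servers that have only one IP. With more than one IP it can be skipped, but note that the number of VMs able to reach the network must then stay below the number of IPs.

Enable IPv4 forwarding.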

vim /etc/sysctl.conf

#Add the following content:
net.ipv4.ip_forward = 1
net.ipv4.conf.all.forwarding = 1

#The following IPv6-related settings are optional:
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.all.accept_ra=2
net.ipv6.conf.all.proxy_ndp = 1
net.ipv6.conf.all.autoconf=1
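
The same lines can also be added from a script instead of vim. A minimal sketch with a made-up helper, `ensure_line`, which appends a line only when an identical one is not already in the file, so running it twice does not create duplicates.

```shell
# ensure_line FILE LINE
# Append LINE to FILE unless an identical line is already there.
# Only exact matches count: "key=1" and "key = 1" are treated as different.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

For example: `ensure_line /etc/sysctl.conf "net.ipv4.ip_forward = 1"`.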

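Configure IPv4 NAT. Add an internal interface and forward all outbound traffic from the internal network to the public-facing interface (vmbr0). The internal IP address, netmask, and so on can be changed as needed.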

vim /etc/network/interfaces

auto vmbr1
iface vmbr1 inet static
	address  10.0.0.254
	netmask  255.255.255.0
	bridge-ports none
	bridge-stp off
	bridge-fd 0

iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE
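
One thing to watch with `-A` is that running the same command twice appends the rule twice. Below is a minimal sketch of an idempotent wrapper built on `iptables -C` (check whether a rule exists). `ensure_nat_rule` and the `IPTABLES` override variable are names invented for this note; the override only exists so the function can be tried against a stub.

```shell
# IPTABLES may point at a stub for dry runs; defaults to the real binary.
IPTABLES="${IPTABLES:-iptables}"

# ensure_nat_rule CHAIN RULE...
# Append RULE to CHAIN in the nat table only if it is not already present.
ensure_nat_rule() {
    chain="$1"
    shift
    if "$IPTABLES" -t nat -C "$chain" "$@" 2>/dev/null; then
        return 0    # identical rule already exists, nothing to do
    fi
    "$IPTABLES" -t nat -A "$chain" "$@"
}
```

With it, the masquerade rule becomes `ensure_nat_rule POSTROUTING -s 10.0.0.0/24 -o vmbr0 -j MASQUERADE`.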

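(Recommended) Configure port forwarding for the single IP. Taking the internal device 10.0.0.1 as an example, this forwards all TCP and UDP traffic sent to ports 80 and 443 of the public address to the same ports on the internal IP 10.0.0.1. The destination match uses the host address itself; matching 192.168.1.101/24 would catch traffic addressed to any host in the /24.

iptables -t nat -A PREROUTING -i vmbr0 -d 192.168.1.101 -p tcp -m multiport --dports 80,443 -j DNAT --to-destination 10.0.0.1
iptables -t nat -A PREROUTING -i vmbr0 -d 192.168.1.101 -p udp -m multiport --dports 80,443 -j DNAT --to-destination 10.0.0.1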

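(Recommended) If you configured the port forwarding above, you also need hairpin NAT, which fixes internal devices being unable to reach internal services through the host's public IP. It sends all TCP and UDP traffic that originates from the internal subnet, targets the host's public IP, and is addressed to ports 80 and 443 directly to the same ports on the internal IP 10.0.0.1.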
```
iptables -t nat -A PREROUTING -s 10.0.0.0/24 -d 192.168.1.101 -p tcp -m multiport --dports 80,443 -j DNAT --to-destination 10.0.0.1
iptables -t nat -A PREROUTING -s 10.0.0.0/24 -d 192.168.1.101 -p udp -m multiport --dports 80,443 -j DNAT --to-destination 10.0.0.1
```

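For traffic whose source and destination are both in the internal subnet, rewrite the source to the internal gateway address so that replies return through the gateway.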

iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 10.0.0.0/24 -j SNAT --to-source 10.0.0.254
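
The port-forwarding, hairpin, and SNAT rules all depend on the same handful of values, so it can help to generate them from one place and review the result before applying anything. A dry-run sketch that only prints commands; `print_forward_rules` is a hypothetical helper, and vmbr0 is assumed to be the public interface as elsewhere in this article.

```shell
# print_forward_rules PUBLIC_IP LAN_CIDR TARGET_IP GATEWAY_IP PORTS
# Print (not execute) the DNAT, hairpin DNAT, and SNAT commands.
print_forward_rules() {
    public_ip="$1"; lan_cidr="$2"; target_ip="$3"; gateway_ip="$4"; ports="$5"
    for proto in tcp udp; do
        # Traffic arriving on the public interface for the host address.
        echo "iptables -t nat -A PREROUTING -i vmbr0 -d $public_ip -p $proto -m multiport --dports $ports -j DNAT --to-destination $target_ip"
        # Hairpin: internal clients addressing the host's public IP.
        echo "iptables -t nat -A PREROUTING -s $lan_cidr -d $public_ip -p $proto -m multiport --dports $ports -j DNAT --to-destination $target_ip"
    done
    # Internal-to-internal traffic gets the gateway as its source.
    echo "iptables -t nat -A POSTROUTING -s $lan_cidr -d $lan_cidr -j SNAT --to-source $gateway_ip"
}
```

Running `print_forward_rules 192.168.1.101 10.0.0.0/24 10.0.0.1 10.0.0.254 80,443` prints five commands that can be inspected and then piped to `sh` if they look right.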

Apply the kernel parameters.
```
sysctl -p
```

Persist the iptables nat table entries using "iptables-save" together with a boot-time init script.

iptables-save -t nat -c > /root/iptout-nat

vim /etc/init.d/iptnat

#Add the following content:
#! /bin/bash

### BEGIN INIT INFO
# Provides:		iptnat
# Required-Start:	$all
# Required-Stop:	$local_fs $remote_fs $network $syslog
# Default-Start:	2 3 4 5
# Default-Stop:		0 1 6
# Description:       	add some rules to iptables nat table
# Short-Description:	iptadd
### END INIT INFO

nohup iptables-restore -w < /root/iptout-nat &

chmod 755 /etc/init.d/iptnat
update-rc.d iptnat defaults 0 0
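
Since the init script restores whatever is in /root/iptout-nat, a quick sanity check of that file before rebooting can save a trip to the console. A sketch, assuming the usual `iptables-save` layout in which a table section starts with a `*nat` line and ends with `COMMIT`; `is_nat_dump` is a made-up helper name.

```shell
# is_nat_dump FILE
# Succeed if FILE is non-empty and contains a "*nat" section closed by COMMIT.
# This checks only the overall shape, not the individual rules.
is_nat_dump() {
    [ -s "$1" ] && grep -qxF '*nat' "$1" && grep -qxF 'COMMIT' "$1"
}
```

Usage: `is_nat_dump /root/iptout-nat || echo "dump looks empty or truncated"`.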

Reboot and verify that the configuration took effect.
```
reboot
```
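When configuring a VM's network device later, bridge it to the internal interface (vmbr1) and configure it to match that interface's subnet; the same internal IP address must not be used by more than one VM at a time. If traffic is still not forwarded after following the steps above correctly, use the following command to check whether the default policy of the FORWARD chain is ACCEPT: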
```
iptables -nvL FORWARD
```
If it shows DROP or anything else, use the following command to set it to ACCEPT.
```
iptables -P FORWARD ACCEPT
```
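
The check and the fix can be combined in a script. A sketch that extracts the policy from the first line of the `iptables -nvL` output, which has the form `Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)`; `chain_policy` is a hypothetical helper that reads that output on stdin.

```shell
# chain_policy
# Read `iptables -nvL CHAIN` output on stdin and print the chain's policy.
chain_policy() {
    sed -n 's/^Chain [A-Z]* (policy \([A-Z]*\).*/\1/p'
}
```

For example: `[ "$(iptables -nvL FORWARD | chain_policy)" = "ACCEPT" ] || iptables -P FORWARD ACCEPT`.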

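Dedicated IPv4 (without PCI passthrough)

To be written.

DHCPv4

To be written.

IPv6 support and SLAAC

To be written.

Common problems

ipcc_send_rec[1] failed

Error description

ipcc_send_rec[1] failed: Connection refused

Solution

Check that "/etc/hosts" is configured as described in this tutorial.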

rcu_sched detected stalls on cpus/tasks

Error description

Kernel messages similar to the following appear:

root@hostname:~# dmesg | grep rcu
[    0.131799] rcu: Hierarchical RCU implementation.
[    0.131800] rcu: 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=128.
[    0.131803] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.131804] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128
[    0.150484] rcu: Hierarchical SRCU implementation.
[933576.698591] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[933576.698628] rcu: 	0-...0: (3 ticks this GP) idle=1dd/1/0x4000000000000000 softirq=10207288/10207290 fqs=3853 
[933576.698653] rcu: 	1-...0: (1 ticks this GP) idle=041/1/0x4000000000000000 softirq=7405299/7405300 fqs=3853 
[933576.698680] rcu: 	6-...0: (0 ticks this GP) idle=c85/1/0x4000000000000000 softirq=4956230/4956230 fqs=3853 
[933576.698701] rcu: 	10-...0: (2 GPs behind) idle=27f/1/0x4000000000000000 softirq=4599820/4599820 fqs=3853 
[933616.386427] rcu: rcu_sched kthread timer wakeup didn't happen for 4159 jiffies! g24493241 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[933616.386484] rcu: 	Possible timer handling issue on cpu=2 timer-softirq=4117484
[933616.386498] rcu: rcu_sched kthread starved for 4160 jiffies! g24493241 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
[933616.386526] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[933616.386546] rcu: RCU grace-period kthread stack dump:
[933616.386558] task:rcu_sched       state:I stack:    0 pid:   14 ppid:     2 flags:0x00004000
[933616.386589]  rcu_gp_fqs_loop+0xe5/0x320
[933616.386592]  rcu_gp_kthread+0xa7/0x130
[933616.386594]  ? rcu_gp_init+0x5f0/0x5f0
[933616.386603] rcu: Stack dump where RCU GP kthread last ran:
[933616.386632]  rcu_check_gp_kthread_starvation+0x163/0x17e
[933616.386635]  rcu_sched_clock_irq.cold+0x15b/0x383
[1453682.057819] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[1453682.063784] rcu: 	0-...!: (1 GPs behind) idle=b34/0/0x0 softirq=15917736/15917737 fqs=0 
[1453682.064288] rcu: 	1-...!: (0 ticks this GP) idle=142/0/0x0 softirq=11550576/11550576 fqs=0 
[1453682.064951] rcu: 	2-...!: (1 GPs behind) idle=c45/0/0x1 softirq=9633974/9633975 fqs=0 
[1453682.065401] rcu: 	3-...!: (1 GPs behind) idle=f72/0/0x0 softirq=8183440/8183441 fqs=0 
[1453682.065963] rcu: 	4-...!: (0 ticks this GP) idle=680/0/0x0 softirq=7184289/7184289 fqs=0 
[1453682.066589] rcu: 	5-...!: (0 ticks this GP) idle=b59/1/0x4000000000000000 softirq=6686625/6686625 fqs=0 
[1453682.067713] rcu: 	6-...!: (0 ticks this GP) idle=296/0/0x0 softirq=7714969/7714969 fqs=0 
[1453682.069060] rcu: 	7-...!: (1 ticks this GP) idle=37f/1/0x4000000000000000 softirq=6325737/6325738 fqs=0 
[1453682.070230] rcu: 	8-...!: (1 ticks this GP) idle=ed1/0/0x1 softirq=6209997/6209998 fqs=0 
[1453682.070814] rcu: 	9-...!: (1 ticks this GP) idle=0cf/0/0x1 softirq=6212942/6212943 fqs=0 
[1453682.071385] rcu: 	10-...!: (1 ticks this GP) idle=5fb/0/0x1 softirq=7168547/7168548 fqs=0 
[1453682.073622] rcu: 	11-...!: (1 ticks this GP) idle=b13/0/0x1 softirq=7033244/7033245 fqs=0 
[1453682.075357]  rcu_dump_cpu_stacks+0x13c/0x177
[1453682.075390]  rcu_sched_clock_irq.cold+0x56/0x383
[2063945.946855] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[2063945.951611] rcu: 	0-...!: (1 GPs behind) idle=84b/0/0x1 softirq=22700106/22700107 fqs=0 
[2063945.952330] rcu: 	1-...!: (1 GPs behind) idle=967/1/0x4000000000000000 softirq=16840589/16840589 fqs=0 
[2063945.957670] rcu: 	2-...!: (1 GPs behind) idle=80f/1/0x4000000000000000 softirq=14119732/14119734 fqs=0 
[2063945.958811] rcu: 	3-...!: (0 ticks this GP) idle=421/1/0x4000000000000000 softirq=12073274/12073274 fqs=0 
[2063945.959904] rcu: 	4-...!: (1 GPs behind) idle=2c3/1/0x4000000000000000 softirq=10666617/10666620 fqs=0 
[2063945.961128] rcu: 	5-...!: (1 GPs behind) idle=769/1/0x4000000000000000 softirq=9941370/9941372 fqs=0 
[2063945.962285] rcu: 	6-...!: (1 GPs behind) idle=955/1/0x4000000000000000 softirq=11339018/11339018 fqs=0 
[2063945.963532] rcu: 	7-...!: (1 GPs behind) idle=3fb/1/0x4000000000000000 softirq=9458962/9458966 fqs=0 
[2063945.964829] rcu: 	8-...!: (1 GPs behind) idle=7d3/1/0x4000000000000000 softirq=9281433/9281437 fqs=1 
[2063945.966305] rcu: 	9-...!: (1 GPs behind) idle=665/1/0x4000000000000000 softirq=9260186/9260189 fqs=1 
[2063945.967617] rcu: 	10-...!: (1 GPs behind) idle=57f/1/0x4000000000000000 softirq=10581714/10581717 fqs=1 
[2063945.969002] rcu: 	11-...!: (1 GPs behind) idle=395/1/0x4000000000000000 softirq=10414152/10414155 fqs=1 
[2063945.970761]  irq_exit_rcu+0x94/0xc0
[2063945.975009]  rcu_dump_cpu_stacks+0x13c/0x177
[2063945.975030]  rcu_sched_clock_irq.cold+0x56/0x383

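Solution

According to reports on GitHub and Bugzilla, this problem has affected Proxmox and other KVM users widely for years, and the Proxmox developers have been unable to fix or even reproduce it. It can be worked around by setting the SCSI controller of every KVM guest to VirtIO SCSI single, enabling IO thread on its disks, and setting async IO to threads.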
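
These settings can also be applied from the shell with `qm set` instead of clicking through every VM. The sketch below only prints the commands so they can be reviewed first. It assumes the disk is attached as scsi0; `print_iothread_settings` is a made-up helper, and the volume ID in the usage line is a placeholder to be replaced with the one shown by `qm config <vmid>`.

```shell
# print_iothread_settings VMID VOLUME
# Print (not execute) the qm commands for the workaround described above.
# Assumption: the VM's disk is on scsi0; adjust for other disk slots.
print_iothread_settings() {
    vmid="$1"
    volume="$2"
    echo "qm set $vmid --scsihw virtio-scsi-single"
    echo "qm set $vmid --scsi0 $volume,iothread=1,aio=threads"
}
```

For example, `print_iothread_settings 100 local-lvm:vm-100-disk-0`. As far as I know the VM has to be fully stopped and started again for the new controller and disk options to apply.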

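Node or VM status shows unknown or offline

Error description

The web management interface misbehaves: all nodes, and all VMs/containers running on them, show the status "unknown" (or "offline").

Solution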

systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd

If the status is still wrong, try rebooting the host.
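
Before going as far as a reboot, it helps to know whether the three services actually came back. A sketch that restarts each one and reports its state using `systemctl is-active`; `restart_pve_services` and the `SYSTEMCTL` override variable are names invented here, and the override exists only so the function can be exercised against a stub.

```shell
# SYSTEMCTL may point at a stub for dry runs; defaults to the real binary.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# restart_pve_services
# Restart the three services and print whether each is active afterwards.
# Returns non-zero if any of them is not active.
restart_pve_services() {
    failed=0
    for svc in pvedaemon pveproxy pvestatd; do
        "$SYSTEMCTL" restart "$svc"
        if "$SYSTEMCTL" is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}
```

If a service is reported as not active, `journalctl -u <service>` is the next place to look.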

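VM cannot shut down or reboot

Error description

After the host sends a shutdown/reboot command to a newly created VM, the task stays in progress for a long time (the status remains "running"), but the VM never actually shuts down or reboots, and the task finally fails with: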

TASK ERROR: VM quit/powerdown failed

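Solution

There are three solutions to this problem; pick whichever one suits you.

Install the QEMU Guest Agent service

If QEMU Guest Agent is enabled in the VM options, the agent software must be installed inside the VM to resolve this. Debian is used as the example here; other distributions should use their own package manager, for example "yum" on CentOS. Windows guests additionally need the "VirtIO" drivers.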
```
apt install qemu-guest-agent -y
```
Start the service after installation.
```
systemctl start qemu-guest-agent
```
Then verify that the service started properly.
```
systemctl status qemu-guest-agent
```
A status of active (running) means it is working. Alternatively, use:
```
ps aux | grep qemu
```
If qemu-ga appears in the output, it is working.
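
The checks above run inside the guest. From the PVE host, the agent can be probed with `qm guest cmd <vmid> ping`, which should only succeed when the agent inside the VM answers. A minimal sketch; `agent_responds` and the `QM` override variable are names made up for this note, and VM ID 100 in the usage line is just an example.

```shell
# QM may point at a stub for dry runs; defaults to the real binary.
QM="${QM:-qm}"

# agent_responds VMID
# Succeed if the guest agent in the given VM answers a ping from the host.
agent_responds() {
    "$QM" guest cmd "$1" ping > /dev/null 2>&1
}
```

Usage: `agent_responds 100 && echo "agent is up" || echo "agent not reachable"`.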

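Disable the QEMU Guest Agent option

Disable QEMU Guest Agent in the VM options; the VM will then be shut down via ACPI signals.

Use the stop command (forced shutdown)

This method is not recommended, because it can corrupt the VM's file system and cause data loss or worse.

In the web management interface, click Stop in the drop-down to the right of the VM's Shutdown button.

Alternatively, stop the VM from a shell with the stop command, where vmid is the ID of the VM to stop.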

qm stop vmid

If the command fails with:

trying to acquire lock...

TASK ERROR: can't lock file '/var/lock/qemu-server/lock-vmid.conf' - got timeout

then delete the lock file named in the error first and run the stop command again.

rm -f /var/lock/qemu-server/lock-vmid.conf
qm stop vmid
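
A possible middle ground is to try a clean shutdown with a timeout first and fall back to the forced stop only when that fails. The sketch below assumes `qm shutdown` accepts `--timeout` in seconds; `stop_vm_gracefully` and the `QM` override variable are hypothetical names, and the 60-second default is an arbitrary choice.

```shell
# QM may point at a stub for dry runs; defaults to the real binary.
QM="${QM:-qm}"

# stop_vm_gracefully VMID [TIMEOUT_SECONDS]
# Try a clean shutdown; if it fails or times out, force-stop the VM.
stop_vm_gracefully() {
    vmid="$1"
    timeout="${2:-60}"
    if "$QM" shutdown "$vmid" --timeout "$timeout"; then
        echo "VM $vmid shut down cleanly"
    else
        echo "clean shutdown failed, forcing stop of VM $vmid" >&2
        "$QM" stop "$vmid"
    fi
}
```

The forced stop still carries the data-loss risk described above; the wrapper merely gives the guest a chance to shut down on its own first.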

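Host crashes (reboots) frequently

Error description

The host often crashes (reboots) after running continuously for a few hours to a few days: the "uptime" value is reset, "last reboot" shows several entries marked still running (see: Debian - two entries in `last reboot` in `still running`), and "/var/log/syslog" and "/var/log/messages" either record nothing unusual or contain entries similar to the following: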

kernel: [92961.930152] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [swapper/1:0]
kernel: [92970.861930] watchdog: BUG: soft lockup - CPU#7 stuck for 45s! [kvm:1462]
kernel: [92970.861933] watchdog: BUG: soft lockup - CPU#8 stuck for 30s! [kworker/u256:2:199604]

Logging may also stop for a while before the crash and reboot.
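
To see whether the lockups cluster on particular CPUs, the log lines can be tallied per CPU. A small sketch that matches the message format shown above; `count_soft_lockups` is a made-up helper that reads log text from the files given as arguments, or from stdin if none are given.

```shell
# count_soft_lockups [FILE...]
# Print "<cpu> <count>" for every CPU that logged a soft lockup,
# sorted by CPU number.
count_soft_lockups() {
    awk '
        match($0, /soft lockup - CPU#[0-9]+/) {
            # The matched text starts with the 18 characters
            # "soft lockup - CPU#"; the rest is the CPU number.
            cpu = substr($0, RSTART + 18, RLENGTH - 18)
            n[cpu]++
        }
        END { for (c in n) print c, n[c] }
    ' "$@" | sort -n
}
```

For example: `count_soft_lockups /var/log/syslog`, or `dmesg | count_soft_lockups`.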

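Solution

In PVE this usually happens because the watchdog detects that IO is too slow or has hung, concludes that the system is in a bad state, and reboots the server in an attempt to recover. Investigate promptly whether there is a hardware problem, whether the system load is too high, or whether the hardware is too old and should be upgraded; in particular, monitor the RAID controller and disks for faults. On systems with a large number of CPU cores, the messages may not indicate any problem at all.

This is especially common when PVE itself runs inside a virtual machine (nested virtualization).

To mitigate it, try raising watchdog_thresh to lengthen the time before a soft lockup is triggered; this kernel parameter defaults to 10: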

vim /etc/sysctl.conf

#Add the following content:
kernel.watchdog_thresh = 60
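
For reference, the kernel documentation describes the soft lockup threshold as twice watchdog_thresh, so the default of 10 reports a lockup after 20 seconds and a value of 60 after 120 seconds. To my knowledge the kernel accepts values from 0 to 60, with 0 disabling lockup detection. The arithmetic as a small sketch; `softlockup_seconds` is a hypothetical helper.

```shell
# softlockup_seconds THRESH
# Print the soft lockup threshold in seconds for a given watchdog_thresh.
# Fails for anything that is not an integer in the assumed 0..60 range.
softlockup_seconds() {
    thresh="$1"
    case "$thresh" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$thresh" -le 60 ] || return 1
    echo $((thresh * 2))
}
```

The value currently in effect can be passed in with `softlockup_seconds "$(cat /proc/sys/kernel/watchdog_thresh)"`.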

You can also stop the watchdog temporarily with the following command, but it should not be disabled permanently because of this problem:

systemctl stop watchdog-mux

This is an original article. Unless stated otherwise in the text, it is licensed under CC BY 4.0; please credit the source when reposting.
