Jan 5, 2021
Technology
Hardware
A 320 GB USB disk and a laptop (already running Arch Linux).
Steps
Use fdisk to partition the USB disk as follows:
$ sudo fdisk -l /dev/sdc
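Disk /dev/sdc: 298.09 GiB, 320072933376 bytes, 625142448 sectors
Disk model: Storage
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x112a2f3d
Device Boot Start End Sectors Size Id Type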
/dev/sdc1 2048 1050623 1048576 512M ef EFI (FAT-12/16/32)
/dev/sdc2 1050624 625142447 624091824 297.6G 83 Linux
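The sizes in the fdisk listing follow directly from the sector counts and the 512-byte sector size. A minimal sketch to double-check them before formatting (the helper name sectors_to_mib is made up for illustration):

```shell
#!/bin/sh
# Convert a sector count to whole MiB. The sector size defaults to 512 bytes,
# the value fdisk reports for this disk.
sectors_to_mib() {
    sectors=$1
    sector_size=${2:-512}
    echo $(( sectors * sector_size / 1024 / 1024 ))
}

sectors_to_mib 1048576      # /dev/sdc1, prints 512
sectors_to_mib 624091824    # /dev/sdc2, prints 304732 (about 297.6 GiB)
```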
Format the partitions:
$ sudo mkfs.fat -F32 /dev/sdc1
mkfs.fat 4.1 (2017-01-24)
$ sudo mkfs.ext4 /dev/sdc2
Install arch-install-scripts on the Arch Linux host, then mount the disk at the install point:
$ sudo mount /dev/sdc2 /mnt
$ sudo mkdir -p /mnt/boot
$ sudo mount /dev/sdc1 /mnt/boot
Now use pacstrap to install the base system onto the USB disk:
$ sudo pacstrap -c /mnt base linux linux-firmware base-devel
Generate /etc/fstab:
# genfstab -U /mnt >> /mnt/etc/fstab
# vim /mnt/etc/fstab
Comment out the swap partition entry.
chroot into /mnt:
# arch-chroot /mnt
# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# pacman -S vim
# vim /etc/locale.gen
en_US.UTF-8 UTF-8
en_US ISO-8859-1
zh_CN.GB18030 GB18030
zh_CN.GBK GBK
zh_CN.UTF-8 UTF-8
zh_CN GB2312
# locale-gen
# vim /etc/locale.conf
LANG=en_US.UTF-8
# vim /etc/hostname
archusb
# vim /etc/hosts
# Static table lookup for hostnames.
# See hosts(5) for details.
127.0.0.1 localhost
::1 localhost
127.0.1.1 archusb
# pacman -S net-tools tcpdump iotop dhcpcd openssh dosfstools ntfs-3g amd-ucode intel-ucode grub efibootmgr
# systemctl enable sshd
# cat /etc/mkinitcpio.conf | grep block
# HOOKS=(base udev autodetect block filesystems)
# HOOKS=(base udev block filesystems)
# HOOKS=(base udev block mdadm encrypt filesystems)
# HOOKS=(base udev block lvm2 filesystems)
HOOKS=(base udev block keyboard autodetect modconf filesystems fsck)
# mkinitcpio -P
# passwd
Install and configure GRUB. For BIOS boot:
# grub-install --target=i386-pc /dev/sdc --recheck
For UEFI boot from a removable disk (the EFI partition is mounted at /boot):
# grub-install --target=x86_64-efi --efi-directory=/boot --removable --recheck
Or, to register a named UEFI boot entry instead:
# grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=GRUB
# grub-mkconfig -o /boot/grub/grub.cfg
Install drivers for common GPUs:
# pacman -S xf86-video-vesa xf86-video-ati xf86-video-intel xf86-video-amdgpu xf86-video-nouveau xf86-video-fbdev
Network configuration:
# pacman -S networkmanager
# systemctl enable NetworkManager
# grub-mkconfig -o /boot/grub/grub.cfg
You can now boot the system from the USB disk.
libvirt configuration
Install iptables, etc.
# pacman -S ebtables iptables dnsmasq
Configure bridge networking with NetworkManager:
$ nmcli connection add type bridge ifname br0 stp no
$ nmcli connection add type bridge-slave ifname enp30s0 master br0
For a static IP address:
$ nmcli conn add type bridge ifname br0 ipv4.method manual ipv4.address "10.137.149.5/24" ipv4.gateway "10.137.149.1" ipv4.dns 223.5.5.5
$ nmcli connection add type bridge-slave ifname eth0 master br0
For DHCP (note the bridge connection name):
$ nmcli connection modify bridge-br0 ipv4.method auto
Change the MTU to 9000:
# nmcli connection modify bridge-slave-eth0 802-3-ethernet.mtu 9000
# nmcli connection show bridge-slave-eth0 | grep mtu
802-3-ethernet.mtu: 9000
iptables for libvirt:
# iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT
# iptables-save -f /etc/iptables/iptables.rules
# systemctl enable iptables.service
The bridge is now ready for use.
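The bridge steps can be collected into a small helper. The sketch below only prints the nmcli commands so they can be reviewed before being run; bridge_cmds is a hypothetical name, and the bridge, interface, and address values are whatever you pass in.

```shell
#!/bin/sh
# Print (do not run) the nmcli commands for a bridge with one slave interface.
# bridge_cmds is a hypothetical helper. Pass an address and gateway for static
# addressing, or omit them to get DHCP.
bridge_cmds() {
    bridge=$1; iface=$2; addr=$3; gw=$4
    if [ -n "$addr" ]; then
        echo "nmcli connection add type bridge ifname $bridge stp no ipv4.method manual ipv4.addresses $addr ipv4.gateway $gw"
    else
        echo "nmcli connection add type bridge ifname $bridge stp no ipv4.method auto"
    fi
    echo "nmcli connection add type bridge-slave ifname $iface master $bridge"
}

bridge_cmds br0 eth0 10.137.149.5/24 10.137.149.1
```

Once the printed commands look right, they can be piped to sh as root.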
Dec 29, 2020
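Technology
During on-site installation, it may be impossible to install the customized operating system for reasons beyond our control. In that case, use the following steps to convert a minimal Ubuntu 18.04 installation into a RONG node:
The steps below use Ubuntu 18.04.5 as an example. The default operating user is kkk, the user created during installation; adjust as needed on site.
- Upload the ISO to the machine: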
# scp ./ubuntu-18.04.5-server-amd64-auto-xfs.iso kkk@192.168.122.32:/home/kkk
kkk@192.168.122.32's password:
- Mount the ISO on the machine:
kkk@ubuntu:~$ sudo mount ubuntu-18.04.5-server-amd64-auto-xfs.iso /media/cdrom
[sudo] password for kkk:
mount: /mnt: WARNING: device write-protected, mounted read-only.
- Use the ISO as the local installation source:
# rm -f /etc/apt/sources.list
# apt-cdrom -m -d=/media/cdrom add
# cat /etc/apt/sources.list
deb cdrom:[Ubuntu-Server 18.04.5 LTS _Bionic Beaver_ - Release amd64 (20200810)]/ bionic main restricted
- Now use apt-get to update the package index and install the required packages:
# apt-get update
# apt-get install nfs-common openssh-server update-motd parted build-essential telnet tcpdump python
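After installation finishes, the ISO mounted at /media/cdrom is unmounted automatically. If you are prompted to mount /media/cdrom again, remount the ISO at /media/cdrom from another terminal.
- Inject the root SSH key for passwordless login: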
$ sudo su
# ssh-keygen
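Press Enter at every prompt to create the public/private key pair.
# vim /root/.ssh/authorized_keys
Paste the following content. It can be found in preseed/auto.seed in the rong ISO; it begins with "ssh-rsa" and ends with the "DashSSD" marker.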
ssh-rsa owaugowugouwoguwougowuoguwougouwogwe例子例子例子例子例子例子**************= dash@DashSSD
- RONG can now be deployed normally; logging in as the test user is not required.
Dec 11, 2020
Technology
Hardware & OS
Hardware configuration:
# lscpu
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
32Core
# free -g
total used free shared buff/cache available
Mem: 62 19 10 0 33 42
Swap: 0 0 0
# df -h
/dev/mapper/vg-root 1.7T 1.1T 538G 66% /
OS Configuration:
# cat /etc/issue
Ubuntu 16.04.4 LTS \n \l
Aim
Use this server as a Vagrant environment.
vagrant-libvirt
Use Docker to run Vagrant:
# docker pull vagrantlibvirt/vagrant-libvirt:latest
Install the libvirt-related packages:
# apt-get install -y virt-manager
# systemctl status libvirt-bin qemu
Desktop
Use i3 as the default desktop:
# apt-get install -y i3
# /usr/lib/apt/apt-helper download-file https://debian.sur5r.net/i3/pool/main/s/sur5r-keyring/sur5r-keyring_2020.02.03_all.deb keyring.deb SHA256:c5dd35231930e3c8d6a9d9539c846023fe1a08e4b073ef0d2833acd815d80d48
# dpkg -i ./keyring.deb
# echo "deb http://debian.sur5r.net/i3/ $(grep '^DISTRIB_CODENAME=' /etc/lsb-release | cut -f2 -d=) universe" >> /etc/apt/sources.list.d/sur5r-i3.list
# apt-get update -y
# apt install i3
# apt-get install -y tigervncserver
# vncpasswd
# vncserver -localhost -nolisten tcp
# vim ~/.vnc/xstartup
#!/bin/bash
i3 &
Switch to LXDE:
# cat ~/.vnc/xstartup
#/etc/X11/Xsession
exec startlxde
Client
Set up SSH port forwarding:
$ ssh -p 62022 -L 127.0.0.1:5901:localhost:5901 root@xxx.xxx.xxx.xxx
Then point a VNC viewer at localhost:5901 to see the desktop.
Dec 3, 2020
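Technology
Kubernetes v1.15.3
Certificates with 100-year validity are already enabled by default, but automatic certificate rotation is not enabled on the nodes. Enable it as follows: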
# cat rr.yml
---
- hosts: k8s-cluster
  gather_facts: false
  tasks:
    - name: "Change kubelet configuration for adding certificates rotate"
      raw: sed -i.$(date "+%m%d%y") '/^clusterDNS:/i rotateCertificates:\ true' /etc/kubernetes/kubelet-config.yaml
- hosts: k8s-cluster
  gather_facts: false
  tasks:
    - name: "Restart kubelet"
      raw: systemctl restart kubelet
# ansible-playbook -i inventory/rong/hosts.ini rr.yml
If the certificates have already expired, re-add the worker nodes by rejoining them to the cluster.
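Running the sed task in the playbook a second time would insert the rotateCertificates line again. A minimal sketch of an idempotent variant, assuming the same clusterDNS: anchor line; enable_rotate is a hypothetical helper name, and it swaps the sed one-liner for grep plus awk:

```shell
#!/bin/sh
# Insert "rotateCertificates: true" before the first clusterDNS: line of a
# kubelet config file, unless the key is already present. A dated backup is
# kept, mirroring the -i.<date> suffix used in the playbook.
# enable_rotate is a hypothetical helper name.
enable_rotate() {
    conf=$1
    if grep -q '^rotateCertificates:' "$conf"; then
        return 0    # already configured, leave the file untouched
    fi
    backup="$conf.$(date +%m%d%y)"
    cp "$conf" "$backup" || return 1
    awk '/^clusterDNS:/ && !seen { print "rotateCertificates: true"; seen = 1 } { print }' \
        "$backup" > "$conf"
}

# Usage on a node (then restart kubelet):
#   enable_rotate /etc/kubernetes/kubelet-config.yaml
```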
Kubernetes v1.17.6
Download the source from GitHub:
# wget https://github.com/kubernetes/kubernetes/archive/v1.17.6.zip
# unzip v1.17.6.zip
# cd kubernetes-1.17.6
# vim cmd/kubeadm/app/constants/constants.go
CertificateValidity = time.Hour * 24 * 365 * 100
# vim vendor/k8s.io/client-go/util/cert/cert.go
func NewSelfSignedCACert
NotAfter: now.Add(duration365d * 100).UTC(),
func GenerateSelfSignedCertKeyWithFixtures
maxAge := 100 * time.Hour * 24 * 365
Edit the build scripts:
# vim hack/make-rules/cross.sh
make all WHAT="${KUBE_SERVER_TARGETS[*]}" KUBE_BUILD_PLATFORMS="${KUBE_SERVER_PLATFORMS[*]}"
#make all WHAT="${KUBE_NODE_TARGETS[*]}" KUBE_BUILD_PLATFORMS="${KUBE_NODE_PLATFORMS[*]}"
#
#make all WHAT="${KUBE_CLIENT_TARGETS[*]}" KUBE_BUILD_PLATFORMS="${KUBE_CLIENT_PLATFORMS[*]}"
#
#make all WHAT="${KUBE_TEST_TARGETS[*]}" KUBE_BUILD_PLATFORMS="${KUBE_TEST_PLATFORMS[*]}"
#
#make all WHAT="${KUBE_TEST_SERVER_TARGETS[*]}" KUBE_BUILD_PLATFORMS="${KUBE_TEST_SERVER_PLATFORMS[*]}"
# vim hack/lib/golang.sh
readonly KUBE_SUPPORTED_SERVER_PLATFORMS=(
# linux/amd64
# linux/arm
linux/arm64
# linux/s390x
# linux/ppc64le
)
//.............
kube::golang::server_targets() {
local targets=(
# cmd/kube-proxy
# cmd/kube-apiserver
# cmd/kube-controller-manager
# cmd/kubelet
cmd/kubeadm
# cmd/kube-scheduler
# vendor/k8s.io/apiextensions-apiserver
# cluster/gce/gci/mounter
)
Build:
# make cross
Check the build result:
# ls ./_output/local/go/bin/linux_arm64/kubeadm
# file ./_output/local/go/bin/linux_arm64/kubeadm
./_output/local/go/bin/linux_arm64/kubeadm: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, stripped
Kubernetes v1.18.8(arm64)
The same as for v1.17.6.
Update (v1.17.6)
Default cluster state (certificates expire after 365 days):
root@wuhanarm64-1:/home/test# kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W1203 11:44:01.023274 43160 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [110.192.0.10]; the provided value is: [110.192.0.3]
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Sep 11, 2021 06:45 UTC 282d no
apiserver Sep 11, 2021 06:43 UTC 282d ca no
apiserver-kubelet-client Sep 11, 2021 06:43 UTC 282d ca no
controller-manager.conf Sep 11, 2021 06:45 UTC 282d no
front-proxy-client Sep 11, 2021 06:43 UTC 282d front-proxy-ca no
scheduler.conf Sep 11, 2021 06:45 UTC 282d no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Sep 09, 2030 06:43 UTC 9y no
front-proxy-ca Sep 09, 2030 06:43 UTC 9y no
root@wuhanarm64-1:/home/test# kubectl get nodes
NAME STATUS ROLES AGE VERSION
wuhanarm64-1 Ready master 82d v1.17.6
wuhanarm64-2 Ready <none> 82d v1.17.6
wuhanarm64-3 Ready <none> 82d v1.17.6
Update kubeadm:
# mv /usr/local/bin/kubeadm /usr/local/bin/kubeadm.back
# scp gowuegowoguweog:gowugouwoeogo/kubeadm_1.17.6_arm64 /usr/local/bin/kubeadm
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.6", GitCommit:"d32e40e20d167e103faf894261614c5b45c44198", GitTreeState:"archive", BuildDate:"2020-12-03T02:53:08Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/arm64"}
Renew the certificates with the newly built kubeadm:
# kubeadm alpha certs renew all --config=/etc/kubernetes/kubeadm-config.yaml
W1203 11:48:50.884660 45705 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [110.192.0.10]; the provided value is: [110.192.0.3]
W1203 11:48:50.885133 45705 validation.go:28] Cannot validate kube-proxy config - no validator is available
W1203 11:48:50.885156 45705 validation.go:28] Cannot validate kubelet config - no validator is available
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
# kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W1203 11:49:55.769736 46233 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [110.192.0.10]; the provided value is: [110.192.0.3]
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Nov 09, 2120 03:48 UTC 99y no
apiserver Nov 09, 2120 03:48 UTC 99y ca no
apiserver-kubelet-client Nov 09, 2120 03:48 UTC 99y ca no
controller-manager.conf Nov 09, 2120 03:48 UTC 99y no
front-proxy-client Nov 09, 2120 03:48 UTC 99y front-proxy-ca no
scheduler.conf Nov 09, 2120 03:48 UTC 99y no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Sep 09, 2030 06:43 UTC 9y no
front-proxy-ca Sep 09, 2030 06:43 UTC 9y no
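The renewed certificates expire on Nov 09, 2120 rather than in December because the patched validity, time.Hour * 24 * 365 * 100, is 36500 days and ignores leap days. A minimal sketch of that arithmetic, assuming GNU date; the helper names are made up for illustration:

```shell
#!/bin/sh
# Validity in days for a "365 days per year" duration, as in the patched
# CertificateValidity constant.
validity_days() {
    echo $(( 365 * $1 ))
}

# Expiry date for a certificate issued at midnight UTC on a given date
# (YYYY-MM-DD). Relies on GNU date's -d option.
expiry_date() {
    issued=$1
    days=$(validity_days "$2")
    start=$(date -u -d "$issued" +%s)
    date -u -d "@$(( start + days * 86400 ))" +%Y-%m-%d
}

expiry_date 2020-12-03 100    # prints 2120-11-09
```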
Update the kubeconfig files:
# kubeadm init phase kubeconfig all --config kubeadm-config.yaml
W1203 11:55:20.868908 49000 defaults.go:186] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [110.192.0.10]; the provided value is: [110.192.0.3]
W1203 11:55:20.869702 49000 validation.go:28] Cannot validate kube-proxy config - no validator is available
W1203 11:55:20.869730 49000 validation.go:28] Cannot validate kubelet config - no validator is available
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
# mv $HOME/.kube/config $HOME/.kube/config.old
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
Restart the four containers kube-apiserver, kube-controller, kube-scheduler, and etcd. Taking kube-apiserver as an example:
# docker ps | grep kube-api
d086ef6a4b9b 6388c24eab51 "kube-apiserver --ad…" 2 hours ago Up 2 hours k8s_kube-apiserver_kube-apiserver-wuhanarm64-1_kube-system_40f4f19c75fe95ea92ebe00b7bc1576e_625
486a3c2d3e9e k8s.gcr.io/pause:3.1 "/pause" 2 hours ago Up 2 hours k8s_POD_kube-apiserver-wuhanarm64-1_kube-system_40f4f19c75fe95ea92ebe00b7bc1576e_3
# docker rm -f d086ef6a4b9b
d086ef6a4b9b
Check the apiserver certificate:
# echo | openssl s_client -showcerts -connect 127.0.0.1:6443 -servername api 2>/dev/null | openssl x509 -noout -enddate
notAfter=Nov 9 03:48:52 2120 GMT
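In 1.17.6, kubelet enables rotateCertificates by default, so each node's certificate should be renewed automatically after one year.
1.17.6 (expired)
What if the certificates have already expired?
First copy the kubeadm build modified for 100-year validity to the kube-master[0] node and replace the default kubeadm: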
# mv /usr/local/bin/kubeadm /usr/local/bin/kubeadm.back
# scp gowuegowoguweog:gowugouwoeogo/kubeadm_1.17.6_arm64 /usr/local/bin/kubeadm
Use the following commands to check the certificate expiry dates:
# cd /etc/kubernetes/ssl
# for i in *.crt; do openssl x509 -in "$i" -noout -dates; done | grep notAfter
notAfter=Oct 31 06:11:04 2020 GMT
notAfter=Oct 31 06:11:03 2020 GMT
notAfter=Oct 31 06:11:03 2020 GMT
notAfter=Oct 31 06:11:04 2020 GMT
notAfter=Oct 31 06:11:05 2020 GMT
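Reading the dates by eye works for a handful of files. A minimal sketch that decides whether one of these notAfter lines is already in the past, assuming GNU date can parse the openssl timestamp; is_expired is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if a "notAfter=..." line (as printed by openssl x509 -dates)
# is at or before a reference time. The reference time is given in epoch
# seconds and defaults to now. Relies on GNU date to parse the timestamp.
is_expired() {
    line=$1
    now=${2:-$(date -u +%s)}
    end=$(date -u -d "${line#notAfter=}" +%s) || return 2
    [ "$end" -le "$now" ]
}

if is_expired "notAfter=Oct 31 06:11:04 2020 GMT"; then
    echo "expired"
fi
```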
On the kube-master[0] node, manually set the clock to a time before the expiry, for example September 1, 2020:
# date -s 20200901
# hwclock -w
From the Rong installation directory on kube-master[0], synchronize the time on all nodes over ssh (10.137.149.231~233 in this example):
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.231 date -s @`( date -u +"%s" )`
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.232 date -s @`( date -u +"%s" )`
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.233 date -s @`( date -u +"%s" )`
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.231 hwclock -w
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.232 hwclock -w
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.233 hwclock -w
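The six ssh commands differ only in the node address, so they can be generated by a loop. The sketch below only prints the commands for review; sync_time_cmds is a hypothetical helper, while the key path and addresses are the ones used in this post.

```shell
#!/bin/sh
# Print the ssh commands that push this host's clock to each node and then
# write it to the node's hardware clock. The $(date ...) part is printed
# literally, so it is evaluated only when the printed command is run.
# sync_time_cmds is a hypothetical helper name.
sync_time_cmds() {
    key=$1; shift
    for node in "$@"; do
        echo "ssh -o StrictHostKeyChecking=no -i $key root@$node date -s @\$(date -u +%s)"
        echo "ssh -o StrictHostKeyChecking=no -i $key root@$node hwclock -w"
    done
}

sync_time_cmds .rong/deploy.key 10.137.149.231 10.137.149.232 10.137.149.233
```

After checking the output, pipe it to sh from the Rong installation directory to run it.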
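After the time change, commands such as kubectl get node work again. Renew the certificates with the following command:
# kubeadm alpha certs renew all --config=kubeadm-config.yaml
Because the certificates have changed, replace the .kube/config file currently in use: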
# mv $HOME/.kube/config $HOME/.kube/config.old
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
Use the following command to view the renewed certificates (expiring in 100 years):
# kubeadm alpha certs check-expiration
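Check whether every node is Ready. If a node is not Ready, log in to it and restart kubelet with systemctl restart kubelet; once it has resynchronized its certificates with the control plane, it becomes Ready: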
# kubectl get node
Set the clock back to the correct (current) time:
# date -s 20201203
# date -s 14:20:20
# hwclock -w
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.231 date -s @`( date -u +"%s" )`
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.232 date -s @`( date -u +"%s" )`
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.233 date -s @`( date -u +"%s" )`
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.231 hwclock -w
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.232 hwclock -w
# ssh -o "StrictHostKeyChecking=no" -i .rong/deploy.key root@10.137.149.233 hwclock -w
After adjusting the time, check again that the certificates are valid and that all worker nodes are Ready:
# kubeadm alpha certs check-expiration
# kubectl get node
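v1.18.8 (not expired)
Upload the kubeadm build (modified for 100-year validity) to the master machine and renew the certificates. The steps are the same as for v1.17.6.
v1.18.8 (expired)
The steps are the same as for v1.17.6 (expired).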
Nov 30, 2020
Technology
Symptom
Not Ready:
# kubectl get nodes
ai05 NotReady, SchedulingDisabled node 436d v1.13.5
Fix
Remove SchedulingDisabled:
# kubectl uncordon ai05
The fix for NotReady is:
# cd /etc/nginx/
# mv nginx.conf nginx.conf_sb
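Someone had carelessly modified the nginx configuration file on this node, so the node could no longer communicate with the correct API server.
Change it back to the correct nginx.conf configuration: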
stream {
    upstream kube_apiserver {
        least_conn;
        server 192.192.185.97:6443;
    }
    server {
        listen 127.0.0.1:6443;
        ....
    }
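Previously it had been changed to a mapping from local port 8021 to port 8020.
Restart kubelet on this node. If it still fails to start properly, then:
1. Switch this node from 127.0.0.1:6443 to 192.192.185.97:6443 by editing /etc/kubernetes/kubelet.conf.
2. Switch this node's kube-proxy from 127.0.0.1:6443 to 192.192.185.97:6443 by editing the configmap.
3. Delete the broken calico on this node.
4. calico reports that /run/systemd/resolve/resolv.conf cannot be found; create the symlink manually.
5. Delete the calico/kube-proxy pods so that they are recreated automatically.
6. nginx-proxy has now been recreated; switch back to 127.0.0.1:6443.
7. After switching back, delete the calico/kube-proxy/nginx-proxy pods.
8. Everything should now work normally.
One careless mistake cost an hour or two, a heavy price.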