TroubleShootingOnProxmoxIssue

Problem

Two months ago I set up a Proxmox environment for a team to use as a dev environment. For quick snapshots and migration I chose ZFS as the filesystem; the structure is as follows:

/images/2019_05_30_16_54_05_656x311.jpg

But this environment suffered from very slow I/O speed. I found the reason: ZFS should not run on top of a hardware RAID card.

Worse still: after a reboot today, GRUB dropped into grub rescue:

/images/2019_05_30_16_55_53_385x62.jpg

and the /boot folder could not be reached:

/images/2019_05_30_16_56_21_319x73.jpg

Solution

RAID card configuration: change disks 01:22 and 01:23 from hotspare to RAID1:

/images/2019_05_30_16_59_58_484x180.jpg

to:

/images/2019_05_30_17_00_11_495x152.jpg

Create a new VD:

/images/2019_05_30_17_00_41_561x278.jpg

Select these 2 disks:

/images/2019_05_30_17_00_55_686x337.jpg

Select the boot device (note the Boot device setting):

/images/2019_05_30_17_01_45_703x397.jpg

Now reinstall Proxmox to VD 5.

System Configuration

Remove the data lv:

# lvremove /dev/pve/data

Create a new lv via:

# lvcreate -n lv_root -L 150G pve
# mkfs.ext4 /dev/mapper/pve-lv_root

Now mount the zfs pools via:

# zpool import -a
# zpool status
# zfs create -o canmount=noauto -o mountpoint=/mnt rpool/pve....
# zfs mount rpool/pve.....

Create a new mount point and copy the old system into the new one:

# mkdir /mnt1
# mount /dev/mapper/pve-lv_root /mnt1
# cp -arp /mnt/* /mnt1/

Now you have to edit grub.cfg:

# vim /boot/grub/grub.cfg
if [ "${next_entry}" ] ; then
   set default="${next_entry}"
   set next_entry=
   save_env next_entry
   set boot_once=true
else
   set default="2"
fi



menuentry "Our Proxmox Boot Recovery" {
	load_video
	insmod gzio
	if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
	insmod part_gpt
	insmod lvm
	insmod ext2
	set root='lvmid/ySIDEN-2G0X-DU6A-p0q8-cXsW-o6ja-IyhXuc/0M8yUI-yNQJ-Ntx8-8cfE-3g9k-0sOb-SDWfQe'
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root --hint='lvmid/ySIDEN-2G0X-DU6A-p0q8-cXsW-o6ja-IyhXuc/0M8yUI-yNQJ-Ntx8-8cfE-3g9k-0sOb-SDWfQe'  4fb86e38-eeae-489a-b45e-3e5cc8055654
	else
	  search --no-floppy --fs-uuid --set=root 4fb86e38-eeae-489a-b45e-3e5cc8055654
	fi
	echo	'Loading Linux 4.15.17-1-pve ...'
	linux	/boot/vmlinuz-4.15.17-1-pve root=/dev/mapper/pve-lv_root ro  quiet
	echo	'Loading initial ramdisk ...'
	initrd	/boot/initrd.img-4.15.17-1-pve
}

Edit the fstab:

$ vim /mnt1/etc/fstab 
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/lv_root / ext4 errors=remount-ro 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0
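As a quick sanity check, every non-comment fstab line should have exactly six fields. A minimal sketch (the entries below mirror the fstab above, written to a temp file so the check is self-contained; on the real system point the awk command at /mnt1/etc/fstab instead):

```shell
# Write the fstab entries above to a temp file and verify that every
# non-comment, non-empty line has exactly six fields.
cat > /tmp/fstab.check <<'EOF'
/dev/pve/lv_root / ext4 errors=remount-ro 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0
EOF

awk '!/^#/ && NF > 0 && NF != 6 { print "bad line: " $0; bad = 1 }
     END { exit bad }' /tmp/fstab.check && echo "fstab OK"
```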

Now reboot: the machine boots into the newly created LVM-based root and runs an ext4-based OS that behaves the same as the old one.

LinuxUSBEthernetBonding

Reason

Previously, a single 100M link fed the 1000M uplink; after the upgrade, two bonded 100M links feed the 1000M uplink:

/images/2019_05_29_14_40_06_638x528.jpg

Hardware

TL-SG108E Version 1.0:

/images/2019_05_29_14_31_23_600x357.jpg

Install the Unmanaged Pro utility and use it to access the TL-SG108E. We need to configure LAG on the switch (LAG1: ports 1/2, LAG2: ports 5/6):

/images/2019_05_29_14_49_01_863x529.jpg

Laptop1 network connections:

/images/2019_05_29_14_52_55_860x636.jpg

Powersync (100M) + D-Link (100M) adapters, both attached to a USB hub, which then connects to the laptop.

USB Ethernet Rename

Write the following configuration:

# pwd
/etc/systemd/network

# cat 10-ethusb1.link 
[Match]
MACAddress=xxxxxxxxxxxxxx

[Link]
Description=USB to Ethernet Adapter
Name=ethusb1
# cat 10-ethusb1.network 
[Match]
Name=ethusb1

[Network]
Address=192.168.0.33
# cat 30-ethusb2.link 
[Match]
MACAddress=8xxxxxxxxxxxxxx

[Link]
Description=USB to Ethernet Adapter 2
Name=ethusb2
# cat 30-ethusb2.network 
[Match]
Name=ethusb2

[Network]
Address=xxxxxxxxx

Reboot to apply the configuration, then examine the result via ifconfig ethusb1 and ifconfig ethusb2.
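Without rebooting, `ip -br link` gives a compact view of the interfaces; below is a sketch of checking for the renamed adapters. The sample output is illustrative only, not captured from this machine:

```shell
# Sample `ip -br link` output (illustrative); on a real machine use:
#   links=$(ip -br link)
links='lo               UNKNOWN        00:00:00:00:00:00
ethusb1          UP             aa:bb:cc:dd:ee:01
ethusb2          UP             aa:bb:cc:dd:ee:02'

# Report whether each renamed interface is present.
for ifname in ethusb1 ethusb2; do
  if printf '%s\n' "$links" | grep -q "^$ifname "; then
    echo "$ifname: present"
  else
    echo "$ifname: missing"
  fi
done
```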

Bonding

Configure bond0:

# pwd
/etc/systemd/network
# cat bond1.network 
[Match]
Name=bond1

[Network]
BindCarrier=ethusb1 ethusb2
# cat bond1.netdev 
[NetDev]
Name=bond1
Kind=bond

[Bond]
Mode=balance-rr
# cat Management.network 
[Match]
Name=bond1

[Network]
Address=192.168.0.33/24

Now you can see the bond has been configured, and the transfer speed can reach 20M/s.
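The bond can also be verified from /proc/net/bonding/bond1; here is a sketch parsing a sample of that file. The sample text is illustrative, modeled on the kernel bonding driver's output format:

```shell
# Sample /proc/net/bonding/bond1 content (illustrative). On a real
# machine read the file directly: status=$(cat /proc/net/bonding/bond1)
status='Bonding Mode: load balancing (round-robin)
MII Status: up
Slave Interface: ethusb1
MII Status: up
Slave Interface: ethusb2
MII Status: up'

# balance-rr appears as "load balancing (round-robin)".
echo "$status" | grep -q 'round-robin' && echo "mode: balance-rr"
# Count the enslaved interfaces.
echo "$status" | grep -c '^Slave Interface'
```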

TipsOnBuyVM

Quick proxy setup

Steps:

# hostnamectl set-hostname xxxvps
# apt-get update -y
# apt-get install -y curl byobu vim
# bash <(curl -s -L https://git.io/v2ray.sh)

/images/2019_05_29_10_44_52_373x297.jpg

/images/2019_05_29_10_45_41_502x283.jpg

Choose your own port / ss port / password / encryption protocol, etc.

/images/2019_05_29_10_49_20_427x323.jpg

MigratePWDtoUbuntu

Steps

Install Ubuntu 18.04.2 (using the Rong iso).
Configure IP/hostname.
Install docker-ce and docker-compose.
Load all of the docker images.
Change the dnsmasq.
Enable the services (playwithdocker/playwithdockerblog).
Install golang and re-run go (the previous go directory can be reused).
Run docker swarm init before you actually run playwithdocker.
Add items into dnsmasq (192.192.189.115/192.192.189.115).

Modifications in code:

$ playwithdockerblog:
192.192.189.114 -> 192.192.189.115
$ playwithdocker (/root/go/src/github.com/playwithdocker/playwithdocker/xxx.go):
114 -> 115
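The dnsmasq additions could look like the following sketch; the hostnames pwd.example.local and blog.example.local are placeholders, since the actual domain names are not shown above:

```
# /etc/dnsmasq.conf additions -- hypothetical hostnames, target IP from above
address=/pwd.example.local/192.192.189.115
address=/blog.example.local/192.192.189.115
```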

dnsmasq

Because it conflicts with libvirt's dnsmasq, do the following:

# vim /etc/dnsmasq.conf
listen-address=192.192.189.127
bind-interfaces
# systemctl restart dnsmasq

Thus we remove the dnsmasq functionality from the existing playwithdocker nodes (previously we ran dnsmasq on 192.192.189.114 rather than on 192.192.189.127).

Result

Now open 192.192.189.115 in a browser; you will see the new playwithdocker blog running.

TipsOnAIMachine

Hardware environment

A newly purchased Hasee Z7-KP7GH: i7-8750H CPU, 24G RAM, Nvidia GTX1060 6G graphics card.
An 8G USB drive for system installation.

Software installation and adaptation

Write ubuntu-18.04.2-desktop-amd64.iso to the USB drive:

# sudo dd if=./ubuntu-18.04.2-desktop-amd64.iso of=/dev/sdc bs=1M && sudo sync

Power on the laptop and press DEL to enter the BIOS, then select USB boot. The installer hung; the workaround is as follows:

In GRUB choose Ubuntu, or Install Ubuntu (it depends; you will see it), go to it with the arrow keys and press the 'e' key.
Go to the line that ends with quiet splash and add acpi=off after these words.
Then press F10 to boot with these settings.

Repartitioning is required during installation; for reference:

/images/2019_05_24_10_41_45_771x380.jpg

Here a new EFI partition was created and used to install the operating system, while the existing Windows installation was kept. Pay particular attention to where the bootloader is installed.

After installation, the first boot into the system hangs because of the Nvidia card, so we need to modify GRUB again to enter the system:

When you are in the GRUB menu, press E to enter the GRUB editor. Add nouveau.modeset=0 to the end of the line that starts with linux. After you've added it, press F10 to boot. Your system should start. After that, go to System Settings > Software & Updates > Additional Drivers and then select the NVIDIA driver. Right now I'm using NVIDIA binary driver- version 367.57 from nvidia-367 (proprietary, tested).

As of now (2019-05-24), the Nvidia driver is nvidia-driver-390.

Now reboot the machine and you can enter the system normally and work with it.
For benchmarking the graphics card see https://linuxconfig.org/benchmark-your-graphics-card-on-linux; for lack of time I did not do it here.

System adaptation

Install the necessary packages:

# apt-get install -y openssh-server vim net-tools virt-manager vagrant \
    vagrant-libvirt meld lm-sensors

Install cuda:

# systemctl stop gdm
# ./cuda_10.0.130_410.48_linux.run
# vim ~/.bashrc
export PATH=/usr/local/cuda-10.0/bin:$PATH
# source ~/.bashrc
# nvidia-smi 
Mon May 27 08:40:56 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P0    25W /  N/A |      0MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
# Download from https://developer.nvidia.com/rdp/cudnn-archive
# Get the following package: cudnn-10.0-linux-x64-v7.5.0.56.tgz
#  tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
# Copy
$ cd cudnn-10.0-linux-x64-v7.4.2.24
$ sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64
# Fix permissions
$ sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
$ vim ~/.bashrc
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
export CUDNN_PATH="/usr/local/cuda-10.0/lib64/libcudnn.so"
$ source ~/.bashrc
$ echo -e '#include "cudnn.h"\nint main(){return 0;}' | nvcc -x c - -o /dev/null -lcudnn
$ echo $?
0

Now upgrading your nvidia driver:

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ ubuntu-drivers devices
$ sudo ubuntu-drivers autoinstall 
$ sudo reboot
After reboot....
$ nvidia-smi 
Mon May 27 09:07:16 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14       Driver Version: 430.14       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   49C    P0    26W /  N/A |    166MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1970      G   /usr/lib/xorg/Xorg                            94MiB |
|    0      2148      G   /usr/bin/gnome-shell                          69MiB |
+-----------------------------------------------------------------------------+

Now CUDA and cuDNN are installed. Because the driver bundled with Nvidia's CUDA installer is older than the PPA's and causes problems, we install the driver after the CUDA installation.

tensorflow

Install pip and use it to install TensorFlow:

$ sudo apt-get install -y python-pip
$ pip install tensorflow-gpu
$ vim test.py
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
$ python test.py
2019-05-27 09:35:27.847206: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-27 09:35:27.952455: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-27 09:35:27.953302: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5643333f0dc0 executing computations on platform CUDA. Devices:
2019-05-27 09:35:27.953344: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
2019-05-27 09:35:27.974107: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-05-27 09:35:27.975517: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x564333ab33b0 executing computations on platform Host. Devices:
2019-05-27 09:35:27.975563: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-27 09:35:27.977344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 5.68GiB
2019-05-27 09:35:27.977382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-27 09:35:27.979140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-27 09:35:27.979179: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-27 09:35:27.979193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-27 09:35:27.979313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5517 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
Hello, TensorFlow!

remote machine

Settings-> Sharing-> Screen Sharing:

/images/2019_05_27_10_23_29_364x494.jpg

then set:

$ gsettings set org.gnome.Vino require-encryption false

Now point vncviewer at port 5900 and you will get the remote screen.