Fixing booting issues with LVM
Earlier this year, I bought a Firebat N100 PC to add to my Proxmox cluster. The device itself is great, apart from the fact that it sometimes reboots while updating. Recently, one of those reboots happened while the kernel was being updated.
The initial problem presented itself as "ZSTD-compressed data is corrupt". Great, considering the only thing being loaded at that point was the init part of the boot sequence (the initramfs). I tried the other installed kernels, but no joy: some would just hang, and none would boot. I also tried disabling Secure Boot and booting into single-user mode.
This shouldn't be a problem, surely: just boot off the network and use a live CD to recover? Well, no. My NICs are in a bonded team, and for some reason network booting didn't want to know. I did try unplugging NICs, though I didn't mess around with UniFi to unbond them. In the end, I dug out the USB boot drive I had used for installing Proxmox.
After searching around on the drive itself, I found I actually had to start the installer and then, when the text-based installer asked me to accept its terms, press Ctrl+F3 to get a shell prompt. So far so good. Despite the keyboard layout not being English, I was able to start typing commands. For reference, y and z are swapped, - is where the forward slash normally is, and the forward slash is Shift+7 (which matches a German QWERTZ layout).
The first thing I did was run
fsck -fy /dev/sda1
fsck -fy /dev/sda2
fsck -fy /dev/sda3
Your drive may appear under a different name (an NVMe drive, for example, shows up as /dev/nvme0n1p1 and so on), so change the commands accordingly. Only one of the commands appeared to fix anything, and a reboot showed that it wasn't enough.
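If you want to check several partitions in one go, a small loop saves retyping and also tells you which partition fsck actually repaired, because fsck's exit code records whether errors were corrected. This is a rough sketch, assuming the partitions aren't mounted; fsck_all, fsck_status and the DRY_RUN switch are names I've made up, not part of any tool.

```shell
#!/bin/sh
# Sketch: run "fsck -fy" on each partition given as an argument and report
# what fsck's exit code means. Exit codes follow fsck(8); they are bit flags,
# so only the common single values get a name here.
# fsck_status, fsck_all and DRY_RUN are hypothetical helpers, not real tools.

fsck_status() {
    case "$1" in
        0) echo "no errors" ;;
        1) echo "errors corrected" ;;
        2) echo "system should be rebooted" ;;
        4) echo "errors left uncorrected" ;;
        8) echo "operational error" ;;
        *) echo "exit code $1 (see fsck(8))" ;;
    esac
}

fsck_all() {
    for part in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Only show what would be run.
            echo "would run: fsck -fy $part"
            continue
        fi
        rc=0
        fsck -fy "$part" || rc=$?   # keep going even if fsck reports problems
        echo "$part: $(fsck_status "$rc")"
    done
}
```

For example, fsck_all /dev/sda1 /dev/sda2 /dev/sda3 (or DRY_RUN=1 fsck_all ... to just see the commands). A line ending in "errors corrected" points at the partition that was actually changed.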
For the next round of fixes, I realised I needed to get at the root filesystem inside LVM and run the fixes there. The first step was to locate the actual volume, which I did by running the command
lsblk
You are looking for the entry with -root in its name (pve-root on a default Proxmox install). The next stage is to run
lvscan
This should tell you the path of the logical volume, so that we can fsck it, and whether it is active. Mine was, but if yours isn't, run the command
lvchange -ay /dev/proxmox-vg/root
replacing /dev/proxmox-vg/root with the path reported by lvscan (on a default Proxmox install, this is usually /dev/pve/root).
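The lsblk and lvscan steps can also be rolled into a couple of shell functions. This sketch picks the -root volume out of lsblk's raw output, reads the volume's state from lvscan, and only calls lvchange -ay if lvscan says it is inactive. The function names are hypothetical, and the parsing assumes lvscan's usual line format (the state first, then the volume path in single quotes), so compare it with your own output before relying on it.

```shell
#!/bin/sh
# Sketch: locate the root logical volume and make sure it is active.
# find_root_lv, lv_state and ensure_active are hypothetical helpers.

# Reads "NAME TYPE" lines, as printed by: lsblk -rn -o NAME,TYPE
# and prints LVM devices whose name ends in "-root" (e.g. pve-root).
find_root_lv() {
    awk '$2 == "lvm" && $1 ~ /-root$/ { print $1 }'
}

# Reads lvscan output on stdin and prints the state (ACTIVE or inactive)
# of the LV whose path is given as $1, e.g. /dev/pve/root.
lv_state() {
    awk -v lv="'$1'" '{ for (i = 2; i <= NF; i++) if ($i == lv) print $1 }'
}

# Activates the LV with lvchange -ay unless lvscan already shows it ACTIVE.
ensure_active() {
    state=$(lvscan | lv_state "$1")
    case "$state" in
        ACTIVE)   echo "$1 is already active" ;;
        inactive) lvchange -ay "$1" && echo "$1 activated" ;;
        *)        echo "$1 not found in lvscan output" >&2; return 1 ;;
    esac
}
```

Something like lsblk -rn -o NAME,TYPE | find_root_lv, followed by ensure_active /dev/pve/root. One thing to watch: lsblk shows the device-mapper name, which doubles any hyphen inside a volume group or logical volume name, so a volume group called proxmox-vg would appear as proxmox--vg-root. If you would rather activate everything at once, vgchange -ay activates all logical volumes in every volume group it finds.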
Once the volume is active, you can fsck it with the command
fsck -fy /dev/proxmox-vg/root
If this fixes your problem, happy days. It didn't fix mine, though. After a bit of googling, the main fix for my error seemed to be running "update-initramfs -u" on the booted system. So the next thing I needed to do was boot the system, or at least chroot into it.
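Before (or after) regenerating, a quick and only partial check is to compare the kernels in /boot against their initramfs images. It catches an image that is missing or empty, but it cannot tell whether a non-empty image is corrupt, so treat "ok" as inconclusive. check_initrds and BOOT_DIR are hypothetical names; setting BOOT_DIR=/mnt/boot lets you run it from the live environment once the root volume is mounted on /mnt, assuming /boot lives on that volume.

```shell
#!/bin/sh
# Sketch: for each kernel in /boot, report whether its initrd.img exists and
# is non-empty. A missing or empty image is a clear problem; "ok" only means
# the file is there, not that its contents are intact.
# check_initrds and BOOT_DIR are hypothetical, not part of initramfs-tools.

check_initrds() {
    boot=${BOOT_DIR:-/boot}
    for kernel in "$boot"/vmlinuz-*; do
        [ -e "$kernel" ] || continue            # the glob matched nothing
        version=${kernel##*/vmlinuz-}           # e.g. 6.8.12-4-pve
        initrd="$boot/initrd.img-$version"
        if [ ! -e "$initrd" ]; then
            status=missing
        elif [ ! -s "$initrd" ]; then
            status=empty
        else
            status=ok
        fi
        printf '%-8s %s\n' "$status" "$version"
    done
}
```

If more than one kernel looks wrong, update-initramfs -u -k all regenerates the image for every installed kernel rather than just the newest one.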
I chrooted into the Proxmox system by issuing the following commands
mkdir -p /mnt
mount /dev/proxmox-vg/root /mnt
mount --bind /sys /mnt/sys
mount --bind /proc /mnt/proc
mount --bind /dev /mnt/dev
chroot /mnt
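If you end up going in and out of the chroot more than once, it's handy to have these mounts, and the matching unmounts, as functions. This is a sketch under my own assumptions: ROOT_LV defaults to the example path used in this post and should be set to whatever lvscan reported, TARGET is the mount point, and DRY_RUN=1 prints the commands instead of running them. None of these names come from Proxmox.

```shell
#!/bin/sh
# Sketch: the mount/chroot steps wrapped in functions, plus the matching
# unmounts for when you leave the chroot.
# ROOT_LV, TARGET, DRY_RUN and the function names are hypothetical;
# set ROOT_LV to the path lvscan reported on your system.
ROOT_LV=${ROOT_LV:-/dev/proxmox-vg/root}
TARGET=${TARGET:-/mnt}

# Runs a command, or just prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_chroot() {
    run mkdir -p "$TARGET" &&
    run mount "$ROOT_LV" "$TARGET" || return 1
    for fs in sys proc dev; do
        run mount --bind "/$fs" "$TARGET/$fs" || return 1
    done
}

# Unmounts in reverse order, after you exit the chroot.
teardown_chroot() {
    for fs in dev proc sys; do
        run umount "$TARGET/$fs"
    done
    run umount "$TARGET"
}
```

Usage would be prepare_chroot && chroot "$TARGET", then teardown_chroot once you have exited the chroot and before rebooting. Unmounting the bind mounts first keeps them from holding /mnt busy.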
Once inside the chroot, I issued the command below. You are already root there, so sudo isn't needed (and Proxmox doesn't install it by default).
update-initramfs -u
For me, this complained about /boot not being mounted. I never did mount /boot in the end, but I did issue the command
dpkg --configure -a
This completed the interrupted upgrade, and after a quick reboot the N100 was back up and part of the cluster again.
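Before running dpkg --configure -a, it can be useful to see which packages an interrupted upgrade left half-done. dpkg --audit reports these directly; the sketch below shows the same idea by filtering dpkg -l output, where the second status letter is U (unpacked), F (half-configured), H (half-installed), W (triggers awaited) or T (triggers pending) for packages that still need work. pending_packages is a hypothetical helper name, not a dpkg option.

```shell
#!/bin/sh
# Sketch: list packages that dpkg has not finished installing or configuring.
# Reads "dpkg -l" output on stdin. The second letter of the status column is
# U, F, H, W or T for packages that still need attention; header lines never
# match because their second character is not one of those letters.
# pending_packages is a hypothetical helper name.

pending_packages() {
    awk '$1 ~ /^.[UFHWT]/ { print $2 }'
}
```

Run it as dpkg -l | pending_packages before and after dpkg --configure -a; if the second run prints nothing, dpkg has no unfinished packages left.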