My machine was upgraded from Fedora 30 to Fedora 31 on March 21st. On reboot I got no grub menu, just a fatal error. This post is about what I did to resolve the issue.
How to make changes to your system when you can’t boot into it
You need a live system! Maybe there are better ways, but so far this is the best I know of. I have also heard about rescue mode, but people seem to have to wait at that error screen for quite a long time before it activates, so a reset might be faster.
From the live system, you can use the chroot command to operate as if the root of the unbootable system were your root, so that you can fix its problems. Do these things in your terminal:
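A minimal sketch of the usual mount-and-chroot sequence, wrapped in a function so it can be previewed first. The device names (/dev/sda3, /dev/sda2, /dev/sda1) and the mount point /mnt/sysroot are placeholders I made up; check yours with lsblk -f. Layouts with LVM, btrfs subvolumes or encryption need extra steps that are not shown here.

```bash
#!/usr/bin/env bash
# Sketch: enter an installed system from a live session.
# ROOT_DEV, BOOT_DEV, EFI_DEV and TARGET are placeholders, not real values.
ROOT_DEV="${ROOT_DEV:-/dev/sda3}"
BOOT_DEV="${BOOT_DEV:-/dev/sda2}"
EFI_DEV="${EFI_DEV:-/dev/sda1}"
TARGET="${TARGET:-/mnt/sysroot}"

# With DRY_RUN=1 (the default) commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enter_chroot() {
    run mkdir -p "$TARGET"
    run mount "$ROOT_DEV" "$TARGET"
    run mount "$BOOT_DEV" "$TARGET/boot"
    run mount "$EFI_DEV" "$TARGET/boot/efi"
    # Expose the live kernel's virtual filesystems inside the chroot.
    for fs in dev proc sys run; do
        run mount --bind "/$fs" "$TARGET/$fs"
    done
    run chroot "$TARGET" /bin/bash
}
```

Calling enter_chroot as is only prints the commands. Once the device names are right, run it as root with DRY_RUN=0 to actually mount and enter the system.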
From the chroot-ed system I could check the logs with journalctl, and saw the error message “grubby fatal error: unable to find a suitable template”. Since the problem might be with grub, I also checked grub.cfg in both /boot/grub2/ and /boot/efi/EFI/fedora/, and in both the root partition’s UUID was repeated a huge number of times in the default kernel options:
And this is just a shorter version, with the root partition’s UUID appearing 179 times, recreated after I had figured out the issue. The real one I saw had that UUID repeated 21949 times. So this was the reason for the “token too large, exceeds YYLMAX” error. But why? And what’s up with a template for grubby? Anyway, grub2-mkconfig can regenerate the grub config file, so I ran it and moved on.
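Counting the repetitions by eye is hopeless, so here is a small helper for it. The function name count_occurrences is my own. It counts every occurrence of a fixed string in a file, not just the number of matching lines.

```bash
# Count how many times a fixed string occurs in a file.
# grep -o prints each match on its own line, so wc -l gives the total.
count_occurrences() {
    local needle="$1" file="$2"
    grep -oF -- "$needle" "$file" | wc -l
}
```

Assuming the root filesystem is mounted at /, something like count_occurrences "$(findmnt -no UUID /)" /boot/grub2/grub.cfg should report the number for one of the two files. Regenerating the config afterwards is grub2-mkconfig -o followed by the path of the grub.cfg to rewrite.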
On March 29th and April 3rd I ran kernel updates. Both ended with the same error message about grubby’s template, plus a killed process:
/sbin/new-kernel-pkg: line 321: 170010 Killed $grubby --grub2 -c $grub2Config --remove-kernel=$kernelImage
The grub config files again contained the same number of UUID strings, so grubby might have been killed because of this.
Should I keep running grub2-mkconfig every time I update the kernel? I don’t want error logs either, so no. At this point it’s all about grubby: the new-kernel-pkg script called grubby to change the grub config file and ended up with a bad file, and grubby also wanted a template that didn’t seem to exist. So let’s read grubby’s code to see what it does.
Since Fedora 30, BootLoaderSpec (BLS)-style config files have been the default for configuring the bootloader’s menu entries. I don’t actually know when the change reached my machine; I just remember that at some point, when running grub2-mkconfig, I no longer saw Fedora entries, only the Windows one. And since I could boot normally after using grub2-mkconfig to recreate grub.cfg, it’s safe to assume that what it creates works with grub and BLS:
The issue is that grubby treats the equal sign = as a separator between the name and the value of a key, and it doesn’t know about the BLS style that config line uses, so when parsing the config file it replicates the right-hand side of every equal sign. Each run makes it worse: the number of times the root partition’s UUID appears in that line goes from 1 to 3 after one run, to 14 after two, to 179 after three, and to 21949 after four.
So how did grubby get called after each kernel update?
There are two macros in the kernel.spec file that run the kernel-install script: kernel_variant_posttrans runs the add command and kernel_variant_preun runs the remove command. These in turn trigger the corresponding add and remove parts of the scripts in both /usr/lib/kernel/install.d/ and /etc/kernel/install.d/.
In /usr/lib/kernel/install.d/ I have 20-grubby.install, which a bit surprisingly comes from the systemd-udev package. It runs new-kernel-pkg to install a new bootloader menu entry, add a new kernel image, update the rescue image, and remove the old kernel, all using grubby. The command from the error message, $grubby --grub2 -c $grub2Config --remove-kernel=$kernelImage, can also be found in this new-kernel-pkg script.
There is also the 20-grub.install script in that folder, which comes from the grub2-common package and also uses new-kernel-pkg for those tasks, but only when that script exists AND grub doesn’t have BLS enabled (GRUB_ENABLE_BLSCFG != true). This setting can be found in /etc/default/grub, and I have it set to true.
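To check that setting without opening the file, a small sketch like the following works. The helper name bls_enabled is mine, and it assumes /etc/default/grub is plain shell variable assignments that can be sourced.

```bash
# Print "true" if the given defaults file enables BLS, "false" otherwise.
bls_enabled() {
    local file="${1:-/etc/default/grub}"
    # Source the file in a subshell so its variables do not leak into the caller.
    if ( . "$file" 2>/dev/null; [ "${GRUB_ENABLE_BLSCFG:-false}" = "true" ] ); then
        echo true
    else
        echo false
    fi
}
```

Running bls_enabled with no argument reads /etc/default/grub. Passing another path is handy inside a chroot or for testing.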
The question now is: why is new-kernel-pkg there? If it weren’t, grubby wouldn’t be called and there would be no error.
And surprise surprise, which package contains that script? grubby-deprecated.
When did I install it? March 19th.
Why did I do that? I don’t remember.
What should I do now? Remove grubby-deprecated, I guess.
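For reference, the questions above can be answered with rpm. This is a rough sketch rather than the exact commands I typed, and the function name who_owns is made up for illustration.

```bash
# Report which package owns a file and when that package was installed.
who_owns() {
    local path="$1" pkg
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not available" >&2
        return 1
    fi
    # rpm -qf fails if no installed package owns the path.
    pkg="$(rpm -qf --queryformat '%{NAME}\n' "$path")" || return 1
    # --last prints the package together with its install time.
    rpm -q --last "$pkg"
    echo "to remove: sudo dnf remove $pkg"
}
```

Here the call would be who_owns /sbin/new-kernel-pkg. The function only prints the removal command instead of running it, so nothing gets uninstalled by accident.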
And right now while I’m writing this post, a new kernel update is available. Let’s see how it goes…
While waiting for the update to finish, let’s talk about grubby’s template. It will be resolved anyway once we get rid of new-kernel-pkg, but just for the sake of completeness: grubby tries to find an existing menu entry to use as a template for creating a new one, but it can’t, since with BLS-style grub the menu entries are stored in separate files, not in the grub config file anymore.
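To see those separate files, here is a sketch that lists the BLS entries and their titles. It assumes the entries live in /boot/loader/entries as *.conf files with a title line, which is where Fedora keeps them as far as I know. The function name list_bls_entries is my own.

```bash
# Print "file: title" for every BLS entry found in a directory.
list_bls_entries() {
    local dir="${1:-/boot/loader/entries}" entry
    for entry in "$dir"/*.conf; do
        # Skip the unexpanded glob when the directory has no entries.
        [ -e "$entry" ] || continue
        printf '%s: %s\n' "$(basename "$entry")" \
            "$(sed -n 's/^title[[:space:]]\{1,\}//p' "$entry")"
    done
}
```

Each kernel should show up as its own file, which is why grubby finds no menu entry inside grub.cfg to copy from.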