Recently, I built a new computer to replace the one I built 5 years ago. The reference model XFX Radeon RX 6900 XT in that rig is one of the best GPUs I’ve ever owned: sleek design, great performance, and rock-solid stability from day one.

Earlier this year, AMD released their 9070 series. Since I skipped the previous generation, it seemed like this was the time for me to get a new card. Not so:

Well, it looks like I’m not getting a 9070 XT.

The good news: I anticipated this, and purchased a 7900 XTX for about $800 earlier this year.

The bad news: It worked the second I plugged it in, except… oof.

Oh hey, it’s a regression in the Linux kernel. Time to bisect!

What’s bisecting and why should I care?

If you’re certain that a bug in software didn’t exist until an update introduced it 1, and you’ve got the source code and a way to build it, you can bisect using git to find the commit that caused it.

git bisect uses a binary search algorithm to find the offending commit. Just like I learned in college 2, you take a sorted array (here, the commits between a known-good and a known-bad version) and test the middle value. Depending on the result, you discard one half of the array and do it all again.
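
To make the analogy concrete, here’s a minimal bash sketch of that search. It assumes a purely linear history, numbered 0 to N, where every commit before the culprit is good and every commit from the culprit onward is bad. Real git bisect also has to deal with merges, so it picks commits that split the remaining candidates roughly in half rather than indexing into an array. The is_bad function and the culprit index 4242 are made up for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch of the binary search behind git bisect.
# Assumption: commits are numbered 0..N in a linear history, and the
# regression landed at a hypothetical index (4242) we don't know yet.

# is_bad INDEX: exit status 0 if the "commit" at INDEX shows the bug.
# Hypothetical stand-in for "build it, boot it, check the cursor".
is_bad() {
  [ "$1" -ge 4242 ]
}

# first_bad GOOD BAD: GOOD is a known-good index, BAD a known-bad one.
# Prints the first bad index and leaves the number of tests in STEPS.
first_bad() {
  local good=$1 bad=$2 mid
  STEPS=0
  # Invariant: good < culprit <= bad.
  while (( bad - good > 1 )); do
    mid=$(( (good + bad) / 2 ))   # test the middle value
    STEPS=$(( STEPS + 1 ))
    if is_bad "$mid"; then
      bad=$mid                    # culprit is at mid or earlier
    else
      good=$mid                   # culprit is after mid
    fi
  done
  echo "$bad"
}

first_bad 0 7335
echo "found after $STEPS tests"
```

Against a range of about 7,300 commits, like the 6.11 to 6.12 bisect in the steps below, this needs at most 13 tests, since 2^13 = 8192 is the first power of two larger than the range. Because the count grows with log2 of the range, halving the range up front only saves about one test.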

How to bisect the kernel on Arch, the easy way

In order to bisect, you need a way to easily and quickly set up a new build from an arbitrary commit. That’s simple for small projects, but the scripting required to build and install the kernel is complicated, and each build takes a long time.

I haven’t found any good guides on this. Yes, there’s an entry on the Arch wiki about bisecting, but again, we’re building the dang kernel.

Luckily, I discovered that the linux-git AUR package actually supports setting custom remotes and commit values! 3 It even takes care of packaging and installing the result.

Here are the steps that worked for me. I’ll talk about what I learned afterward.

  1. Make a working directory.

  2. In the working directory, git clone these two repos. You’ll end up with two folders: linux and linux-git.

  3. cd linux. We’ll start the bisect in this folder.

  4. In my case, the issue was introduced somewhere between 6.11 and 6.12. I additionally know that 6.11.8 is good while 6.12-rc4 is not, but bisecting between those two doesn’t reduce the number of steps by a meaningful amount. So we’ll start the bisect with the major version tags:

    $ git bisect start
    $ git bisect good v6.11
    $ git bisect bad v6.12
    Bisecting: 7334 revisions left to test after this (roughly 13 steps)
    [509d2cd12a10d057fdf72f565b930f9a81140d59] Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
    
  5. cd ../linux-git and edit the remote file in this directory to specify the commit we want to build:

    REMOTE="stable/linux"
    COMMIT="509d2cd12a10d057fdf72f565b930f9a81140d59"
    
  6. Then, you can simply build: makepkg -si

    • The linux-git PKGBUILD will take care of checking out the kernel at that specific commit.
  7. Once the build is complete, ensure you have an entry in your bootloader to boot into linux-git, and reboot.

  8. Test this kernel commit for the issue you’re experiencing. If the issue is present, go back to the linux directory and run git bisect bad. If it’s not present, run git bisect good. You’ll be presented with another commit to test.

  9. Repeat steps 5-8, but for step 5, now that linux-git is installed, you must edit /etc/linux-git/remote to specify the commit you want to build from.

  10. Once you complete the number of steps required to bisect, you’ll end up with the problematic commit!
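
If you get tired of editing the remote file by hand, steps 5 through 9 can be wrapped in a small script. This is a hypothetical sketch, not something linux-git ships: it assumes the layout from step 2 (linux and linux-git side by side in the working directory), the REMOTE value from step 5, and that linux-git reads /etc/linux-git/remote once it’s installed. It overwrites the whole remote file with just those two variables, so check that your copy doesn’t contain anything else you need. Run it from the working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper for steps 5-9; not part of linux-git.
# Assumes ./linux is the bisect clone and ./linux-git the PKGBUILD clone.

# remote_contents COMMIT: print a remote file pinned to COMMIT,
# using the same REMOTE as in step 5.
remote_contents() {
  printf 'REMOTE="stable/linux"\nCOMMIT="%s"\n' "$1"
}

# next_round: build and install the commit git bisect wants tested.
next_round() {
  local commit
  # During a bisect, HEAD in ./linux is the commit to test.
  commit=$(git -C linux rev-parse HEAD) || return

  if [ -f /etc/linux-git/remote ]; then
    # Step 9: linux-git is installed, so the copy in /etc is used.
    remote_contents "$commit" | sudo tee /etc/linux-git/remote > /dev/null
  else
    # Step 5: first round, edit the file in the PKGBUILD directory.
    remote_contents "$commit" > linux-git/remote
  fi

  # Step 6: build and install; the PKGBUILD checks out $commit itself.
  (cd linux-git && makepkg -si)
}
```

After each reboot and test, mark the result with git -C linux bisect good or git -C linux bisect bad, then call next_round again. When git names the culprit, git -C linux bisect reset puts the clone back where it started.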

What I Learned

Make sure you bisect with a source that actually includes meaningful commits

For some dumb reason, I started really simply and decided “oh, I’ll just clone the Arch package for linux and bisect that”. I can hear you shouting: No, you idiot! That’s the repo for Arch Linux’s kernel releases. The only commits there are usually when they tag a new release and change the version that the PKGBUILD grabs and builds. Bisecting that repo would tell me nothing unless the issue was somehow in the way that Arch builds the kernel.

Use a separate bisect and build directory, or use makepkg -e

While Arch’s build system works well for most things, you might struggle with this type of exercise unless you really know your way around makepkg. If you don’t pass the -e, --noextract flag, which tells makepkg to reuse the existing source tree instead of downloading and extracting the sources again, it’ll clobber any changes you’ve made… like running git bisect.

Rather than muck around with the PKGBUILD to make sure it was checking out the right commit every time, my solution was ultimately to just use a separate directory solely for git bisect. I then used the tools provided by the linux-git package to build the test commits in another directory.

The kernel source code uses tabs and not spaces

Man, I’m such an idiot.

Testing some patches to fix the issue after I’d bisected had me pulling my hair out. I usually use linux-tkg’s userpatches feature to patch my kernel, but luckily, linux-git also makes it easy to apply patches.

I downloaded a patch, and that one worked, but when I made my own patch – it didn’t apply! Hunk #1 FAILED, and the entire patch is rejected. Why would the entire patch be rejected? The lines match, I’ve manually checked, I even manually typed the patch out and made it myself! Unless…

Oops.

Linux kernel coding style:

Tabs are 8 characters, and thus indentations are also 8 characters.

Oh. OH. Yep, those are spaces in that patch file I made.

Well, that’s easy enough in vim, I suppose:

:set ts=8
:set noet
:%retab!
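
To catch this before patch rejects the hunk, you can look for space-indented lines in the patch itself. A rough sketch, assuming a unified diff where a tab was typed (or expanded) as 8 spaces: it flags added, removed, and context lines whose content starts with eight spaces. For the lines you add, the kernel’s own scripts/checkpatch.pl is the more thorough option; it complains that code indent should use tabs where possible.

```shell
#!/usr/bin/env bash
# Sketch: list patch lines whose content is indented with 8+ spaces,
# a likely sign that the editor turned a tab into spaces.

# find_space_indents PATCH: prints "LINE:text" for each suspect line.
find_space_indents() {
  # Each diff line starts with ' ', '+' or '-'. The '---' and '+++'
  # headers never match, because their second character isn't a space.
  grep -nE '^[ +-] {8}' "$1"
}
```

No output means no 8-space indents were found; grep exits non-zero in that case, which is handy in a script.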

Remove extraneous steps to shorten build time

This is one of the tips from the Arch wiki, actually, though not from the bisecting article. linux-git doesn’t build the kernel docs, but if you use a different method or PKGBUILD, you might need to comment out the step that builds them. You definitely don’t need them when bisecting.
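
If your PKGBUILD does build the docs, commenting that step out can be scripted. A hedged sketch, assuming the docs are built by a line that starts with make htmldocs (htmldocs is the kernel’s make target for the HTML documentation); look at your PKGBUILD before trusting it. If the PKGBUILD also produces a separate docs package, you’ll probably need to drop that from pkgname too, or its packaging step will go looking for docs that were never built.

```shell
#!/usr/bin/env bash
# Hypothetical helper: skip building the kernel docs in a PKGBUILD.
# Assumes the docs are built by a line beginning with "make htmldocs".

# skip_docs [PKGBUILD]: comment out the docs build, keeping PKGBUILD.bak.
skip_docs() {
  local pkgbuild=${1:-PKGBUILD}
  # GNU sed: -i.bak edits in place and saves a backup; the capture
  # groups keep the original indentation in front of the new '# '.
  sed -i.bak -E 's/^([[:space:]]*)(make htmldocs)/\1# \2/' "$pkgbuild"
}
```

Running it twice is harmless: an already commented line no longer starts with make htmldocs, so it isn’t touched again.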

Building with a ton of threads can hide errors past your scrollback buffer

When I was finally satisfied with the patch, I tossed it into my linux-tkg userpatches folder and moved on to, y’know, actually using the computer. One last issue was waiting for me:

  LD [M]  net/qrtr/qrtr-smd.ko
  BTF [M] net/qrtr/qrtr-smd.ko
  LD [M]  net/qrtr/qrtr-tun.ko
  LD [M]  net/qrtr/qrtr-mhi.ko
  BTF [M] net/qrtr/qrtr-tun.ko
  BTF [M] net/qrtr/qrtr-mhi.ko
make: *** [Makefile:251: __sub-make] Error 2
==> ERROR: A failure occurred in build().
    Aborting...
  -> exit cleanup done

Wait, where’s the error? I scrolled up as far as I could go, but… nothing.

It’s impossible to search for this type of generic error in a search engine. Frantic searching netted me the wisdom I needed to unstick my brain:

Highly parallel make may allow a large volume of additional output after an error before the job finally stops. For this reason, we sometimes suggest that users reproduce the error with -j1, so that there is no parallelism and the job stops immediately on error, making the message easy to find.

Of course. I’m using a 9950X3D: That’s 32 threads!

Since there’s no way I’m waiting for a kernel build with -j1 to complete (wtf!), we’ll log it:

$ yes | makepkg -si &> buildlog.txt

After stress-testing my CPU cooler for the 20th time this week, we get all the output.

$ grep Error buildlog.txt --context 10
[...]
arch/x86/tools/insn_decoder_test: error: malformed line 5866816:
2_>:ffffffff81e808b0
make[2]: *** [arch/x86/tools/Makefile:26: posttest] Error 3
make[1]: *** [arch/x86/Makefile:393: bzImage] Error 2
make[1]: *** Waiting for unfinished jobs....
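
Since the first error is the one that matters, you can also ask grep for just that one instead of every Error line. A sketch, assuming GNU grep and a log like the one above: it treats lines containing error: (compiler and tool messages) or *** (make’s failure lines) as errors and prints the first one with five lines of context on each side. In a parallel build the first match is usually, but not always, the real culprit.

```shell
#!/usr/bin/env bash
# Sketch: show the first error-looking line in a build log.
# Assumes GNU grep; "error:" catches compiler/tool messages and
# "***" catches make's own failure lines.

# first_error [LOG]: print the first match with 5 lines of context.
first_error() {
  local log=${1:-buildlog.txt}
  # -m1 stops after the first match (GNU grep still prints the trailing
  # context); -n prefixes line numbers, ':' for the match, '-' for context.
  grep -n -m1 -B5 -A5 -E 'error:|\*\*\*' "$log"
}
```

Note that a filename such as xfs_error.o doesn’t match, because the pattern requires the colon right after error.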

While I’ve no idea why I only started encountering this error while building linux-tkg and not linux-git, at least I had something to search for now.

Sure enough: the kernel bug in question.


Anyway, through all this, I managed to find the regression and make a patch to revert it. Now my cursor works! Head over to the issue to find out what the offending commit was.

That’s one more problem with the new build solved. Too bad my new case that was supposed to ship in 2024 hasn’t arrived yet…


  1. In other words: a regression ↩︎

  2. Wow, a real world application of what I learned getting my CS degree! My money wasn’t wasted! ↩︎

  3. This is your reminder to actually inspect PKGBUILDs to see what they do instead of just going, “yup, that’s a PKGBUILD”. ↩︎