June | 2013 | System Saviour

The FSF publishes a document describing guidelines for free software distributions on gnu.org, as well as a list of distributions known to comply with these guidelines. In light of popular distributions that are increasingly including and recommending non-free software, these guidelines and distributions are a breath of fresh air to many – but they too are not without their problems.

From the guidelines, “any nonfree firmware needs to be removed from a free system”. The purpose of such firmware is to allow the target hardware device to function, so essentially distributions like Trisquel GNU/Linux feel it is fine to disable parts of a computer if it cannot be used in a completely free way. I have no complaint about this per se, but the way this is implemented in practice makes these distribution maintainers come off as hypocrites. These distributions are being reduced to not much more than a marketing ploy to mislead users. To understand why, I need to explain a bit more about what is meant exactly by the FSF when they refer to “firmware”, and why in many cases it’s a non-issue.

When the FSF talks about firmware, they are using it in a way that is inclusive of the term “microcode“. This is important, because proprietary microcode is everywhere and difficult to avoid. Even so-called “freedom-compatible” hardware frequently includes it.

If you are running an x86 processor released in the last 10 years or so, your CPU likely supports microcode runtime updates from within the operating system. If you run a Debian Wheezy GNU/Linux distribution, an Intel CPU and have the intel-microcode non-free package installed, this will automatically load the latest proprietary Intel microcode into your CPU at boot (if the packaged version is newer than what is already running).

So what happens if you don’t have this package installed? The answer is that your computer BIOS already includes CPU microcode that it injects into your CPU every time you turn your PC on. This is done before your operating system (or even its bootloader) has started to load. Were you not to load microcode updates in from your operating system, you would need to rely on flashing BIOS updates to deliver your CPU microcode updates. Either way, like it or not, you’re going to run Intel or AMD microcode at boot. It’s just a question of having the latest version with microcode fixes, or running an older version.

Here is the beginnings of why the argument for fully free software distributions (for the x86 architecture at least) falls flat on it’s face. These distributions might be 100% free software, and give you the illusion of having a computer that is fully free, but in practice removing this microcode has achieved very little – if anything at all.

CPUs aren’t the only devices you’ll find in modern PCs that require microcode. Enter the subject of graphics cards. This is where my main gripe with these distributions comes into being. Modern AMD graphics cards, like the CPUs discussed above, require microcode to function properly. Unlike CPUs however, AMD graphics cards need drivers to load this microcode into the GPU at boot – the BIOS will not do this.

AMD has helped the free software community create some great free software drivers. They have released all the specifications, and assisted in the development of code. Nvidia, by comparison, seldom plays ball with free software developers and (for x86-based graphics card drivers at least) has basically been no help at all. If you’re in the market for a high-end graphics card from one of these vendors, AMD would seem the logical choice – support the guys who support free software the most, right? No! Not according to the FSF!

Generators for Nvidia microcode have been created, but not for Radeon microcode. This result is likely just out of necessity – Nouveau (the free software project that has reverse engineered Nvidia graphics card drivers) likely were not able to redistribute the existing proprietary microcode due to licensing. However since AMD has allowed Radeon microcode to be distributed “as is” (~~basically do whatever you want with it~~ [Edit: Sadly I was mistaken – you can basically redistribute as you like but “No reverse engineering, decompilation, or disassembly of this Software is permitted.”], but did not release the means to recreate the (21K or less in size) microcode file, there was little incentive for developers to replace this – they would rather work on actually getting the drivers working properly than dedicating time to what appears to amount to (in this case at least) a purely philosophical exercise.

Now I admit, I don’t like that I need to run my AMD graphics hardware with proprietary microcode (even if they do have excellent free software drivers). Distribution maintainers have two options:

1. Allow the user to install microcode (possibly that the user provides so as to not need to redistribute it as part of the project) to have a working and otherwise completely free software operating system installed

2. Don’t make it easy to have the user get his/her hardware working, make them install a different distribution that may respect software freedom far less

Although option one would seem more logical at a glance, we have already established distribution maintainers wishing to comply with the FSF guidelines for free software distributions will need to elect to go with option two.

Now that all the discussion of firmware and microcode is out of the way, I have paved the way to explain what really makes me mad in all of this.

From the above, we can conclude that Free software distributions do not want us to run hardware that requires non-free binary blobs of any kind – no matter how small the blob or how important the hardware may be. Now have a look at, say, the download page for Trisquel. Trisquel apparently supports 32-bit or 64-bit PCs (ie. x86-architecture, ie. AMD and Intel CPUs, ie. CPUs that require priorietary microcode to function). Where are the download links for people that have that have RISC CPUs that don’t require proprietary microcode (eg. MIPS, like the Loongson processors as used in the Lemote netbook that RMS uses)? No, Trisquel doesn’t really make any effort or seem to care about you running a 100% free software computer. To do so would mean dropping support for one of their main sponsors Think Pengiun computers, which only ship Intel x86 PCs!

If the free software guidelines were serious about avoiding non-free blobs, they should be blacklisting hardware known to disrespect user freedom by mandating blobs – regardless of how the blobs get installed, and should probably be dropping x86 architecture support. Alternatively they could go the other way and allow any non-free blobs, if they are stripped to the absolute minimum required to get hardware actually working, so end users gain the maximum possible free software experience from their hardware. Of course they wont do either of these things though. Neither having a completely free software computing experience, or having things work correctly for end users is their primary goal; it’s all about marketing.

I apologise for the downtime Sunday evening. What follows is a description of the problems I ran into which caused this.

It was about 6pm. J- and I were trying to figure out some issues we had been experiencing with XMPP. I run ejabberd in a VM on my server, which I’m reasonably happy with. J- on the other hand was using a Google Talk account, but always appeared invisible on my contact list. Yet, I was clearly visible and online on her roster.

My suspicions were that it was somehow related to Google Talk – it’s been in the news that Google is breaking federation, and they have broken it (partially at least) in the past. J- sought to fix this by signing up for a dukgo.com account. Oddly, this resulted in the same strange issue.

Next, I thought I might want to investigate my own XMPP server. I was only running stock Debian Squeeze, so figured I should probably upgrade to the latest stable before spending any significant amount of time on it. After all, how long could an upgrade take? It was 6:30pm on a Sunday evening, but I also had slides to come up with for a talk at LUV Tuesday night. Surely the upgrade wouldn’t take more than about an hour?

After all the packages had been upgraded, it was time to reboot the instance into a new kernel. That’s when I ran into my first problem – the instance refused to boot. It seemed that pygrub, which is what I was using for a boot-loader, was unable to parse the newly generated grub.cfg file.

Pygrub is a part of my dom0, which also was running Squeeze. My thinking was that hopefully if I upgraded the dom0 to Wheezy too, it will support the new Grub configuration format. Worth a shot. And so I began the dom0 upgrade.

After all the packages on the dom0 were upgraded, it was now time to reboot and cross my fingers. Thankfully, the reboot was successful. I was very glad to see the processes of runlevel 2 initiate. Very glad… except one of my instances refused to boot. Not just any instance, but my firewall! No more Internets! Panic started to settle in.

The ADSL modem connected to the server via USB. The entire USB controller was using xen-pciback for device pass-through to to the guest. This functionality was no longer working – the dom0 decided that the device was no longer available and could not be passed through. If it could not be passed through, the firewall instance refused to start (and wouldn’t be very useful even if it did). This was starting to be a real annoyance. Now I had to unload the kernel modules, play with /sys entries to free up the device, and then boot the firewall again. There was some tinkering with dom0’s Grub kernel parameters along the way, but eventually I got the firewall to boot *and* see the USB device. It took hours, but I finally did it. Sorta.

There were a ton of USB driver error messages in dmesg output of the firewall. The USB stack was failing and was unusable. I tried various pass-through configurations, but ultimately I was not able to get the guest to use any kind of USB device. Seems like some kind of regression.

At this point it was getting quite late, and I wasn’t in the mood for playing around any longer. I just wanted things working again – and preferably without having to undo all my work by restoring from backups. Fine, I thought. If I can’t pass through the USB controller, I’ll just install a spare PCIe NIC and pass through that instead. After all, my modem supports connectivity from either USB or Ethernet, and it doesn’t matter to me which.

Although this seemed like a good approach, and I had the hardware to spare, things once again didn’t work out. The dom0 kernel wanted to load the device drivers of this hardware for itself, and I would have to prevent that if I were to be able to use that in the guest. The kernel driver module was r8169. I started creating entries in /etc/modprobe.d/ and rebuilding the initramfs, which is when it hit me… this is the same kernel module as used by the other integrated network port in the server – which I very much need. If I prevent this from loading, I won’t be able to remotely connect to the server any more via my LAN!

It was somewhere in the early hours of Monday morning, I had no Internet access (except through tethering with my N900), I had to go to work the same day, I had not had much sleep the night before, I had slides for a presentation that needed to be created, and I knew J- would kill me if I left the server in this broken state for too long. Further, I wasn’t sure how to proceed, and (to add insult to injury) my N900 battery just died.

I checked the server, and observed that it had two unused PCI slots. Thankfully my home server runs on an old budget motherboard that still supported them, as I figured I could scrounge up an old PCI NIC or two. After pulling some old boxes out of storage, I did indeed find spare PCI NICs. The first one I tried required yet another r8169 kernel module, but then I found an old PCI NIC that was gigabit and had heatsinks on it! I couldn’t see what it was under the heatsinks, but given that the other chipsets were bare, it seemed it would probably be something different. Turned out to be some kind of National Semiconductor NIC. No idea where I brought it from or how long I have had it for, but it proves that sometimes it really does pay to keep old crap. :)

So, after installing it, messing around a bit with /etc/modprobe.d/ rebuilding initramfs, tinkering with the dom0 kernel parameters to provide appropriate device-specific xen-pciback parameters (because I’d forget about them if they weren’t in /proc/cmdline), changing the firewall VM configuration profile, etc… my Internets were back.

Unfortunately, even as I write this I still have not had time to go back and investigate the original issue – J- is still invisible to me in my roster when she should appear as online.

System Saviour

In a world where no machine is safe…

Monthly Archives: June 2013

Why I will not back FSF’s guidelines for free software distributions

Fun times – upgrading Xen dom0 to Wheezy