NeuraLinux: Bringing GenAI to the Linux desktop

System package managers abstract away too much from enthusiasts

People are so detached from the software they use that it reminds me of the American meat industry. No one cares about the chicken who lived for 6 weeks when you’re eating your breaded Dino nuggets. Similarly, no one cares about the long hours of non-profit programming and culture wars when you’re telling ChatGPT to do your homework so you can continue scrolling through TikTok. But like the vegan electric-car-driving liberals will tell you, you should care.

As programmers, we can get satisfaction and experience by investigating the breakthroughs and mistakes of those before us, the same way I imagine an architect gets satisfaction by looking at a dubiously prosperous Middle Eastern skyline. You can see some funny things that programmers do when they think their only accountability is Leonard the scrum-master, who would rather grow his pickling collection (or whatever scrum-masters do in their free time) than read their mediocre code. You do have to search the canals of the internet and poke and prod, but confirming for yourself that this is the software that runs the world is both hilarious and depressing. Take, for example, the class naming choices of the Spring framework for Java. There are Controller, Service, and of course AbstractSingletonProxyFactoryBean. The class describes itself as a “convenient superclass for FactoryBean types that produce singleton-scoped proxy objects.” I have no clue what any part of that sentence means, but I assume that if you’re at the point where you need this class, absolutely nothing about your situation could possibly be considered “convenient.”

How about the code base of old GNU software? For example, when you investigate the C source files of glibc, you will quickly realize that Richard Stallman coded in a parallel universe in which C is Turing complete purely because of its preprocessor directives. Because of a mix of historical baggage, respecting locales, and the need for portability, glibc has such hilariously messy source files that I’m convinced the only people who can still maintain them have such bad dyslexia that they think preprocessor directives look like normal C code when they squint. You think that was harsh? It would be instructive to look at an example.

My favorite example of this comes from A tale of two libcs, by Drew DeVault. In it, he investigates a segfault caused by the function isalpha, defined in ctype.h. First, here’s the implementation in the musl libc.

int isalpha(int c)
{
	return ((unsigned)c|32)-'a' < 26;
}

Very straightforward: OR-ing with 32 maps the ASCII uppercase letters onto their lowercase counterparts, so a single unsigned comparison checks whether the result lands among the 26 lowercase letters. Now here’s how glibc does it (if you already have a headache, feel free to skip past this, because the details are not pertinent).

#define __exctype(name) extern int name (int) __THROW
__exctype (isalpha);

#ifndef _ISbit
# include <bits/endian.h>
# if __BYTE_ORDER == __BIG_ENDIAN
#  define _ISbit(bit)	(1 << (bit))
# else /* __BYTE_ORDER == __LITTLE_ENDIAN */
#  define _ISbit(bit)	((bit) < 8 ? ((1 << (bit)) << 8) : ((1 << (bit)) >> 8))
# endif
enum {
  // ...
  _ISalpha = _ISbit (2),	/* Alphabetic.  */
  // ...
};
#endif /* ! _ISbit  */

#if !defined __NO_CTYPE
# ifdef __isctype_f
__isctype_f (alpha)
// ...
# elif defined __isctype
# define isalpha(c)     __isctype((c), _ISalpha) // <- this is it

#ifndef __cplusplus
# define __isctype(c, type) \
  ((*__ctype_b_loc ())[(int) (c)] & (unsigned short int) type)
#elif defined __USE_EXTERN_INLINES
// ...
#endif

extern const unsigned short int **__ctype_b_loc (void)
     __THROW __attribute__ ((__const__));
extern const __int32_t **__ctype_tolower_loc (void)
     __THROW __attribute__ ((__const__));
extern const __int32_t **__ctype_toupper_loc (void)
     __THROW __attribute__ ((__const__));

This uses “five macros, whether or not you’re using a C++ compiler, the endianness of your machine, a look-up table, thread-local storage, and two pointer dereferences.” If you’re interested in how that hodgepodge of features ends up implementing isalpha, I highly recommend checking out that article.
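
For context on why misusing this machinery can end in a segfault at all: the C standard only defines the behavior of isalpha when its argument is EOF or a value representable as an unsigned char, and any other argument is undefined behavior, which is presumably how a table-based implementation can end up reading somewhere it shouldn’t. Here is a minimal sketch of the wrong call and the portable one; the Latin-1 byte is just an illustrative value of mine, not something taken from Drew’s article.

#include <ctype.h>
#include <stdio.h>

int main(void)
{
	/* 0xE9 is 'é' in Latin-1; on platforms where char is signed, c holds -23. */
	char c = '\xE9';

	/* Undefined behavior: -23 is neither EOF nor representable as an
	   unsigned char, so a table-based implementation may do anything. */
	/* isalpha(c); */

	/* The portable call converts to unsigned char first. */
	printf("%d\n", isalpha((unsigned char) c));
	return 0;
}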

But this article was actually controversial, because the glibc implementation does nothing wrong other than strictly comply with the POSIX specification for isalpha, which has to accept signed integers instead of unsigned chars because EOF does not fit into the latter. Redditors proclaimed that this bitwise lookup table is a brilliant optimization that warrants the complexity. That made me wonder what a modern language that’s horny for optimization does.

#[inline]
pub fn is_alphabetic(chr: u8) -> bool {
  (chr >= 0x41 && chr <= 0x5A) || (chr >= 0x61 && chr <= 0x7A)
}

oh… If only there were a way to change decades-old specifications without breaking their portability and making everyone mad. Reminds me of another comic… but I digress yet again.

Like I said, people don’t care about the chicken. Software deployment is starting to feel like cybersecurity with how airtight and absurdly bulletproof the system is. Every precompiled binary is indifferently packaged into an airtight titanium crate, shipped by sea for exactly 35ms, and thrust straight into the deepest parts of your filesystem, where you, the supervisor, see only a streak of “[######…” as proof of the binary existing. Not having to know anything about the libraries and applications that power your setup is a blessing, because after all, that’s what the Windows and macOS people do, and they actually get things done.

But even if the vegans got to you and you’d like to go Sherlock Holmes on some code, source code is a second-class citizen to binaries, even on Linux machines. Downloading source when you have a package manager feels like asking for extra guac at Chipotle; they’ll let you do it, but they’ll give you weird looks and judge your intentions. Not to mention you are responsible for organizing or deleting the directories, because at the end of the day, source code isn’t considered a package. That’s why, in an educational package manager, I want to be able to manage source code, especially individual source files, because we can learn a lot by reading such things. The nix package manager inadvertently made big strides toward this goal because it “treats software distribution as a caching layer for software building”. This keeps you closer to the slaughterhouse and eliminates a whole host of consistency issues. But I want to take it a step further.

Reproducible package managers solve very real and complex problems in the world, like dependency hell for the people who actually get things done. It must be a nightmare for DevOps people to learn, but no one else even has to know their software wasn’t built on their machine. But no one is using the Linux desktop because it makes life easier than the alternatives. For example, I dedicated many hours to trying to connect my Arch Linux laptop to a public external monitor setup over USB-C, and I learned a lot about device drivers, but I was still defeated by the evil forces of Wayland incompatibility. No, they’re using Linux because A, they have plenty of spare time, and B, they are intrinsically attracted to the guts of software. If you wanted to be maximally productive, you would be using Windows with WSL or an Intel CPU MacBook.

Even though everyone has joined the nix fan club by now, I believe we are still portage users at heart. Whether or not you feel comfortable enough to give Gentoo a shot, you’re still curious what it would be like and what you would learn. portage feels like a wise old man walking you to a precipice where, if you look down, you can vaguely make out the words “Linux From Scratch”, but after you go home you still have nice things like Nintendo consoles and indoor plumbing. Yet we yearn to know what it would be like to stand at the base of that cliff, to pick and choose our libraries ourselves and finally extinguish libX11 from our systems forever. I know the feeling very well, but as a college student, I don’t have the time to become utterly unproductive for months. I know I could do it in a VM, but that doesn’t mean my attention would leave the VM just because my laptop is closed.

What I need is a creative, persistent person to write my package declarations for me, someone who can scour the internet for installation instructions. Someone who can thoughtfully avoid unnecessary dependencies through a USE-flags mechanism. Then I need a pragmatic, no-nonsense person to actually follow said instructions, who’s flexible enough to take the best qualities of reproducible package managers, but without the added complexity. Someone who can efficiently store my files, do away with the FHS, and instead keep packages in their own rooms. Then I would be the supervisor who oversees every file in my system, who understands every process happening behind the scenes. I would know the relationship between sway and dbus, between nvim and treesitter, because in this land, I oversaw their friendships. Sadly, no such person exists…

Until now, with Llama-2 being free. We finally have an LLM that is smart enough to handle the complexity of this system-management task while avoiding the API fees of OpenAI. Thus, I want to create this “package manager companion” that guides you along the base of the cliff, generating package declarations and waiting for you to check them off. It would automatically update these declarations when it detects an out-of-date package, and you could choose how much you want to know about these continuous updates. It could even generate instructions for any existing package manager, to make the lives of repository maintainers significantly easier. It doesn’t even have to be restricted to Linux: any programming language’s package repository could become completely autonomous through a system like this, fine-tuned on its package format and combined with automated testing. However, I will try to make sure my optimism does not manifest as feature envy, so for now, I will stick to the wise old man for Linux machines.
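
To make the “package declaration” idea a bit more concrete, here is a rough sketch, in C to match the rest of this post, of the kind of record such a companion might generate for each package. Every field name and value here is a hypothetical of mine for illustration, not the format of any existing tool.

/* One generated, human-reviewable package declaration (hypothetical). */
struct package_decl {
	const char  *name;          /* e.g. "sway" */
	const char  *version;       /* pinned by the companion, bumped when upstream moves */
	const char  *source_url;    /* where the source lives, so you can actually read it */
	const char **use_flags;     /* NULL-terminated, portage-style feature toggles */
	const char **depends;       /* names of the other declarations this one needs */
	const char  *install_root;  /* its own room, e.g. "/pkgs/sway-1.9", instead of the FHS */
};

The companion would fill one of these in, you would check it off, and the no-nonsense builder would do the rest.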