Random access everything

Feb 22, 2024

Mohammed Kheezar Hayat

A lot of software is just about building new indexes

My wife and I were talking about computer memory the other day. She has been having some issues with her work laptop. The thing is slow, usually overloaded with the dozens of documents she needs to keep open, because it does not have enough memory, i.e. RAM.

She kind of knows what RAM is, but it wasn’t entirely clear to her, so I explained. However, I found myself constantly having to hedge and caveat what I was saying, because as I spoke, I was realizing that the term is a bit vestigial.

The term RAM (Random Access Memory) is from the early days of computing, when it was used to differentiate this kind of memory from tape and disk drives. Sequential access was slow and random access was much faster. I am skipping over some history of computer design and architecture here, but in a nutshell the difference was so stark that these two kinds of memory were put to entirely different purposes.

Sequential access simply means that to get to a certain piece of data, you first need to ‘go through’ all the bits that come before it. Random access means you don’t have to.
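To make the difference concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: the ‘tape’ is an iterator that can only be walked from the start, the ‘memory’ is an ordinary list, and the function names `read_sequential` and `read_random` are made up for this example. The cost is counted as the number of items touched, not real time.

```python
from typing import Iterable, Sequence


def read_sequential(tape: Iterable[str], position: int) -> tuple[str, int]:
    """Walk the tape from the start; return the item and how many items were visited."""
    visited = 0
    for index, item in enumerate(tape):
        visited += 1
        if index == position:
            return item, visited
    raise IndexError(position)


def read_random(memory: Sequence[str], position: int) -> tuple[str, int]:
    """Jump straight to the position; exactly one item is touched."""
    return memory[position], 1


# Made-up data: 1000 records, and we want the one at position 750.
data = [f"record-{i}" for i in range(1000)]

item_seq, cost_seq = read_sequential(iter(data), 750)
item_rand, cost_rand = read_random(data, 750)

print(item_seq, cost_seq)    # record-750 751
print(item_rand, cost_rand)  # record-750 1
```

Both calls return the same record. The only thing that changes is how much of the medium you had to pass through to reach it.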

Some of this depends simply on the anatomy of the medium. A tape by definition has to be wound forward or backward if you want to get to a specific point, as opposed to a disc, where you only need to reposition the beam or the stylus.

Sometimes it depends on other, fuzzier factors. And it’s an analogy that works for any kind of information access. When you read a book for the first time, you have to read it in order. But once you are familiar with it, you might treat it as a random-accessible reference and leaf through quickly to get somewhere, as if you had an index in your mind. Obviously many books have actual indexes. [A book’s index is a kind of alternative version of the table of contents. The TOC is the book arranged from the classical, authorial point of view: it is in fact literally the order in which the book is bound. The index presents the book arranged in an order familiar to everyone: the alphabet.]

Of course, sequential access can be very fast too, and speed is often not the only criterion for performance: few people want to watch a movie from somewhere in the middle or at 2x speed, so sequential access is fine. But back in the 70s and 80s, broadly speaking, the distinction of degree was so great that it became one of kind. Much of memory and indeed storage today is technically random access, but the imprint of this difference in kind endures on modern computer and cloud architecture.

I have lately spent some time thinking about these distinctions, especially ‘random access’, and it seems that a surprising number of innovations in software, possibly even a majority of them, can be described as some form of slow/sequential access being transformed into fast/random access. Almost any software that is about access to data is building or improving upon random access.

The returns from making access to information faster by making physical changes (akin to moving from tapes to discs) have almost certainly diminished to zero. What we can still build are better indexes: faster, customized ones.

Indulge me in a thought experiment here.

Imagine a parallel universe, where human society is absolutely identical to ours. They have our knowledge, our predispositions and our paranoias. Everything is identical with these ‘Xumans’, except that their English language alphabet is ordered differently. Instead of A, B, C, theirs goes X, Q, B, U, O, A, V… You can make up the rest of the ‘Xenglish’ alphabet; it does not matter. What matters is that if we found a way to send English language books from our universe to theirs, they would be fine reading them, but the indexes at the end of the books would be useless to them. For you and me, ‘aardvark’ comes before ‘ability’, but for the Xumans it is the other way round! You would need to redo the indexes on these smuggled books for them to be of any use.
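If you want to play with the thought experiment, redoing an index for the Xumans can be sketched in a few lines of Python. Treat this as a toy under stated assumptions: only the first seven Xenglish letters come from the description above, the remaining letters are filled in here in ordinary order purely as a guess, and the index entries and page numbers are made up.

```python
import string

# Hypothetical Xenglish ordering. Only X, Q, B, U, O, A, V are given;
# the remaining letters are appended in ordinary order as an assumption.
XENGLISH_PREFIX = "xqbuoav"
XENGLISH_ALPHABET = XENGLISH_PREFIX + "".join(
    c for c in string.ascii_lowercase if c not in XENGLISH_PREFIX
)
RANK = {letter: position for position, letter in enumerate(XENGLISH_ALPHABET)}


def xenglish_key(word: str) -> list[int]:
    """Map a word to its letter ranks so sorted() compares it in Xenglish order."""
    return [RANK[c] for c in word.lower() if c in RANK]


# A toy back-of-book index: term -> page numbers (made-up data).
book_index = {"aardvark": [12, 87], "ability": [3], "xylophone": [140], "queue": [55]}

english_order = sorted(book_index)
xenglish_order = sorted(book_index, key=xenglish_key)

print(english_order)   # ['aardvark', 'ability', 'queue', 'xylophone']
print(xenglish_order)  # ['xylophone', 'queue', 'ability', 'aardvark']
```

The book and its page numbers are untouched. Only the order of the entries changes, and that is enough to turn a useless index into a usable one for a different reader.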

I don’t think you or I have to think too much before drawing parallels between this thought experiment and much of software: at its core, much of it is about building new indexes: more random access for more people.

I also don’t think that we are anywhere close to fully indexing humanity’s collective knowledge for everyone. The open question is how to find the most efficient ways to map information, knowledge and expertise to what people already know (their ‘alphabets’). Sticking with the analogy, building new ‘indexes’ is going to be key to eventually writing entirely new books.

We don’t need to imagine another universe to find people whose alphabets are different from ours. From an expert’s point of view, anyone outside their domain has a different alphabet. Other experts in other domains have a different alphabet and different indexes.

There has been a wave of ‘knowledge management’ tools lately that use AI. Summarization tends to be a common feature in these applications, as a summary seems like an obvious entry point for getting a kind of ‘intelligent grip’ on information that is unfamiliar, or when there is simply a lot of it.

But summarization isn’t enough. Or at least, summarization as it is currently defined isn’t. If we think of summaries not as a mere compression of words but as just another kind of index, then the question becomes what else we can do with and/or alongside summaries.

At Afterword, we have built a few key features that augment summarization nicely. Afterword works multimodally: you can make summaries of not just URLs but also PDFs, audio files and videos. Once the text summaries are generated, you can zoom in and out of them, traveling through levels of abstraction, like a map. And finally, you can combine various summaries from any source (text, audio, video) into hyperdocuments, digital scrapbooks that are yours to arrange as you wish.

These are tiny steps that our users love, but there is much more work to be done if we are to build new, interesting and exciting indexes for human knowledge. We all need more random access in our lives.