5 Comments
Eren Irmak

Although I'm not one of those people who make exaggerated claims about LLMs, I think he oversimplified things a bit.

Jason Knight

I would recommend listening to the whole episode! The guy has literally written a book on how LLMs work from first principles (which I would also recommend).

Eren Irmak

Thank you for the reply, Jason. I’ve been following him for a while, and I agree that he is one of the most insightful voices in the field. I’ll definitely listen to the full episode for the full context.

My comment wasn’t meant to critique the entire conversation, but rather a specific aspect of the summary. My interest lies especially in the emergent capabilities of LLMs: things like in-context learning, generalization, zero-shot reasoning, and so on. To my knowledge, we still lack comprehensive mathematical explanations for these. Projecting words into high-dimensional vector spaces is a fascinating way of treating language mathematically, but it doesn’t yet fully explain how these cognitive-like abilities arise during training or inference. If there are rigorous mathematical accounts or recent publications on this front, I’d be grateful if you could point me toward them (I’m currently a cognitive science master’s student exploring this intersection).
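To make the contrast concrete, here is a toy sketch of the part that is well understood mathematically, with made-up four-dimensional vectors purely for illustration (real models learn hundreds or thousands of dimensions from data):

```python
# Toy illustration with hand-picked "word vectors"; nothing here is a real embedding.
import numpy as np

embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.2, 0.8, 0.0]),
    "apple": np.array([0.0, 0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # Similarity of direction in the space: close to 1.0 = related, close to 0.0 = unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))  # relatively high
print(cosine(embeddings["king"], embeddings["apple"]))  # relatively low
```

The geometry explains similarity and analogy quite cleanly; what it doesn’t explain is why stacking attention layers over such vectors and scaling up suddenly yields things like few-shot pattern induction.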

Of course, I recognize that summaries can’t capture the full nuance of the discussion, and after listening to the episode, I may revise my impression. I certainly didn’t mean to downplay the work you both do. On the contrary, it’s exactly because of your reach and influence that these subtle distinctions matter.

And honestly, the recent tool Anthropic released to probe LLM internals is very exciting precisely because of this issue. There is still so much to unpack in these 'black box' models.

Jason Knight

I'm not going to claim to be an expert in the inner workings, just a former developer who has built his fair share of (pre-LLM) models and doesn't really understand the depths of the maths (I studied it at uni, got 28% in statistics, then dropped out!).

Based on the fundamentals of how LLMs work, as far as my understanding goes (and very much using things like word2vec as my analogy), it does seem that there's a clear limitation to what they do, and that scale makes everything look great. I don't believe these models are in any way conscious... but, you raise an interesting point about generalisation and also being able to derive new knowledge. Andriy covers this last part in the interview, about how they can't create stuff that doesn't exist, although I'm super-curious as to whether they can find the gaps and intersections between things that *do* exist... I still don't call it consciousness, but it would certainly be interesting.
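For what it's worth, the word2vec analogy I keep reaching for looks roughly like this (my own rough sketch, assuming you have gensim installed and don't mind it downloading the pretrained Google News vectors, which are large):

```python
# Rough sketch of the word2vec intuition: relations between words become
# directions in the vector space. No LLM involved, just pretrained embeddings.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # pretrained word2vec vectors

# The classic analogy: king - man + woman lands near "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The "gaps and intersections" idea: ask what sits near the combination
# of two concepts that already exist in the data
print(vectors.most_similar(positive=["biology", "computer"], topn=5))
```

That second call is the kind of thing I mean: it won't invent a concept that isn't represented in the data, but it can point at where two existing ones meet.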

Eren Irmak

I totally agree with you. Consciousness is a much deeper thing than just language. There’s also the physical side of it: how we interact with the world through our senses and bodies. Take brain organoids, for instance. I remember reading about those "mini-brain" or brain-in-a-dish experiments a few years ago; they caused a huge stir, and a lot of questions were flying around: could these tiny brain structures be conscious in some way? If they show activity, is that enough? How ethical is it to work on brain organoids at all? One neuroscientist said something that stuck with me: self-awareness might arise from the sensory inputs the brain receives and the feedback loop it builds on top of that sensory information. That’s where consciousness might start, not from language, but from sensing and reacting.

That’s also why I think reasoning isn’t just a language thing, not merely an extension of words, sentences, and human-created symbols. Even animals can assess threats, plan, or solve problems without any words; they rely on raw sensory data and instincts. So when we see LLMs doing impressive things with language, it doesn’t necessarily mean they’re reasoning the way a living being does, because reasoning is something more fundamental than an extension of language.

The same goes for understanding language and semantic meaning: using language well doesn’t mean something is conscious or even truly understands what it’s saying. Language is one tool the brain uses, not the thing that creates consciousness in the first place.

LLMs are not conscious, but consider this: prompting an LLM to talk like a pirate changes its overall tone, syntax, and behavior, which feels like more than next-word prediction. Even if examples of pirate talk are in its training data, the fact that it changes its writing style and follows the piratey rules at inference time is intriguing. The parameters are the same, the weights are not updated, so why and how does this style change occur? This kind of emergent meta-linguistic ability seems to me to go beyond pure maths.
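To make the puzzle concrete, here is a toy sketch of my own (not from the episode; gpt2 is just a stand-in small enough to run locally, and bigger instruction-tuned models follow style far better): the weights stay frozen, only the conditioning text changes, and yet the distribution over the next token shifts.

```python
# Same frozen weights, different conditioning text, different next-token distribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal language model would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def top_next_tokens(prefix: str, k: int = 5):
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # logits for the very next token only
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [tokenizer.decode(int(i)) for i in top.indices]

print(top_next_tokens("The weather today is"))
print(top_next_tokens("Ye be talkin' like a pirate now, matey. The weather today is"))
# No parameter is updated anywhere; the change comes entirely from conditioning on the prompt.
```

Mechanically it is 'just' a shift in the conditional distribution given the prompt; what I find hard to explain is why that shift is coherent enough to sustain a whole persona.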

Just wanted to share that angle. This stuff really fascinates me, especially where tech crosses paths with what it means to be aware or intelligent. OpenAI's AGI hype, I think, is far from the reality, though they are doing a great job of training performant models.
