9 Comments
Jan 20 · Liked by Dan Elton

Hey Dan, thanks for the post! You mentioned that byte pair encoding is used for efficiency - but this is something that confuses me a bit. If it were possible to use character-level encoding for an LLM, wouldn't someone have done it by now? Or maybe they have and I just haven't come across it. But when I was learning how transformers work, it seemed like the dimension of the embedding space would effectively be limited by the size of the token vocabulary. Using an embedding space with more dimensions than the number of unique tokens would just end up making all the token representations orthogonal to each other (I think). But an embedding space with 26 dimensions doesn't seem likely to capture much useful world knowledge. Not to mention it would limit your context length, because you could only have 26 sinusoids in your position encoding...
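[For readers curious about the efficiency point raised here, a toy sketch of byte pair encoding may help: starting from character-level tokens and greedily merging the most frequent adjacent pair shortens the sequence considerably. The `bpe_train` helper below is purely illustrative, not any real production tokenizer.]

```python
from collections import Counter

def bpe_train(text, num_merges):
    """Toy BPE: start from characters, repeatedly merge the most
    frequent adjacent pair of tokens into a single new token."""
    tokens = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

text = "the cat sat on the mat, the cat sat on the mat"
char_tokens = list(text)
bpe_tokens = bpe_train(text, num_merges=10)
# BPE yields far fewer tokens than character-level for the same text,
# which is the efficiency tradeoff the comment asks about.
print(len(char_tokens), len(bpe_tokens))
```

Concatenating the BPE tokens always reconstructs the original text, since each merge only glues existing tokens together; the gain is a shorter sequence (and so a longer effective context) at the cost of a larger vocabulary.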

Jan 14 · Liked by Dan Elton

The date implies it is current (or this is a repost of an old article). I tested a few examples, like: what does the string "SolidGoldMagikarp" refer to? I got the correct answer.

or "can coughing effectively stop a heart attack?"

Again, I got the correct answer.


Nice discussion. Maybe new job categories will emerge: "LLM Docs" and "LLM Shrinks" :-) Current LLMs are a step, obviously not the end stage of AI. They're fascinating despite all their pathologies. The real questions revolve around 'our' expectations, and whether those expectations are reasonable. Should we expect them to be perfect? No! Can they help in some tasks? It depends. I've been playing with several LLMs, and my conclusion is that they're helpful, but not necessarily correct or authoritative. It seems like they've been diluted by the "Alignment Enforcers". Do I expect complete or perfect answers? Absolutely not! Their training is somewhat, if not seriously, pathetic, in the sense that they basically use what was available in various massive crawls. Many of these crawls include seriously pathological text. Some of the texts are deliberately misleading, while a good portion is plainly ignorant/wrong. The best 'knowledge' usually doesn't appear in crawlable text, often doesn't appear in any electronic medium, and rarely even in writing. There's what's called tacit knowledge that people use but never write down.

As a simple explicit example of that: in one of the places I worked, there was a person who was likely the best 'maker' of high-frequency ultrasound transducers in the US. Making transducers is a rare art; it involves mixing materials and basically 'cooking'/'baking' them under special conditions in a unique 'oven'. As he was aging and getting ready to retire, our lab tried to persuade him to write his procedures/recipe down, and we provided him with personnel to train. Nope! He wouldn't allow anyone to come into his building (yes, his own building, created to fabricate the transducers) and even watch what he was doing. Uncle Sam's Canoe Club had him on its payroll for about 4 decades, and when he retired, that knowledge was gone, the building demolished, and the capability to recreate what he did vanished. This is actually a common occurrence. A more famous version of it is Joseph von Fraunhofer's closely held procedure for making optical glass. Fraunhofer made the best optical glass in the world, and Germany gained a huge lead over other countries in making optical equipment; England's high-precision optical industry suffered as a result. Michael Faraday, one of England's best experimentalists, was tasked with reverse-engineering Fraunhofer's process, and he gave up after 3 years.

What's the lesson? Really outstanding knowledge (aka 'the secret sauce' for amazing products) will likely never appear in LLMs ...

The good parts? Are there any good parts? Yes! I asked one of the LLMs to write me Python code that would print out integers that are both prime and Fibonacci. I had never experimented with that before; the question just popped into my mind, and within seconds of typing the prompt into one of the GPTs I had code that ran the first time. I would consider that pretty good. Was it complete? Did it miss anything? I am now checking that against examples of famous sequences in the On-Line Encyclopedia of Integer Sequences (the OEIS) [https://oeis.org], and am finding fascinating things. Did the LLM do its job? I would say pretty decently for a few seconds' effort. I think one must develop certain workflow habits of knowing what to ask and how to check. Don't assume answers are complete or unique; assume there will be errors and hallucinations (performance there is getting better, in my opinion). LLMs are a tool. When you look at the root meaning of 'artificial', you might encounter Francis Bacon's definition: artificial means that which is produced by art or human artifice. And as we know... humans aren't totally perfect at making machines that are perfect. :-)
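[The task the commenter describes is short enough to sketch here. This is an illustrative version written for this note, not the LLM's actual output; the result below matches OEIS A005478, the sequence of prime Fibonacci numbers, up to the chosen limit.]

```python
def is_prime(n):
    """Trial-division primality check, fine for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def fibonacci_up_to(limit):
    """Yield Fibonacci numbers 0, 1, 1, 2, ... up to limit."""
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

# Integers that are both prime and Fibonacci, below 10,000.
prime_fibs = [f for f in fibonacci_up_to(10_000) if is_prime(f)]
print(prime_fibs)  # [2, 3, 5, 13, 89, 233, 1597]
```

Cross-checking the printed list against the OEIS, as the commenter suggests, is exactly the right workflow habit: the code can run on the first try and still be incomplete or subtly wrong.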


You did a great job on this, but didn't go into problems with image generation: people with three legs or one, and non-photographic images with weird lettering and numbering. This makes DALL-E unusable for me.
