Alfredo V. Clemente wants you to understand what's actually happening when you chat with ChatGPT or Claude. In a detailed blog post explaining how LLMs actually work, he frames Large Language Models as 'the world's most powerful autocomplete.' These models do one thing: predict the next token in a sequence. Everything else, from writing code to answering questions, is just clever problem framing that turns tasks into text prediction.

Tokens are the basic unit, not words or characters. Using Llama 3 as his example, Clemente shows how 'Prehistoric' splits into 'Pre' and 'historic.' Shakespeare's 'To be, or not to be' becomes 12 tokens, each with a numerical ID. Llama 3's tokenizer has 128,256 unique tokens total, and the model can never output anything outside that vocabulary. You can experiment with this yourself at Tiktokenizer.

Training starts with a model that spits out random noise. Pre-training on trillions of tokens teaches it language, code, and facts. But a pretrained model just completes text: ask it to 'write a poem' and it might continue with 'about your first love' because that's what followed in its training data.

Instruction fine-tuning changes this. You create instruction-response pairs, wrap them in special tokens, and train the model to complete this new format. OpenAI's InstructGPT showed this approach dramatically improves helpfulness while cutting toxicity. Google's FLAN models proved instruction tuning also boosts zero-shot performance on unfamiliar tasks.

For agent builders, the implication is clear: everything an agent does is next-token prediction shaped by its training data and its fine-tuning format. Debug accordingly.
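The core loop described above, a fixed vocabulary of token IDs plus a model that predicts the most likely next token, can be sketched with a deliberately tiny toy. This is not Llama 3's tokenizer or architecture; it is a minimal bigram counter standing in for the same training objective, with an invented corpus and function names:

```python
from collections import defaultdict

# Toy corpus standing in for the trillions of tokens used in pre-training.
corpus = "to be or not to be that is the question".split()

# A tokenizer maps each unique string to a numerical ID, like Llama 3's
# 128,256-entry vocabulary (this toy vocabulary is far smaller).
vocab = {tok: i for i, tok in enumerate(sorted(set(corpus)))}

# "Training": count which token follows which. A bigram model learns far
# less than a real LLM, but the objective is identical -- next-token prediction.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Greedily return the token most often seen after `token` in training."""
    followers = counts[token]
    return max(followers, key=followers.get)

print(vocab["be"])         # every token is just a numerical ID
print(predict_next("to"))  # the model can only echo patterns from its data
```

Asking this toy to continue 'to' yields 'be' for the same reason a pretrained model continues 'write a poem' with 'about your first love': that is what followed in its data.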
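The instruction fine-tuning step, wrapping instruction-response pairs in special tokens, can be illustrated with a small formatter. The marker strings below follow Llama 3's published chat template; other model families use different special tokens, and the helper name is invented for this sketch:

```python
def format_example(instruction: str, response: str) -> str:
    """Wrap one instruction-response training pair in Llama 3-style
    special tokens. Fine-tuning on text in this shape teaches a
    plain text-completion model to behave like an assistant."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{instruction}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{response}<|eot_id|>"
    )

sample = format_example("Write a poem", "Roses are red...")
print(sample)
```

After training on pairs like this, 'write a poem' followed by the assistant header is completed with a poem, not with 'about your first love', because the new format now dominates what the model expects to come next.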
Your LLM Is Just a Fancy Probability Engine
Alfredo V. Clemente breaks down how Large Language Models actually work: tokenization, pre-training on massive datasets, and instruction fine-tuning. His framing is simple: LLMs do one thing, predicting the next token. Everything else is clever problem reframing.