The aim of this series of articles is to understand how large language models (LLMs) work by building a very simple version. I've found many articles and videos explaining how neural nets work, or how LLMs work in theory, but I haven't found any that show exactly, step by step, how an LLM works in practice.
This is probably because real LLMs are so complex that no one really understands exactly what they're doing. I'm hoping that if I keep things really simple, I can build a tiny language model that is fully understandable. In particular, I want to understand how attention mechanisms and transformers work, since these are the key innovations that make LLMs so powerful.