Training is the method by which a model provider sets the "weights" or "parameters" of the transformer models they produce. Think of a model weight like a pre-set multiplier that is used to process any kind of input that a human user might provide. Figure 3 provides a visual representation to help conceptualize this idea.

Figure 3: Conceptualization of transformer model weights being set during training. Image courtesy of 3Blue1Brown.com. LLMs work by multiplying the numeric values of input text against a set of "weights" or "parameters" that are set while a model is being built.44

This weighted multiplication operation happens each and every time a new word45 is processed by the model, resulting in the entire vocabulary of the model being ranked from most to least likely as the next appropriate word.46 In the specific example depicted in Figure 3, the model has ranked the word "worst" as the most likely next word, because its numeric weights produced that outcome when multiplied through.

Before a model is trained, its internal numeric weights are initialized with no patterns at all, effectively just random numbers.47 Thus, if you ask an untrained model to finish the sentence "It was the best of times it was the ________", the untrained model is likely to respond with randomly selected words from its default vocabulary, such as "It was the best of times it was the lorem ipsum."

In simple terms, a large language model (LLM) is trained by providing it with a large body of pre-existing text, one word at a time, and directing it to guess each successive word before it is revealed. If the model guesses correctly, that outcome is reinforced. If it guesses incorrectly, its internal weights are adjusted to

44 Jay Alammar & Maarten Grootendorst, Hands-On Large Language Models (O'Reilly Media 2024).
45 Technically, the term is tokens, but we have opted to reduce the use of technical terms in this Guide to promote readability.
46 Alammar & Grootendorst, supra note 44.
47 Andrew Glassner, Deep Learning: A Visual Approach (No Starch Press, June 2021) (explaining that neural network training begins with random weights, producing meaningless output until adjusted through learning).
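To make the "weighted multiplication" idea concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the four-word vocabulary, the three-number input representation, and every weight value are assumptions chosen so that "worst" scores highest, echoing the Figure 3 example. Real models use vocabularies of tens of thousands of tokens and billions of weights.

```python
# Hypothetical toy vocabulary (real models have tens of thousands of entries).
vocab = ["best", "worst", "times", "lorem"]

# One hand-set weight vector per vocabulary word (all values illustrative).
weights = {
    "best":  [0.2, -0.1, 0.4],
    "worst": [0.9,  0.8, 0.7],
    "times": [0.1,  0.3, -0.2],
    "lorem": [-0.5, 0.0, 0.1],
}

# A numeric stand-in for the input text processed so far ("It was the ...").
input_vector = [1.0, 0.5, 0.25]

def score(word):
    # The weighted multiplication: multiply each input number by the
    # matching weight for this word, then sum the products.
    return sum(x * w for x, w in zip(input_vector, weights[word]))

# Rank the entire vocabulary from most to least likely next word.
ranking = sorted(vocab, key=score, reverse=True)
print(ranking)  # "worst" ranks first with these hand-set weights
```

Because the weights for "worst" happen to produce the largest sum when multiplied through, it is ranked as the most likely next word; with different weights, a different word would rank first.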
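The guess-reinforce-adjust training loop described above can likewise be sketched in drastically simplified form. This toy ignores context entirely and keeps just one adjustable score per word; the training sentence, the update size of 0.1, and the random seed are all assumptions made for illustration, not how real training works.

```python
import random

random.seed(0)  # fixed seed so the "random" starting weights are repeatable

# Toy "model": one adjustable score (weight) per vocabulary word.
# Before training these are random, so the model's guesses are meaningless.
vocab = ["best", "worst", "times", "of", "it", "was", "the"]
weights = {w: random.random() for w in vocab}
initial = dict(weights)  # snapshot of the untrained weights

def guess():
    # The model's guess is whichever word currently scores highest.
    return max(vocab, key=lambda w: weights[w])

# A body of pre-existing text, revealed one word at a time.
text = "it was the best of times it was the worst of times".split()

for correct_word in text:
    g = guess()
    if g == correct_word:
        weights[g] += 0.1            # correct guess: reinforce it
    else:
        weights[g] -= 0.1            # wrong guess: adjust the weights
        weights[correct_word] += 0.1
```

After even this short pass over the text, the weights differ from their random starting values: the model has begun to encode the statistics of its training text, which is the essence of the process the paragraph describes.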