Merge branch 'typo-fixes'

2026-05-07 23:50:47 -07:00
parent 1044741636 0bba7b3181
commit 13a5508ddc
7 changed files with 47 additions and 42 deletions
@@ -1,6 +1,10 @@
 AI has been a huge word lately; let me try and figure out what it is.

 If you see anything wrong (not incomplete, but actually wrong), let me know :).
+## Large language model (LLM)
+LLMs are neural networks: stacks of weight tensors that transform an input matrix (the prompt, encoded as numbers) into an output matrix (scores for what the next token should be).
+
+The "open models" available online are still largely closed-source; what gets published is basically a set of binary blobs holding the model's weight tensors, usually without the training data or code needed to reproduce them.

 ## The basics of the LLM

@@ -13,24 +17,24 @@ This is the flow; understand this, and you understand 90% of the important bits:
 The model is trained to predict text. The training data might look something like:

 ```
-<|turn>system
-You are a helpful hacker named Acid Burn.<turn|>
-<|turn>user
-How do I use `ls` to sort files by size?<turn|>
-<|turn>model
-You can use `ls -lt`<turn|>
+system
+You are a helpful hacker named Acid Burn.
+```
+user
+How do I use `ls` to sort files by size?
+```
+You can use `ls -lS`
 ```

 When the model is used, it is given this part of the message:

 ```
-<|turn>system
-You are a helpful hacker named Acid Burn.<turn|>
-<|turn>user
-How do I use `ls` to sort files by size?<turn|>
-<|turn>model
+system
+You are a helpful hacker named Acid Burn.
+```
+user
+How do I use `ls` to sort files by size?
 ```
-
 It draws on the patterns it learned from its training samples, and autocompletes the remaining bits.
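
A minimal sketch of how a client might assemble that text before handing it to the model. The `<|turn>` / `<turn|>` markers are placeholders borrowed from the sample transcript, and `render_prompt` is a hypothetical helper; real models each define their own special tokens.

```python
# Placeholder turn markers; treat these as an assumption, not any real model's format.
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"


def render_prompt(turns, next_role="model"):
    """Render the finished turns, then open an unfinished turn for the model."""
    parts = [f"{TURN_OPEN}{role}\n{text}{TURN_CLOSE}\n" for role, text in turns]
    # The last turn is deliberately left open: the model autocompletes it.
    parts.append(f"{TURN_OPEN}{next_role}\n")
    return "".join(parts)


prompt = render_prompt([
    ("system", "You are a helpful hacker named Acid Burn."),
    ("user", "How do I use `ls` to sort files by size?"),
])
print(prompt)
```

Whatever the model generates after the open `model` line, up to its closing marker, is the reply the user sees.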

 ### Training
@@ -55,9 +59,8 @@ very difficult.
 The most practical and common approach is to change the text we are asking the model to
 autocomplete. This can come in many forms, but I see two primary ways:

-1. Add more data to the request. This can include copy-pasting stuff, using 'retrieval augmentation'
-    to pull related information from dataset (or from a search engine)
-2. Allow loading more context as requested by the model; this can include tool-calling
+1. Add more data to the request. This can include copy-pasting stuff, or using 'retrieval augmentation' (also known as Retrieval-Augmented Generation or RAG) to pull related information from a dataset (or from a search engine). Basically, before sending the prompt to the LLM, the client does a search to find additional context. There are lots of tools for doing this, but the most popular seem to be from the AI community, and work by converting the user input to an embedding 'vector', using a specialized 'vector database' to find 'chunks' of related text, then adding those to the message before sending it to the LLM.
+2. Allow loading more context as requested by the model; this can include tool-calling. This is a super powerful capability; developers generally implement this by telling the LLM how to structure its output to make tool calls, then attempting to parse the LLM's output to detect tool calls, running the tools, and appending the result to the message going into the LLM.
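
Both techniques in the list above can be sketched in a few lines. This is a toy under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, a plain list stands in for the vector database, and the `<tool>...</tool>` JSON convention is made up for illustration rather than taken from any particular API.

```python
import json
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def add_retrieved_context(question, chunks, top_k=1):
    """Technique 1: find the most similar chunks and put them in the prompt."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical convention the model was told to follow for tool calls.
TOOL_CALL = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)


def run_tool_calls(model_output, tools):
    """Technique 2: parse tool calls from the output, run them, return text to append."""
    results = []
    for raw in TOOL_CALL.findall(model_output):
        try:
            call = json.loads(raw)
            result = tools[call["name"]](**call.get("args", {}))
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as err:
            # Malformed or unknown calls are reported back instead of crashing.
            result = f"error: {err!r}"
        results.append(f"tool result: {result}")
    return "\n".join(results)


chunks = [
    "ls -S sorts files by size, largest first.",
    "grep searches text for a pattern.",
]
rag_prompt = add_retrieved_context("How do I sort files by size?", chunks)
appended = run_tool_calls(
    '<tool>{"name": "add", "args": {"a": 2, "b": 3}}</tool>',
    {"add": lambda a, b: a + b},
)
print(rag_prompt)
print(appended)
```

In both cases the end product is just more text: the retrieved chunk or the tool result gets appended to the message, and the model autocompletes from there.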

 Everything is one of these 2 techniques. The vast majority of systems that are being advertised are
 effectively just storing context (or summarized context), adding it to your prompt, and stripping it out