This article explains the foundational concepts behind Large Language Models (LLMs): how they are trained, how they generate responses, and what practical limits (such as the context window) influence output quality.
It is intended as a non-technical overview for anyone working with AI assistants or prompt-based tools.
The lifecycle of an AI model
An LLM goes through two distinct phases:
1. Training phase
- The model is trained on large datasets.
2. Usage phase (Inference)
- The model generates responses to your prompts in TextCortex.
- The model cannot learn new information during usage. It uses what it already learned during the training phase.
Understanding this distinction is essential: you can improve results by providing better instructions and context, but you are not permanently “teaching” the model during regular usage.
What is a token?
A token is the unit of text an AI model processes (roughly a word or word fragment). On average, 1 token consists of about 4 characters, though it varies depending on the language. For example:
- “Hello Zeno” is approximately 2 tokens.
- “Takeaway” may be split into multiple tokens (e.g., “take” + “away”).
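If you are curious how this looks in practice, here is a minimal sketch using the open-source tiktoken library. The cl100k_base encoding is an illustrative assumption; the models behind your assistant may split text differently.

```python
# A minimal tokenization sketch using the open-source tiktoken library
# (pip install tiktoken). The cl100k_base encoding is an illustrative
# choice; other models may split the same text differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello Zeno", "Takeaway"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```

Exact token counts often differ from your intuition about words, which is why estimates like “about 4 characters per token” are only rules of thumb.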
How models learn: next-token prediction
During training, the model learns using next-token prediction: it repeatedly predicts what token should come next in a sequence.
📌 Example:
“The capital of Germany is ___” → the model learns that “Berlin” is highly likely.
Across billions of such predictions, the model learns statistical relationships between words, concepts, and common patterns in text.
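To make this concrete, here is a toy, hand-written sketch of next-token prediction. The probability table is invented purely for demonstration; a real model computes these probabilities from billions of learned parameters.

```python
# A toy illustration of next-token prediction. These probabilities are
# invented for demonstration; a real model learns them during training.
next_token_probs = {
    "Berlin": 0.92,
    "Munich": 0.03,
    "Hamburg": 0.02,
    "the": 0.01,
}

prompt = "The capital of Germany is"
prediction = max(next_token_probs, key=next_token_probs.get)
print(f"{prompt} {prediction}")  # -> The capital of Germany is Berlin
```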
Knowledge cutoff
After training, the model’s parameters are fixed. The knowledge cutoff date indicates when training data collection ended—meaning the model may not know about events after that date.
Using an AI Model (Inference)
What is inference?
Inference is the phase where a trained model generates text in response to your prompt. Unlike training, inference does not update the model’s knowledge in any way.
How responses are generated
When you enter a prompt, the model generates the response token by token:
- You send a prompt (e.g., “Hi”).
- The model predicts likely next tokens (e.g., “Hello”).
- It continues predicting the next token based on everything generated up to that point.
- The model stops when it determines the answer is complete.
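The sketch below illustrates this loop with a hard-coded toy “model”; it is not how any real LLM is implemented, but it shows why sampling token by token can produce different outputs from the same prompt.

```python
# A toy sketch of token-by-token generation with sampling. The "model"
# here is a hard-coded lookup table, purely for illustration.
import random

toy_model = {
    "Hi": [("Hello", 0.7), ("Hey", 0.3)],
    "Hello": [("there!", 0.6), ("again!", 0.4)],
    "Hey": [("there!", 0.8), ("you!", 0.2)],
}

def generate(prompt: str, max_tokens: int = 5) -> str:
    output = []
    current = prompt
    for _ in range(max_tokens):
        candidates = toy_model.get(current)
        if candidates is None:  # no likely continuation: stop generating
            break
        tokens, weights = zip(*candidates)
        current = random.choices(tokens, weights=weights)[0]
        output.append(current)
    return " ".join(output)

# Each step samples from a probability distribution, so repeated
# runs of the same prompt can produce different outputs.
print(generate("Hi"))  # e.g. "Hello there!" or "Hey you!"
```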
📌 What to expect:
Because generation is probability-based, two similar prompts can produce different responses, especially when instructions are vague or the topic is open-ended.
What is a context window?
The context window is the maximum amount of text (in tokens) a model can process in a single request. You can think of it as the model’s working memory. Everything the model should consider must fit inside it, including:
- Your message.
- Your conversation history.
- System instructions.
- Attached documents.
- Relevant knowledge base content.
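As a rough illustration, the sketch below estimates whether a request fits in a context window using the “about 4 characters per token” rule of thumb from earlier. The window size and the text fragments are made-up values; real limits differ by model, and a real tokenizer gives more accurate counts.

```python
# A rough sketch of context-window budgeting, using the ~4 characters
# per token rule of thumb. The window size below is an invented example;
# actual limits differ by model.
CONTEXT_WINDOW = 128_000  # hypothetical limit, in tokens

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly one token per four characters."""
    return max(1, len(text) // 4)

# Everything the model must consider competes for the same window.
parts = {
    "system instructions": "You are a helpful writing assistant. ...",
    "conversation history": "User: Hi\nAssistant: Hello! How can I help?",
    "attached document": "Q3 business report: revenue grew ... " * 500,
    "new message": "Summarize the attached report in bullet points.",
}

total = sum(estimate_tokens(text) for text in parts.values())
print(f"Estimated request size: ~{total} tokens")
if total > CONTEXT_WINDOW:
    print("Too large: trim history or attachments before sending.")
else:
    print("Fits within the context window.")
```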
Getting better results in TextCortex (practical tips)
Use these best practices to improve response quality without changing the model you're working with:
- Be explicit about goals and constraints.
Include your desired format, target audience, tone, and any “must include / must avoid” requirements.
- Provide source material in the prompt or attachments.
If the task depends on internal information, include it (you can also leverage knowledge bases).
- Maintain context focus.
Keep in mind that extra, unrelated text consumes context window space and can reduce accuracy in the generated response.
💡 Pro Tip!
You can use TextCortex to optimize your prompt by clicking on the little magic wand icon to enhance your instructions.
- When working with long chats, restate the essentials.
Summarize your key requirements once more before asking for the final output.
Example prompts (ready to use!)
📌 Summarization:
“Summarize the attached business report in bullet points. Include the 5 key takeaways and a short risk/impact section.”
📌 Knowledge extraction:
“Analyze the attached support tickets, list the top recurring customer issues and group them by theme. Provide examples of possible solutions I can suggest for each group.”
📌 Drafting:
“Draft a professional help center article about [topic]. Provide headings, short paragraphs and include a FAQ section at the very end.”