Optimizing Token Spend (Admin Guide) – TextCortex Help Center

Premium model usage draws from each seat's token wallet. When the wallet is empty, users keep working on standard models with no lockouts. The settings and habits below help your tenant get the most value from premium usage before that point. The model you choose is the single biggest lever on your budget.

1. Use model allowlists

Restrict access to very large models such as the Claude Opus family. Configure allowlists so each team only sees the models it actually needs, which prevents the most expensive frontier models from being used by default for routine work.

2. Set a tenant default model

Choose one strong, efficient model as the default for the whole tenant. A good default delivers consistently high-quality responses while keeping per-query cost predictable across all users.

Kimi and GLM models deliver near-state-of-the-art performance at a much lower token burn than frontier-class models, and both families support long context windows. If data residency matters, some also offer EU-hosted inference, which can be enabled in workspace settings. Choosing an efficient default helps each seat's token wallet stretch further.

3. Keep agent background prompts lean

Trim agent background prompts down to the essentials. Remove redundant rules and instructions, and keep the prompt simple and focused. The background prompt is sent on every message, so a bloated prompt quietly raises the token cost of every interaction.
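To make the effect concrete, here is a rough back-of-the-envelope sketch in Python. The prompt sizes and message count are hypothetical figures chosen for illustration, not measured TextCortex values, and the function name is made up for this example.

```python
def background_overhead(prompt_tokens: int, messages: int) -> int:
    """Tokens spent re-sending the background prompt over `messages` messages."""
    return prompt_tokens * messages


# Hypothetical figures for illustration only.
MESSAGES = 50
bloated = background_overhead(prompt_tokens=2000, messages=MESSAGES)
lean = background_overhead(prompt_tokens=400, messages=MESSAGES)
saved = bloated - lean

print(f"bloated prompt: {bloated} tokens")  # 100000
print(f"lean prompt:    {lean} tokens")     # 20000
print(f"saved:          {saved} tokens")    # 80000
```

Under these assumptions, trimming the prompt from 2,000 to 400 tokens saves 80,000 tokens over 50 messages for a single user, before any actual work is counted.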

4. Use Skills for complex workflows

Move complex, multi-step workflows into Skills rather than packing everything into an Agent background. The key difference: a Skill is only read and activated when the user explicitly triggers it with a matching request, whereas the Agent background is always loaded. Using Skills keeps unused context out of the prompt, so messages that do not need the workflow cost fewer tokens.
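The difference can be estimated with a small Python sketch. It assumes, as a simplification, that a Skill's instructions are only counted on the messages that trigger it; all numbers and names below are hypothetical.

```python
def workflow_tokens(instruction_tokens: int, loads: int) -> int:
    """Tokens spent on workflow instructions that are loaded `loads` times."""
    return instruction_tokens * loads


# Hypothetical figures for illustration only.
INSTRUCTION_TOKENS = 1500   # size of the workflow instructions
TOTAL_MESSAGES = 40         # all messages sent to the agent
TRIGGERING_MESSAGES = 5     # messages that actually need the workflow

# In the Agent background, the instructions ride along on every message.
in_background = workflow_tokens(INSTRUCTION_TOKENS, TOTAL_MESSAGES)

# As a Skill (simplifying assumption), they load only when triggered.
as_skill = workflow_tokens(INSTRUCTION_TOKENS, TRIGGERING_MESSAGES)

print(in_background)  # 60000
print(as_skill)       # 7500
```

The less often a workflow is needed, the larger the gap between the two placements.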

5. Start new chats often

Long conversations carry all previous turns into the model's context. When you move on to an unrelated task, start a new chat. This prevents unnecessary detail from earlier interactions from bloating the context window.
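A simplified Python model shows why splitting work across chats helps. It assumes every earlier turn (message plus reply) is resent in full with each new message and that all turns are the same size; real conversations vary, and the figures are hypothetical.

```python
def total_input_tokens(turns: int, turn_tokens: int, message_tokens: int) -> int:
    """Input tokens for a whole chat when each message resends all earlier turns.

    turn_tokens:    size of one completed turn (message + reply) in the history
    message_tokens: size of the new message being sent
    """
    total = 0
    for earlier_turns in range(turns):
        total += earlier_turns * turn_tokens + message_tokens
    return total


# Hypothetical figures for illustration only.
one_long_chat = total_input_tokens(turns=20, turn_tokens=500, message_tokens=100)
two_short_chats = 2 * total_input_tokens(turns=10, turn_tokens=500, message_tokens=100)

print(one_long_chat)    # 97000
print(two_short_chats)  # 47000
```

In this model the cost of history grows roughly with the square of the number of turns, so two 10-turn chats use about half the input tokens of one 20-turn chat.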

6. Edit your message instead of correcting it

Instead of sending follow-ups like “I did not mean this, I meant that...”, edit your original message. Editing keeps the conversation tight and avoids piling extra words and back-and-forth into the context.

7. Reserve Auto mode for complex tasks

Use Auto mode for genuinely complex work. For simple, well-defined tasks such as writing an email, translating text, or summarizing meeting notes, select a model manually rather than letting Auto mode escalate to a heavier model.

8. Check price/performance on hover

Hover over any model to view its price/performance benchmark. Use this to quickly judge cost efficiency and pick the most economical model that still meets the quality bar for the task.

9. Create and set a tenant-default agent for newcomers

Set a tenant-default agent that greets users when they log in, and give it an optimized, lightweight prompt aimed at new users. A good onboarding agent points people to the resources, agents, and skills already available in your workspace instead of letting them burn tokens building their own from scratch through trial and error. Encourage newcomers to take inspiration from existing resources first, so early usage stays efficient and consistent with your organization's standards.

10. Tune Extra Usage per user (optional)

You can grant some groups a higher individual Extra Usage allowance while reducing it for others. This is entirely optional and turned off by default.

Consult your procurement team before making any changes to Extra Usage settings.

Key takeaway

Efficient models deliver near-frontier quality at a fraction of the token burn, so smart model selection plus lean prompts is where most savings come from.