Tuesday, 24 February 2026 (B.E. 2569)

LLM temperature

LLM temperature is a sampling hyperparameter (typically 0 to 2) that controls the randomness of a model's output by rescaling the probability distribution over candidate next tokens. Lower temperatures (0-0.3) concentrate probability on the most likely tokens and produce focused, repeatable results, while higher temperatures (>0.8) flatten the distribution and produce more diverse, random, or "creative" text.


https://dagshub.com/glossary/llm-temperature/
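A minimal sketch of how temperature reshapes a distribution, assuming the common formulation in which the model's raw scores (logits) are divided by the temperature before the softmax. The function name and the three logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Division by zero is undefined; temperature 0 is handled as greedy decoding instead.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the low setting almost all of the probability lands on the first token, while at the high setting the three tokens end up much closer together.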


The following article shows how to use a Bash script to interact with Ollama:

https://www.inferable.ai/blog/posts/model-temperature-first-principles
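For readers who prefer Python to Bash, here is a rough sketch of passing a temperature to a local Ollama server through its `/api/generate` endpoint. It assumes Ollama is running on its default port (11434). The model name `llama3` is only a placeholder for whichever model has been pulled locally, and the helper names are hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (an assumption about the setup).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, temperature, model="llama3"):
    """Build a POST request that asks Ollama for one non-streamed completion."""
    payload = {
        "model": model,  # placeholder model name
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, temperature, model="llama3"):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, temperature, model)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, calling `generate("Why is the sky blue?", temperature=0.0)` several times and then repeating with `temperature=1.5` is a quick way to see the difference in variability.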


Zero Temperature (0.0): Makes decoding greedy and, in principle, deterministic. The model always chooses the token (a word or word fragment) with the highest probability at each step.

While setting the temperature to 0 forces the model to choose its "best guess" every single time, the model's best guess can still be completely wrong.
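The point can be illustrated with a small sketch. The token list and probabilities below are invented for the prompt "The capital of Australia is", with the wrong answer deliberately given the highest probability. The helper `pick_token` is hypothetical: it treats temperature 0 as a plain argmax and otherwise samples from the distribution reweighted by `p ** (1 / T)`, which is equivalent to dividing the logits by `T`.

```python
import random

def pick_token(probs, temperature, rng):
    """Return a token index: argmax at temperature 0, weighted sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: always the single most probable token.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Raising p to 1/T matches dividing logits by T before the softmax.
    weights = [p ** (1.0 / temperature) for p in probs]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

# Made-up distribution in which the model's "best guess" is wrong.
tokens = ["Sydney", "Canberra", "Melbourne"]
probs = [0.55, 0.40, 0.05]

rng = random.Random(0)  # fixed seed so the sampled run is repeatable
greedy = [tokens[pick_token(probs, 0, rng)] for _ in range(5)]
sampled = [tokens[pick_token(probs, 1.0, rng)] for _ in range(5)]
print("temperature 0:", greedy)
print("temperature 1:", sampled)
```

The temperature 0 run returns "Sydney" every time: perfectly repeatable and still incorrect, because lowering the temperature does not change which token the model ranks first.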

How Hallucinations Are Actually Mitigated

Because temperature is only a decoding hyperparameter, engineers and researchers rely on architectural and training techniques to combat hallucinations:

  • Retrieval-Augmented Generation (RAG): Grounding the model by supplying it with verified, external documents to reference before it generates an answer.

  • Fine-Tuning & Reinforcement Learning from Human Feedback (RLHF): Training the model specifically to say "I don't know" when it lacks data, rather than guessing.

    RLHF: Human evaluators rank LLM answers from best to worst based on quality, accuracy, and safety, and those rankings are then used to further train the model.

  • System Prompt Constraints: Explicitly instructing the model (e.g., "If you do not find the answer in the provided context, state that you do not know.").
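The first and last items in the list above can be combined in a few lines. This is only a sketch: `retrieve` is a naive word-overlap ranker standing in for a real search or vector index, and the corpus, function names, and prompt layout are all assumptions made for illustration.

```python
def retrieve(question, corpus, k=2):
    """Rank documents by how many words they share with the question (toy retriever)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_grounded_prompt(question, documents):
    """Assemble a prompt that restricts the model to the supplied documents."""
    context = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    system = (
        "Answer using only the context below. "
        "If you do not find the answer in the provided context, "
        "state that you do not know."
    )
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical document store.
corpus = [
    "Canberra is the capital of Australia.",
    "Sydney is the largest city in Australia.",
    "Temperature rescales logits before sampling.",
]
question = "What is the capital of Australia?"
docs = retrieve(question, corpus)
print(build_grounded_prompt(question, docs))
```

The resulting string would be sent to the model as the prompt, so the answer is drawn from the retrieved text rather than from whatever the model happens to rank highest on its own.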

If you are working on a system where factual accuracy is paramount, keeping the temperature low (around 0.0 to 0.2) is a good baseline practice, but do not mistake determinism for truth.