LLM temperature is a hyperparameter (typically 0 to 2) that controls the randomness and creativity of an AI's output by adjusting the probability distribution of predicted tokens. Lower temperatures (0-0.3) produce deterministic, focused, and factual results, while higher temperatures (>0.8) create more diverse, random, or "creative" text.
https://dagshub.com/glossary/llm-temperature/
The following article shows how to use Bash script to interact with Ollama
https://www.inferable.ai/blog/posts/model-temperature-first-principles
While setting the temperature to $0$ forces the model to choose its "best guess" every single time, the model's best guess can still be completely wrong.
How Hallucinations Are Actually Mitigated
Because temperature is just a decoding hyperparameter, engineers and researchers use structural architectural patterns to combat hallucinations:
Retrieval-Augmented Generation (RAG): Grounding the model by supplying it with verified, external documents to reference before it generates an answer.
Fine-Tuning & Reinforcement Learning from Human Feedback (RLHF): Training the model specifically to say "I don't know" when it lacks data, rather than guessing.
RLHF: Human evaluators rank llm answers from best to worst based on quality, accuracy, and safety.
System Prompt Constraints: Explicitly instructing the model (e.g., "If you do not find the answer in the provided context, state that you do not know.").
If you are working on a system where factual accuracy is paramount, keeping the temperature low (around 0.0 to 0.2) is a great baseline practice—just don't mistake determinism for truth.