Dr.Jiw: Token meaning and pricing in LLM

Wednesday, 29 April 2026

Token meaning and pricing in LLM

A token is the basic unit of data that Gemini models use for input and output. For text, one token is about 4 characters or 0.75 words. For other media, it represents a fixed "slice" of information, such as a patch of pixels or a fraction of a second.

Token Meanings by Media Type

All media is converted into tokens, and the combined total must fit within the model's context window of 1M to 2M tokens.

Text: Approximately 4 characters per token. Standard English text averages about 750 words per 1,000 tokens.
Images: Small images (both dimensions ≤384 px) count as 258 tokens. Larger images are divided into 768x768 pixel tiles, each costing 258 tokens.
Video: Typically converted at 263 tokens per second.
Audio: Typically converted at 32 tokens per second.
Reasoning: Models like Gemini 3.1 Pro generate internal "thinking tokens" during complex tasks, which are billed as output tokens.
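
To make the conversion rates above concrete, here is a minimal sketch of a token estimator. The function names are hypothetical, and the image tiling is assumed to be a simple ceiling division into 768x768 tiles; the real service may crop or scale images first, so treat the results as rough estimates rather than billing figures.

```python
import math

# Approximate rates quoted in the list above.
CHARS_PER_TOKEN = 4
TOKENS_PER_IMAGE_TILE = 258
TILE_SIDE_PX = 768
SMALL_IMAGE_MAX_PX = 384
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def estimate_text_tokens(text: str) -> int:
    """Rough text estimate: about 4 characters per token."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Small images cost one tile; larger ones cost 258 tokens per tile.

    Assumption: tiles are counted by plain ceiling division, with no
    cropping or rescaling.
    """
    if width_px <= SMALL_IMAGE_MAX_PX and height_px <= SMALL_IMAGE_MAX_PX:
        return TOKENS_PER_IMAGE_TILE
    tiles = math.ceil(width_px / TILE_SIDE_PX) * math.ceil(height_px / TILE_SIDE_PX)
    return tiles * TOKENS_PER_IMAGE_TILE


def estimate_video_tokens(seconds: float) -> int:
    """Video at roughly 263 tokens per second."""
    return math.ceil(seconds * VIDEO_TOKENS_PER_SECOND)


def estimate_audio_tokens(seconds: float) -> int:
    """Audio at roughly 32 tokens per second."""
    return math.ceil(seconds * AUDIO_TOKENS_PER_SECOND)
```

At these rates, one minute of audio comes to about 1,920 tokens and one minute of video to about 15,780 tokens, which is why video fills a context window so much faster than text.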

Token Pricing (Per 1 Million Tokens)

Pricing is pay-as-you-go, with different rates for input (data read by the model) and output (data generated by the model). For the Pro models in the table below, exceeding 200,000 tokens of total context doubles the input rate and raises the output rate by 50%.

Gemini Model Tier	Input Price (≤200k)	Output Price (≤200k)	Input Price (>200k)	Output Price (>200k)
3.1 Pro (Preview)	$2.00	$12.00	$4.00	$18.00
2.5 Pro	$1.25	$10.00	$2.50	$15.00
2.5 Flash	$0.30	$2.50	N/A*	N/A*
2.5 Flash-Lite	$0.10	$0.40	N/A*	N/A*

*Flash models typically have flat pricing regardless of context length up to their limit.
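
The table can be turned into a small cost calculator. This is only a sketch: the names `Rates`, `PRICES`, and `estimate_cost_usd` are hypothetical, the prices are copied from the table above and may change, and the higher tier is assumed to apply when the input token count alone exceeds 200,000.

```python
from dataclasses import dataclass
from typing import Optional

LONG_CONTEXT_THRESHOLD = 200_000  # tokens


@dataclass(frozen=True)
class Rates:
    """USD per 1 million tokens; long-context rates are None for flat pricing."""
    input_usd: float
    output_usd: float
    input_long_usd: Optional[float] = None
    output_long_usd: Optional[float] = None


# Values copied from the pricing table above.
PRICES = {
    "3.1 Pro (Preview)": Rates(2.00, 12.00, 4.00, 18.00),
    "2.5 Pro": Rates(1.25, 10.00, 2.50, 15.00),
    "2.5 Flash": Rates(0.30, 2.50),
    "2.5 Flash-Lite": Rates(0.10, 0.40),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: the >200k tier is triggered by the input size.
    Thinking tokens should be included in output_tokens.
    """
    rates = PRICES[model]
    use_long = (
        input_tokens > LONG_CONTEXT_THRESHOLD and rates.input_long_usd is not None
    )
    in_rate = rates.input_long_usd if use_long else rates.input_usd
    out_rate = rates.output_long_usd if use_long else rates.output_usd
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # 100k input tokens and 10k output tokens on 2.5 Pro (standard tier).
    print(f"${estimate_cost_usd('2.5 Pro', 100_000, 10_000):.3f}")
```

For example, 100,000 input tokens and 10,000 output tokens on 2.5 Pro work out to $0.125 + $0.100 = $0.225 under these assumptions.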

Cost Reduction Features

Context Caching: Reusing large inputs (e.g., long PDFs) across requests can reduce input costs by 90%, with cached-input rates ranging from about $0.01 to $0.20 per 1M tokens depending on the model.
Batch API: Submitting non-urgent tasks for asynchronous processing provides a 50% discount on standard paid rates.
Free Tier: Available through Google AI Studio for development, typically capped at 1,500 requests per day.
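
The two discounts above can be sketched as simple arithmetic. The function names are hypothetical, the discount percentages are the ones quoted in this post, and any cache storage fee a provider might charge is ignored here.

```python
CACHE_DISCOUNT = 0.90  # cached input tokens billed at 10% of the normal rate
BATCH_DISCOUNT = 0.50  # batch jobs billed at half the standard rate


def cached_input_cost_usd(
    input_rate_usd_per_million: float, cached_tokens: int, fresh_tokens: int
) -> float:
    """Input cost when part of the prompt is served from the cache.

    Assumption: storage fees for keeping the cache alive are not modeled.
    """
    cached_rate = input_rate_usd_per_million * (1 - CACHE_DISCOUNT)
    total = cached_tokens * cached_rate + fresh_tokens * input_rate_usd_per_million
    return total / 1_000_000


def batch_cost_usd(standard_cost_usd: float) -> float:
    """Cost of the same work submitted through the Batch API."""
    return standard_cost_usd * (1 - BATCH_DISCOUNT)
```

With a $2.00 input rate, a 1M-token prompt where 900,000 tokens are cached would cost about $0.38 in input fees instead of $2.00, which shows why caching matters most when the same long document is queried repeatedly.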
