LLM Token Counter

Free. Accurate. Every major tokenizer.

Count tokens across OpenAI, Anthropic, Google, and open-source tokenizers. See costs, visualize token boundaries, and understand how tokenization works.

Supported tokenizers:
- OpenAI / tiktoken (BPE) · GPT-4.1, GPT-4.1 mini, GPT-4o, o3, o4-mini
- Anthropic / Claude (BPE variant) · Claude Opus 4, Sonnet 4, Haiku 3.5
- Google / Gemini (SentencePiece) · Gemini 2.5 Pro, 2.5 Flash
- Open source (SentencePiece) · Llama 4 (via API, or self-hosted for free), Mistral, Grok 3

Text statistics reported: characters (with and without spaces), words, sentences, paragraphs, reading time, tokens per word, and characters per token.

100% free. No data stored. No signup. Built by WellerDeveler.

Understanding LLM Tokens

Tokens are the fundamental units that large language models use to process text. Instead of reading individual characters, LLMs break text into chunks called tokens using a process called tokenization.

A token can be as short as a single character or as long as a full word. Common English words are usually one token, while less common words get split into smaller pieces.

Hello world! → 3 tokens
Unbelievably → 3 tokens
The quick brown fox → 4 tokens

As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words in English text.
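That rule of thumb can be turned into a quick offline estimator. The sketch below averages the two heuristics from the paragraph above (~4 characters per token and ~0.75 words per token); it involves no real tokenizer, so treat its output as a ballpark figure only:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: average the ~4-chars-per-token and
    ~0.75-words-per-token heuristics. No real tokenizer involved."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)
```

For ordinary English prose the two heuristics land close together; the estimate drifts for code, non-English text, or unusual punctuation, where real tokenizers behave very differently.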

Each LLM provider trains its own tokenizer with a unique vocabulary. OpenAI uses tiktoken with Byte Pair Encoding (BPE), Google uses SentencePiece, and Anthropic uses a BPE variant with a slightly different vocabulary.

A larger vocabulary means common patterns get their own token, resulting in fewer total tokens. Newer tokenizers tend to be more efficient because they are trained on larger and more diverse datasets.
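To make that concrete, here is a toy BPE trainer in pure Python: it starts from individual characters and repeatedly merges the most frequent adjacent pair into a new vocabulary entry, which is exactly how frequent patterns end up as single tokens. This is an illustrative sketch, not any provider's actual tokenizer:

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int):
    """Toy BPE: start from characters, then repeatedly merge the most
    frequent adjacent pair into a new vocabulary entry."""
    tokens = list(corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:  # nothing repeats, so merging would not help
            break
        merges.append(a + b)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges
```

Running it on "abababcab" first learns "ab", then "abab", shrinking nine characters to four tokens. Real tokenizers do the same thing over gigabytes of text with tens or hundreds of thousands of merges, which is why a bigger training corpus yields fewer tokens per sentence.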

Tokenization methods rarely change between model versions from the same provider. GPT-4.1 and GPT-4o share the same tokenizer family, and Claude Opus 4 and Claude Sonnet 4 count tokens the same way. That is why this tool groups by tokenization method rather than listing every model version.

API Costs: Every API call is billed per token. Input tokens (your prompt) and output tokens (the response) are priced separately, with output typically costing 2-5x more.
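Per-token billing is simple arithmetic. The helper below assumes prices quoted in USD per million tokens, the convention most providers use; the prices in the usage note are placeholders, not real rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Cost of one API call, with prices quoted in USD per 1M tokens."""
    return (input_tokens * input_usd_per_m
            + output_tokens * output_usd_per_m) / 1_000_000
```

With placeholder prices of $2/M input and $8/M output, a 1,000-token prompt that produces 500 output tokens costs $0.006, and the output half of the call accounts for two thirds of that.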

Context Windows: Each model has a maximum number of tokens it can process in a single conversation. Exceeding this limit means your earliest messages get truncated or the request fails.
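A common way to stay under the limit is to drop the oldest messages first. This sketch keeps the most recent messages whose combined size fits a token budget, using the rough 4-characters-per-token estimate rather than a real tokenizer:

```python
def fit_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined estimated token count fits
    the budget; older messages are dropped first."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk newest to oldest
        cost = len(msg) // 4             # rough 4-chars-per-token estimate
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))          # restore chronological order
```

Production systems usually count with the model's real tokenizer and also reserve headroom for the system prompt and the expected output, but the dropping strategy is the same.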

Rate Limits: API providers enforce tokens-per-minute limits. Knowing your token count helps you stay within rate limits and optimize throughput.
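One simple pacing strategy is to spread the per-minute budget evenly: divide the request size by the per-second token allowance to get the minimum gap between requests. A sketch, assuming uniformly sized requests:

```python
def min_request_interval(request_tokens: int, tpm_limit: int) -> float:
    """Seconds to wait between requests of a given size so steady-state
    throughput never exceeds the tokens-per-minute limit."""
    return request_tokens / (tpm_limit / 60.0)
```

For example, 1,000-token requests against a 30,000 TPM limit should go out no more than one every 2 seconds. Real clients often use a sliding window or token bucket instead, which allows short bursts while keeping the same average.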

Latency: More tokens means longer processing time. Reducing token count in your prompts directly reduces response time.

Be concise: Remove filler words, redundant phrases, and unnecessary context. "Summarize this article in 3 bullet points" beats "Could you please take this article and provide me with a summary of the key points in the form of 3 bullet points?"

Use system prompts wisely: Put reusable instructions in the system prompt instead of repeating them in every user message.

Avoid unnecessary whitespace: Extra blank lines and excessive indentation add tokens. Modern tokenizers merge some whitespace runs into single tokens, but every run still costs tokens that carry no meaning.
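A small normalization pass before sending a prompt recovers those wasted tokens. This sketch collapses runs of spaces and tabs and caps consecutive blank lines at one:

```python
import re

def compact_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs to one space and cap blank lines at
    one, trimming whitespace tokens that carry no meaning."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Skip this for code or other whitespace-sensitive content (Python, Markdown tables), where indentation is meaningful and must survive.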

Choose models strategically: Newer tokenizers tend to produce fewer tokens for the same text, saving both cost and context space.

Truncate context: Only include the relevant portions of long documents rather than the entire thing.

Model                    | Context Window | Max Output | Approx. Pages
GPT-4.1                  | 1M             | 32,768     | ~1,550
GPT-4o                   | 128K           | 16,384     | ~200
o3 / o4-mini             | 200K           | 100,000    | ~310
Claude Opus 4 / Sonnet 4 | 200K           | 16,384     | ~310
Claude Haiku 3.5         | 200K           | 8,192      | ~310
Gemini 2.5 Pro           | 1M             | 65,536     | ~1,550
Gemini 2.5 Flash         | 1M             | 65,536     | ~1,550
Llama 4 (Scout)          | 10M            | varies     | ~15,500
Llama 4 (Maverick)       | 1M             | varies     | ~1,550
Grok 3                   | 128K           | 16,384     | ~200
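The "Approx. Pages" column follows from the rule of thumb earlier: roughly 500 words per page at about 1.3 tokens per word gives ~645 tokens per page. That per-page figure is an assumption chosen to reproduce the table's values, not a provider-published number:

```python
def approx_pages(context_tokens: int, tokens_per_page: int = 645) -> int:
    """Pages of ordinary prose a context window holds, rounded to the
    nearest ten pages. 645 tokens/page is an assumed figure (~500 words
    at ~1.3 tokens per word)."""
    return round(context_tokens / tokens_per_page / 10) * 10
```

Dense prose, code, or non-English text packs fewer "pages" into the same window, so treat these figures as order-of-magnitude guides.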