LLM Token Counter
Count tokens across OpenAI, Anthropic, Google, and open-source tokenizers. See costs, visualize token boundaries, and understand how tokenization works.
Understanding LLM Tokens
Tokens are the fundamental units that large language models use to process text. Instead of reading individual characters, LLMs break text into chunks called tokens using a process called tokenization.
A token can be as short as a single character or as long as a full word. Common English words are usually one token, while less common words get split into smaller pieces.
Unbelievably → 3 tokens
The quick brown fox → 4 tokens
As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words in English text.
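To check counts like these yourself, here is a minimal sketch using OpenAI's open-source tiktoken library with the o200k_base encoding (the one used by GPT-4o and GPT-4.1); exact counts vary by encoding, so treat the output as illustrative.

```python
import tiktoken  # pip install tiktoken

# o200k_base is the encoding used by GPT-4o and GPT-4.1
enc = tiktoken.get_encoding("o200k_base")

for text in ["Unbelievably", "The quick brown fox"]:
    tokens = enc.encode(text)
    # Decode each token id individually to see the boundaries
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r} -> {len(tokens)} tokens: {pieces}")
```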
Each LLM provider trains its own tokenizer with a unique vocabulary. OpenAI uses tiktoken with Byte Pair Encoding (BPE), Google uses SentencePiece, and Anthropic uses a BPE variant with a slightly different vocabulary. Meta used SentencePiece for earlier Llama models but moved to a tiktoken-style BPE tokenizer from Llama 3 onward.
A larger vocabulary means more common patterns get their own tokens, so the same text compresses into fewer of them. Newer tokenizers tend to be more efficient because they are trained on larger and more diverse datasets.
Tokenization methods rarely change between model versions from the same provider. GPT-4.1 and GPT-4o share the same tokenizer family, and Claude Opus 4 and Claude Sonnet 4 count tokens the same way. That is why this tool groups by tokenization method rather than listing every model version.
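One way to see the efficiency difference is to run the same text through two tiktoken encodings from different generations. A small sketch (the exact counts depend on your text, but the newer encoding typically produces fewer tokens):

```python
import tiktoken

text = "Electroencephalography is unbelievably interesting."

# cl100k_base: GPT-4 / GPT-3.5 era; o200k_base: GPT-4o / GPT-4.1 era
for name in ["cl100k_base", "o200k_base"]:
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")
```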
Why Token Counts Matter

API Costs: Every API call is billed per token. Input tokens (your prompt) and output tokens (the response) are priced separately, with output typically costing 2-5x more; see the cost sketch after this list.
Context Windows: Each model has a maximum number of tokens it can process in a single conversation. Exceeding this limit means your earliest messages get truncated or the request fails.
Rate Limits: API providers enforce tokens-per-minute limits. Knowing your token count helps you stay within rate limits and optimize throughput.
Latency: More tokens mean longer processing time. Reducing token count in your prompts directly reduces response time.
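As a rough illustration of per-token billing, the sketch below estimates the cost of a single call from its token counts. The model names and per-million-token prices here are examples only and change frequently, so always check your provider's current price list.

```python
# (input_price, output_price) in USD per million tokens -- illustrative values only
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price

# A 12,000-token prompt with a 1,500-token response
print(f"${estimate_cost('gpt-4o', 12_000, 1_500):.4f}")  # ~$0.0450
```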
How to Reduce Token Usage

Be concise: Remove filler words, redundant phrases, and unnecessary context. "Summarize this article in 3 bullet points" beats "Could you please take this article and provide me with a summary of the key points in the form of 3 bullet points?"
Use system prompts wisely: Put reusable instructions in the system prompt instead of repeating them in every user message.
Avoid unnecessary whitespace: Extra blank lines and deep indentation add tokens. Even when a tokenizer merges runs of spaces into multi-space tokens, that whitespace still consumes part of your budget, as the sketch after this list shows.
Choose models strategically: Newer tokenizers tend to produce fewer tokens for the same text, saving both cost and context space.
Truncate context: Only include the relevant portions of long documents rather than the entire thing.
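The whitespace point is easy to verify: tokenize a compact snippet and a padded version of the same snippet and compare. A small sketch with tiktoken (counts depend on the encoding):

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

compact = "def add(a, b):\n    return a + b"
padded = "def add(a, b):\n\n\n            return a + b\n\n\n"

# Same code, but the padded version costs extra tokens
print(len(enc.encode(compact)), "vs", len(enc.encode(padded)))
```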
Context Windows at a Glance

| Model | Context Window (tokens) | Max Output (tokens) | Approx. Pages |
|---|---|---|---|
| GPT-4.1 | 1M | 32,768 | ~1,550 |
| GPT-4o | 128K | 16,384 | ~200 |
| o3 / o4-mini | 200K | 100,000 | ~310 |
| Claude Opus 4 / Sonnet 4 | 200K | 32,000 / 64,000 | ~310 |
| Claude Haiku 3.5 | 200K | 8,192 | ~310 |
| Gemini 2.5 Pro | 1M | 65,536 | ~1,550 |
| Gemini 2.5 Flash | 1M | 65,536 | ~1,550 |
| Llama 4 (Scout) | 10M | varies | ~15,500 |
| Llama 4 (Maverick) | 1M | varies | ~1,550 |
| Grok 3 | 128K | 16,384 | ~200 |
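To avoid hitting a window limit, you can check before sending a request that the prompt tokens plus your reserved output budget fit within the model's context. A sketch using limits from the table above (the window sizes are hard-coded here purely for illustration):

```python
import tiktoken

# Context windows from the table above, in tokens
CONTEXT_WINDOWS = {"gpt-4o": 128_000, "gpt-4.1": 1_000_000}

def fits_context(model: str, prompt: str, output_budget: int) -> bool:
    """Return True if the prompt plus reserved output fits in the model's window."""
    enc = tiktoken.get_encoding("o200k_base")  # encoding for GPT-4o / GPT-4.1
    return len(enc.encode(prompt)) + output_budget <= CONTEXT_WINDOWS[model]

print(fits_context("gpt-4o", "Summarize this article in 3 bullet points.", 16_384))
```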