Libraries
How to use Qwen/token-classification-model-v2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/token-classification-model-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/token-classification-model-v2")
model = AutoModelForCausalLM.from_pretrained("Qwen/token-classification-model-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
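The slice in the final `print` call drops the prompt tokens so that only the model's continuation is decoded. The sketch below shows the same indexing on plain Python lists. The token IDs are made-up placeholders, not real tokenizer output, and the variable names are illustrative only.

```python
# Toy stand-ins for token IDs; real values come from the tokenizer and model.
prompt_ids = [101, 2040, 2024, 2017, 102]  # hypothetical prompt tokens
new_ids = [1045, 2572, 1037, 2944]         # hypothetical generated tokens

# For a decoder-only model, generate() returns the prompt followed by the
# continuation in a single sequence.
full_output = prompt_ids + new_ids

# Same idea as outputs[0][inputs["input_ids"].shape[-1]:] in the snippet above:
# skip the first len(prompt) positions and keep the rest.
prompt_len = len(prompt_ids)
completion_ids = full_output[prompt_len:]

print(completion_ids)
```

Without the slice, the decoded string would repeat the chat-formatted prompt before the answer.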
token-classification-model-v2
Join our WeChat or Discord community.
Check out the token-classification-model-v2 blog and token-classification-model-v2 technical report.
Use token-classification-model-v2 API services on the Z.ai API Platform.
Try token-classification-model-v2 here.
Introduction
We're introducing token-classification-model-v2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor and, for the first time, delivers that capability on a solid 1M-token context.
- Solid 1M Context: A 1M-token context window that stays stable across long-horizon work.
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency.
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length.
- Pure Open: An MIT open-source license with no regional limits, technical access without borders.
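The IndexShare bullet describes one indexer serving every four sparse attention layers. A minimal sketch of that sharing pattern follows. It assumes the groups are consecutive blocks of four layers, which the text does not confirm. The function names and the layer count are hypothetical, and the sketch does not attempt to reproduce the reported 2.9× FLOPs reduction.

```python
import math


def indexer_for_layer(layer_idx: int, share_every: int = 4) -> int:
    """Return the ID of the (hypothetical) shared indexer used by a layer.

    Assumption: layers 0-3 share indexer 0, layers 4-7 share indexer 1, etc.
    """
    return layer_idx // share_every


def indexer_count(num_sparse_layers: int, share_every: int = 4) -> int:
    """Number of distinct indexers needed when each one serves a group."""
    return math.ceil(num_sparse_layers / share_every)


num_layers = 12  # hypothetical layer count, chosen only for illustration
mapping = {layer: indexer_for_layer(layer) for layer in range(num_layers)}

print(mapping)
print(indexer_count(num_layers), "indexers instead of", num_layers)
```

Under this assumption the indexer runs once per group rather than once per layer, which is where the saving would come from. The actual per-token FLOPs reduction depends on details the summary above does not give.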
Benchmark Results
| Benchmark | token-classification-model-v2 | Qwen3.7-Max | DeepSeek-V4-Pro | Claude Opus 4.8 | Gemini 3.1 Pro |
|---|---|---|---|---|---|
| Reasoning | |||||
| HLE | 40.5 | 41.4 | 37.7 | 49.8* | 45.0 |
| GPQA-Diamond | 91.2 | 90.0 | 90.1 | 93.6 | 94.3 |
| Coding | |||||
| SWE-bench Pro | 62.1 | 60.6 | 55.4 | 69.2 | 54.2 |
| DeepSWE | 46.2 | 18.0 | 8.0 | 58.0 | 10.0 |
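To read the table at a glance, the scores can be ranked per benchmark. The sketch below copies the values from the table as-is and computes where token-classification-model-v2 places in each row. The asterisk on the Claude Opus 4.8 HLE score is dropped, since its footnote is not included here, and the helper name `rank_of` is illustrative only.

```python
MODEL = "token-classification-model-v2"

# Scores copied from the table above (the HLE "49.8*" footnote marker is dropped).
scores = {
    "HLE": {MODEL: 40.5, "Qwen3.7-Max": 41.4, "DeepSeek-V4-Pro": 37.7,
            "Claude Opus 4.8": 49.8, "Gemini 3.1 Pro": 45.0},
    "GPQA-Diamond": {MODEL: 91.2, "Qwen3.7-Max": 90.0, "DeepSeek-V4-Pro": 90.1,
                     "Claude Opus 4.8": 93.6, "Gemini 3.1 Pro": 94.3},
    "SWE-bench Pro": {MODEL: 62.1, "Qwen3.7-Max": 60.6, "DeepSeek-V4-Pro": 55.4,
                      "Claude Opus 4.8": 69.2, "Gemini 3.1 Pro": 54.2},
    "DeepSWE": {MODEL: 46.2, "Qwen3.7-Max": 18.0, "DeepSeek-V4-Pro": 8.0,
                "Claude Opus 4.8": 58.0, "Gemini 3.1 Pro": 10.0},
}


def rank_of(model: str, row: dict) -> int:
    """1-based rank of `model` within one benchmark row (higher score is better)."""
    ordered = sorted(row, key=row.get, reverse=True)
    return ordered.index(model) + 1


ranks = {bench: rank_of(MODEL, row) for bench, row in scores.items()}
for bench, rank in ranks.items():
    print(f"{bench}: rank {rank} of {len(scores[bench])}")
```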
Model Stats
- Downloads last month: 67,107
- Model size: 753B params
- Tensor type: BF16 / F32
Evaluation Results
- SWE-bench Pro (ScaleAI/SWE-bench_Pro): 62.1
- GPQA-Diamond (Idavidrein/gpqa): 91.2
- DeepSWE (datacurve/deep-swe): 46.2