Important
🚀 GLiNER2 is Now Available from Fastino Labs! A unified multi-task model for NER, Text Classification & Structured Data Extraction. Check out fastino-ai/GLiNER2 →
Zero-shot NER | Relation Extraction | PII Detection | Information Extraction | Token Classification
GLiNER is a framework for training and deploying small Named Entity Recognition (NER) models with zero-shot capabilities. In addition to traditional NER, it also supports joint entity and relation extraction, as well as multi-task token classification. GLiNER is fine-tunable, optimized to run on CPUs and consumer hardware, and has performance competitive with LLMs several times its size, like ChatGPT and UniNER.
Other tasks such as text classification, entity linking, and schema extraction are supported through projects in the Ecosystem.
| Feature | Description |
|---|---|
| Zero-shot Recognition | Extract any entity type with no labeled data or task-specific training required |
| Runs Anywhere | CPU, INT8 quantization, |
| Millions of Labels | Bi-encoder pre-computes label embeddings, scaling to 100+ entity types without degradation |
| NER + Relations | Build knowledge graphs in a single pass with the joint RelEx architecture |
| PII Detection | State-of-the-art multilingual PII models covering major entity types across 100+ languages |
| Fine-Tune in Minutes | Few-shot learning on small datasets: bring your own labels and get competitive results fast |
With pip:
pip install gliner

With uv (faster):
uv pip install gliner

With serving support (Ray Serve):
uv pip install "gliner[serve]" # or: pip install gliner "ray[serve]"

from gliner import GLiNER
model = GLiNER.from_pretrained("gliner-community/gliner_small-v2.5")
text = """
Cristiano Ronaldo dos Santos Aveiro (born 5 February 1985) is a Portuguese
professional footballer who plays as a forward for and captains both Saudi Pro
League club Al Nassr and the Portugal national team.
"""
labels = ["person", "date", "organization", "location"]
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])

Output:
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => organization
Portugal => location
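Downstream code often needs the predictions grouped by label rather than as a flat list. The following is a minimal sketch of that post-processing step. It assumes each entity dict carries a `score` key alongside the `text` and `label` keys shown above; the helper `group_by_label` and the sample scores are hypothetical and not part of GLiNER.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group entity texts by label, keeping predictions at or above min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        # "score" is assumed to be present; default to 1.0 if it is missing.
        if entity.get("score", 1.0) >= min_score:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample shaped like the output above (scores are made up).
sample = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person", "score": 0.98},
    {"text": "5 February 1985", "label": "date", "score": 0.95},
    {"text": "Al Nassr", "label": "organization", "score": 0.91},
    {"text": "Portugal", "label": "location", "score": 0.42},
]
print(group_by_label(sample, min_score=0.5))
```

With a real model, the list returned by `model.predict_entities(...)` would take the place of `sample`.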
GLiNER models are already small, but quantization and compilation can make them significantly faster and more memory-efficient. This matters when running on edge devices, serving at high throughput, or keeping GPU costs low.
- `torch.compile` fuses operations and removes Python overhead, yielding up to ~1.5x speedup with no quality loss.
- FP16 quantization (`quantize=True`) halves model memory and speeds up matrix operations. Combined with compilation, this gives up to ~1.9x faster GPU inference with virtually no quality loss.
- INT8 quantization cuts memory by another 2x on top of FP16 and is supported out of the box. However, models need to be trained with Quantization-Aware Training (QAT) to preserve accuracy at INT8 precision.
model = GLiNER.from_pretrained(
    "gliner-community/gliner_small-v2.5",
    map_location="cuda",
    quantize=True,
    compile_torch_model=True,
)

Find more information on compilation and other optimizations in the documentation.
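The memory savings quoted above follow from the number of bytes stored per weight. As a rough worked example, the sketch below estimates weight storage at each precision. The parameter count is a hypothetical figure chosen only to make the arithmetic concrete, and the estimate ignores activations and runtime overhead.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mib(num_params, dtype):
    """Approximate weight storage in MiB for a model with num_params weights."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# Hypothetical parameter count, not a measured value for any GLiNER model.
num_params = 150_000_000
for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_mib(num_params, dtype):.0f} MiB")
```

Each step down in precision halves the weight footprint, which is where the "halves model memory" and "another 2x" figures come from.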
For production workloads — high-throughput pipelines, multi-user services, or anywhere you need to go beyond single-process model.inference() calls — GLiNER provides a Ray Serve-based serving layer. It adds dynamic batching that automatically groups incoming requests, memory-aware batch sizing that prevents CUDA OOM by calibrating against your GPU, precompiled kernels for common batch sizes to avoid first-call latency, horizontal scaling across multiple GPUs via Ray replicas, and an HTTP API for language-agnostic access.
python -m gliner.serve --model gliner-community/gliner_small-v2.5 --dtype fp16

Then query from Python:
from gliner.serve import GLiNERClient
client = GLiNERClient() # connects to http://localhost:8000/gliner
results = client.predict(
    ["John works at Google", "Paris is in France"],
    labels=["person", "organization", "location"],
)

More information on serving options and parameters can be found in the documentation.
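To give an intuition for memory-aware batch sizing, the sketch below greedily packs texts into batches so that the padded size (batch size times the longest text) stays under a budget. This is an illustration of the general idea only, not how GLiNER's serving layer is implemented; `pack_batches`, its `max_padded_tokens` parameter, and the whitespace token count are all assumptions of this example.

```python
def pack_batches(texts, max_padded_tokens):
    """Group text indices so that batch_size * longest_text fits the budget."""
    # Sort by length so similarly sized texts share a batch and padding stays low.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))
    batches, current, longest = [], [], 0
    for i in order:
        n_tokens = max(1, len(texts[i].split()))
        new_longest = max(longest, n_tokens)
        # Start a new batch if adding this text would exceed the padded budget.
        if current and (len(current) + 1) * new_longest > max_padded_tokens:
            batches.append(current)
            current, new_longest = [], n_tokens
        current.append(i)
        longest = new_longest
    if current:
        batches.append(current)
    return batches  # each batch is a list of indices into texts


texts = ["a b", "a b c d", "a", "a b c d e f g h"]
print(pack_batches(texts, max_padded_tokens=8))
```

A text that exceeds the budget on its own still gets a batch to itself, so nothing is dropped. Each batch of indices could then be sent as one `client.predict(...)` call.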
GLiNER models are easy to fine-tune on your own data. Prepare your dataset as a JSON file and use the training script:
python train.py --config configs/config.yaml

Or train programmatically:
from gliner import GLiNER
model = GLiNER.from_pretrained("gliner-community/gliner_small-v2.5")
model.train_model(
    train_dataset=train_data,
    eval_dataset=eval_data,
    output_dir="models",
    max_steps=10000,
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    bf16=True,
)

For detailed training examples, see the example notebooks.
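Preparing the dataset usually means converting character-level annotations into token-level spans. The sketch below assumes the training records use a `tokenized_text` list plus a `ner` list of `[start_token, end_token, label]` entries with an inclusive end index; check this against the example notebooks before relying on it. The helper `to_token_example` is hypothetical, and its whitespace tokenization is a simplification of whatever splitter the library uses.

```python
import json


def to_token_example(text, char_spans):
    """Convert (char_start, char_end, label) spans into token-index spans."""
    tokens, offsets, pos = [], [], 0
    for tok in text.split():
        start = text.index(tok, pos)  # character offset of this token
        end = start + len(tok)
        tokens.append(tok)
        offsets.append((start, end))
        pos = end
    ner = []
    for c_start, c_end, label in char_spans:
        # Tokens that overlap the character span.
        covered = [i for i, (s, e) in enumerate(offsets) if s < c_end and e > c_start]
        if covered:
            ner.append([covered[0], covered[-1], label])  # inclusive end index
    return {"tokenized_text": tokens, "ner": ner}


example = to_token_example(
    "John works at Google",
    [(0, 4, "person"), (14, 20, "organization")],
)
print(json.dumps(example))
```

A list of such records, written out with `json.dump`, would form the JSON dataset file.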
GLiNER supports multiple architectures tailored to different use cases:
| Architecture | Description | Example Model |
|---|---|---|
| Uni-encoder | Strong zero-shot capabilities, supports up to ~50 entity types. The original GLiNER architecture. | gliner_multi_pii-v1 |
| Bi-encoder | Scalable to massive numbers of entity types via separate text and label encoding. | gliner-bi-base-v2.0 |
| RelEx | Joint NER and relation extraction in a single model. | gliner-relex-large-v1.0 |
| GLiNER Decoder | Hybrid architecture for open NER: entity types are generated with a small decoder for maximum flexibility. | gliner-decoder-large-v1.0 |
For more details, see the documentation.
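The bi-encoder's scalability comes from encoding labels separately from text, so label embeddings can be computed once and reused for every document. The toy sketch below illustrates only that caching idea. `LabelIndex` and `toy_encode` are hypothetical stand-ins (a letter-frequency vector replaces a real transformer encoder) and do not reflect GLiNER's internals.

```python
import math


def toy_encode(text):
    """Stand-in encoder: a 26-dimensional letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LabelIndex:
    """Caches one embedding per label so each label is encoded only once."""

    def __init__(self, encode):
        self.encode = encode
        self.cache = {}
        self.encoder_calls = 0

    def embed(self, label):
        if label not in self.cache:
            self.encoder_calls += 1
            self.cache[label] = self.encode(label)
        return self.cache[label]

    def best_label(self, span_vector, labels):
        """Return the label whose cached embedding is closest to the span."""
        scores = {label: cosine(span_vector, self.embed(label)) for label in labels}
        return max(scores, key=scores.get)


index = LabelIndex(toy_encode)
labels = ["person", "location"]
for span in ["persons", "locations", "person"]:
    print(span, "=>", index.best_label(toy_encode(span), labels))
print("label encoder calls:", index.encoder_calls)
```

However many spans are scored, the label encoder runs once per distinct label, which is what lets the label set grow without re-encoding it for every input.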
- Compliance & PII Redaction — detect and mask 40+ types of personal data (SSN, credit cards, passports, emails, IBANs, etc.) across documents and data pipelines
- Knowledge Graph Construction — jointly extract entities and relations to power Graph RAG, semantic search, and analytics
- Large-Scale Entity Extraction — use the bi-encoder to tag millions of documents against hundreds or thousands of entity types in production
- Domain-Specific NER — fine-tune on biomedical, legal, financial, or any specialized corpus with minimal labeled data
- Multi-lingual Information Extraction — extract structured data from 100+ languages with a single model
- Search & Retrieval Augmentation — parse queries into structured entities to improve search relevance and RAG pipelines
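For the PII redaction use case, masking comes down to replacing each detected span with a placeholder. The sketch below assumes entity dicts expose character offsets under `start` and `end` keys and that spans do not overlap; `redact` and the sample data are hypothetical.

```python
def redact(text, entities):
    """Replace each entity span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['label'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


# Hand-written sample; a real pipeline would take these from a PII model.
sample_text = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"start": 8, "end": 16, "label": "person"},
    {"start": 20, "end": 36, "label": "email"},
]
print(redact(sample_text, sample_entities))
```

Processing spans from the end of the string backwards avoids having to recompute offsets after each substitution.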
GLiNER has a rich ecosystem of community projects and integrations:
| Project | Description |
|---|---|
| GLiNER2 | Unified multi-task model for NER, text classification, and structured data extraction |
| GLiClass | Zero-shot text classification using GLiNER-style architecture |
| GLinker | Entity linking with GLiNER |
| GLiNER.cpp | C++ implementation for high-performance inference |
| gline-rs | Rust implementation of GLiNER |
| vllm-factory | vLLM integration for scalable GLiNER serving |
| gliner-spacy | spaCy integration for GLiNER |
Full documentation is available at urchade.github.io/GLiNER.
GLiNER was originally developed by:
- Urchade Zaratiana
- Nadi Tomeh
- Pierre Holat
- Thierry Charnois
We gratefully acknowledge the contributions of the open-source community, whose efforts have helped shape and improve this project.
We welcome contributions from the community! Here's how to get started:
- Fork the repository and create a new branch from `main`.
- Install the development dependencies: `pip install -e ".[dev]"`.
- Make your changes: bug fixes, new features, documentation improvements, and new examples are all appreciated.
- Lint and format your code with Ruff before committing: `ruff check . --fix`, then `ruff format .`.
- Write tests for any new functionality and make sure existing tests pass.
- Submit a pull request with a clear description of what you changed and why.
For bug reports and feature requests, please open an issue. For questions and discussions, join us on Discord.
If you find GLiNER useful in your research, please consider citing the original paper:
@inproceedings{zaratiana-etal-2024-gliner,
title = "{GL}i{NER}: Generalist Model for Named Entity Recognition using Bidirectional Transformer",
author = "Zaratiana, Urchade and
Tomeh, Nadi and
Holat, Pierre and
Charnois, Thierry",
booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
year = "2024",
url = "https://aclanthology.org/2024.naacl-long.300",
pages = "5364--5376",
}

The GLiNER family has since been extended to additional information extraction and classification tasks:
@misc{zaratiana2025gliner2efficientmultitaskinformation,
title={GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface},
author={Urchade Zaratiana and Gil Pasternak and Oliver Boyd and George Hurn-Maloney and Ash Lewis},
year={2025},
eprint={2507.18546},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.18546},
}

@misc{zaratiana2026gliguardschemaconditionedclassificationllm,
title={GLiGuard: Schema-Conditioned Classification for LLM Safeguard},
author={Urchade Zaratiana and Mary Newhauser and George Hurn-Maloney and Ash Lewis},
year={2026},
eprint={2605.07982},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.07982},
}

@misc{zaratiana2026gliner2piimultilingualmodelpersonally,
title={GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction},
author={Urchade Zaratiana and Ash Lewis and George Hurn-Maloney},
year={2026},
eprint={2605.09973},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.09973},
}

This project has been supported and funded by F.initiatives and Laboratoire Informatique de Paris Nord.
F.initiatives has been an expert in public funding strategies for R&D, Innovation, and Investments (R&D&I) for over 20 years. With a team of more than 200 qualified consultants, F.initiatives guides its clients at every stage of developing their public funding strategy: from structuring their projects to submitting their aid application, while ensuring the translation of their industrial and technological challenges to public funders. Through its continuous commitment to excellence and integrity, F.initiatives relies on the synergy between methods and tools to offer tailored, high-quality, and secure support.
We also extend our heartfelt gratitude to the open-source community for their invaluable contributions, which have been instrumental in the success of this project. ❤️

