⚠️ WARNING: This project is in its early stages of development.
Do not use in production.
PaddlePaddle/PaddleX
Topdu/OpenOCR
breezedeus/Pix2Text
NormXU/nougat-latex-ocr
huggingface/transformers
bytedance/Dolphin
docling-project/docling
huggingface/optimum
OleehyO/TexTeller
| Model Category | Status |
|---|---|
| Layout | ✅ |
| Text Detection | ✅ |
| Text Recognition | ✅ |
| Formula Recognition | ✅ |
| Table Recognition | ✅ |
| Doc Orientation | ✅ |
| Text Line Orientation | ✅ |
| Multimodal OCR | ✅ |
Formula Recognition Model
| Model | CPU | CUDA |
|---|---|---|
| Dolphin | ✅ | ✅ |
| Dolphin-1.5 | ✅ | ✅ |
| Falcon-OCR | ✅ | ✅ |
| unirec-0.1b | ✅ | ✅ |
Text Line Orientation Model
| Model | CPU | CUDA |
|---|---|---|
| GLM-OCR | ✅ | ✅ |
| LightOnOCR-2-1B | ✅ | ✅ |
| LightOnOCR-2-1B-ONNX | ✅ | ✅ |
| llava-onevision-qwen2-0.5b-ov-hf | ✅ | ✅ |
The OCR Pipeline combines multiple models to perform end-to-end text extraction from document images. It orchestrates the following steps:
1. Document Orientation Classification (optional): detects and corrects the overall document rotation (0°/90°/180°/270°).
2. Layout Analysis (optional): detects layout regions (text, formula, table, image) in the document.
3. Text Detection: locates text regions in the image and returns bounding polygons.
4. Text Line Orientation Classification (optional): detects whether each text line is upside down (0° or 180°) and rotates it if needed.
5. Text Recognition: recognizes the text content of each detected text line.
6. Formula Recognition (optional): recognizes LaTeX formulas in the formula regions found by layout analysis.
7. Table Recognition (optional): recognizes HTML tables in the table regions found by layout analysis.
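The step order above can be pictured as a chain in which each optional stage is simply skipped when its model is not configured, matching the "pass null to skip" convention in the usage example. The following is a minimal, hypothetical sketch of that idea for the path without layout analysis: `PipelineSketch`, the stage names, and the generic page state `T` are illustrative placeholders, not this project's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the step ordering (no-layout path only).
// Stage names and the generic "page state" T are placeholders, not the project's API.
final class PipelineSketch<T> {
    // LinkedHashMap keeps stages in insertion order and permits null values.
    private final Map<String, UnaryOperator<T>> stages = new LinkedHashMap<>();
    private final List<String> executed = new ArrayList<>();

    /** Adds a stage; pass null for an optional stage whose model is not configured. */
    PipelineSketch<T> stage(String name, UnaryOperator<T> op) {
        stages.put(name, op);
        return this;
    }

    /** Runs the configured stages in order, skipping null (disabled) ones. */
    T run(T input) {
        executed.clear();
        T state = input;
        for (Map.Entry<String, UnaryOperator<T>> e : stages.entrySet()) {
            if (e.getValue() == null) {
                continue; // optional model not configured: skip this step
            }
            state = e.getValue().apply(state);
            executed.add(e.getKey());
        }
        return state;
    }

    /** Names of the stages executed by the most recent run(). */
    List<String> executed() {
        return List.copyOf(executed);
    }

    public static void main(String[] args) {
        PipelineSketch<String> p = new PipelineSketch<String>()
                .stage("docOrientation", null)             // optional, disabled here
                .stage("textDetection", s -> s + " -> det")
                .stage("textLineOrientation", null)        // optional, disabled here
                .stage("textRecognition", s -> s + " -> rec");
        System.out.println(p.run("page") + " " + p.executed());
    }
}
```

Running `main` prints the final state together with the stages that actually ran; with both optional stages disabled, only detection and recognition execute.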
When layout analysis is enabled, the pipeline first detects layout regions and then routes each region to the appropriate model:
- Text regions → text detection + text line orientation + text recognition
- Formula regions → formula recognition model (falls back to text OCR if no formula model is configured)
- Table regions → table recognition model (falls back to text OCR if no table model is configured)
- Image regions → no recognition; only the region info is returned
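The routing rules above, including the fallback to text OCR when a formula or table model is missing, can be sketched as a single `switch`. This is a hypothetical illustration: `RegionRouter`, `RegionType`, and the use of `Function<String, String>` with a string standing in for an image crop are placeholders, not the project's API.

```java
import java.util.function.Function;

// Hypothetical sketch of the region routing rules; the names and the String "crop"
// placeholder are illustrative, not the project's actual classes.
final class RegionRouter {
    enum RegionType { TEXT, FORMULA, TABLE, IMAGE }

    private final Function<String, String> textOcr;      // detection + line orientation + recognition
    private final Function<String, String> formulaModel; // null when no formula model is configured
    private final Function<String, String> tableModel;   // null when no table model is configured

    RegionRouter(Function<String, String> textOcr,
                 Function<String, String> formulaModel,
                 Function<String, String> tableModel) {
        this.textOcr = textOcr;
        this.formulaModel = formulaModel;
        this.tableModel = tableModel;
    }

    /** Returns the recognized content, or null for image regions (region info only). */
    String route(RegionType type, String crop) {
        switch (type) {
            case TEXT:
                return textOcr.apply(crop);
            case FORMULA: // formula model if available, otherwise fall back to text OCR
                return formulaModel != null ? formulaModel.apply(crop) : textOcr.apply(crop);
            case TABLE:   // table model if available, otherwise fall back to text OCR
                return tableModel != null ? tableModel.apply(crop) : textOcr.apply(crop);
            default:      // IMAGE: no recognition
                return null;
        }
    }
}
```

Constructing the router with `null` for the formula and table functions reproduces a text-only setup in which every non-image region goes through text OCR.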
| File | Pages | E2E (ms) | OCR (ms) | PDF (ms) |
|---|---|---|---|---|
| 2606.13108.pdf | 29 | 6630.00 | 6430.90 | 199.10 |
| 2606.13108_zh_CN.pdf | 29 | 6954.10 | 6750.30 | 203.80 |
| 2606.13392.pdf | 30 | 9279.00 | 9162.40 | 116.70 |
| 2606.13392_zh_CN.pdf | 34 | 8424.00 | 8248.30 | 175.70 |
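The benchmark columns are related by simple arithmetic: in each row, E2E equals OCR plus PDF time to within 0.1 ms, and dividing E2E by the page count gives a per-page latency of roughly 229 to 309 ms. The short program below derives these figures from the table values; reading "PDF" as the PDF-handling time is an interpretation, and the table does not state the hardware the numbers were measured on.

```java
// Per-page latency and OCR share, computed from the benchmark table values.
// Interpreting the "PDF" column as PDF-handling time is an assumption.
final class BenchmarkMath {
    static double msPerPage(double e2eMs, int pages) {
        return e2eMs / pages;
    }

    static double ocrSharePercent(double ocrMs, double e2eMs) {
        return 100.0 * ocrMs / e2eMs;
    }

    public static void main(String[] args) {
        String[] files = {"2606.13108.pdf", "2606.13108_zh_CN.pdf", "2606.13392.pdf", "2606.13392_zh_CN.pdf"};
        int[] pages = {29, 29, 30, 34};
        double[] e2e = {6630.00, 6954.10, 9279.00, 8424.00};
        double[] ocr = {6430.90, 6750.30, 9162.40, 8248.30};
        double[] pdf = {199.10, 203.80, 116.70, 175.70};
        for (int i = 0; i < files.length; i++) {
            // ms/page, OCR share of E2E, and OCR + PDF compared against E2E
            System.out.printf("%-22s %7.2f ms/page  OCR %.1f%%  OCR+PDF=%.1f ms (E2E %.1f ms)%n",
                    files[i], msPerPage(e2e[i], pages[i]), ocrSharePercent(ocr[i], e2e[i]),
                    ocr[i] + pdf[i], e2e[i]);
        }
    }
}
```

On these four files OCR accounts for about 97 to 99 percent of end-to-end time, so recognition, not PDF handling, dominates.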
import ai.onnxruntime.OrtEnvironment;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
// ...plus imports for this project's model and pipeline classes.

// modelDir and gpuIndex must be defined before this block (not shown here).
try (OrtEnvironment env = OrtEnvironment.getEnvironment()) {
    // Required models
    TextDetectionModel detModel = new TextDetectionModel(modelDir, "PP-OCRv6_medium_det", env, gpuIndex);
    TextRecognitionModel recModel = new TextRecognitionModel(modelDir, "PP-OCRv6_medium_rec", env, gpuIndex);

    // Optional models (pass null to skip)
    DocOrientationClassifyModel docOriModel = new DocOrientationClassifyModel(modelDir, "PP-LCNet_x1_0_doc_ori", env, gpuIndex);
    TextLineOrientationModel textLineOriModel = new TextLineOrientationModel(modelDir, "PP-LCNet_x1_0_textline_ori", env, gpuIndex);

    // Optional: layout, formula, and table models (pass null to skip)
    LayoutModel layoutModel = new LayoutModel(modelDir, "PP-DocLayoutV3", env, gpuIndex);
    FormulaRecognitionModel formulaModel = new FormulaRecognitionModel(modelDir, "PP-FormulaNet_plus-L", env, gpuIndex);
    TableModel tableModel = new TableModel(modelDir, "unirec-0.1b", env, gpuIndex);

    OCRPipeline pipeline = new OCRPipeline(detModel, recModel, docOriModel, textLineOriModel,
            layoutModel, formulaModel, tableModel);

    Map<String, Object> params = new HashMap<>();
    params.put("recognitionBatchSize", 1);

    List<OCRPipelineResult> results = pipeline.predictFile("image.png", params);
    for (OCRPipelineResult result : results) {
        if (result.layoutRegions() != null) {
            // Layout enabled: print each region's type and recognized content
            for (LayoutRegionResult region : result.layoutRegions()) {
                System.out.println(region.regionType() + ": " + region.getText());
            }
        } else {
            // No layout model: print the recognized text lines, one per line
            String text = result.recResults().stream()
                    .map(r -> r.text())
                    .collect(Collectors.joining("\n"));
            System.out.println(text);
        }
    }
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| `recognitionBatchSize` | `Integer` | `1` | Batch size for text recognition inference |
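`recognitionBatchSize` sets how many text-line crops are sent to the recognition model per inference call. The sketch below shows one plausible way to split crops into batches of that size; `RecognitionBatching` and `batches` are hypothetical names, and how this project actually forms its batches (for example, whether it first sorts crops by aspect ratio, as PaddleOCR's recognizer does to reduce padding) is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting detected text-line crops into recognition batches.
// Not the project's actual implementation.
final class RecognitionBatching {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> crops, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1, got " + batchSize);
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < crops.size(); start += batchSize) {
            int end = Math.min(start + batchSize, crops.size());
            out.add(new ArrayList<>(crops.subList(start, end))); // copy so batches are independent
        }
        return out;
    }
}
```

With the default of 1, each line is recognized on its own; larger values trade memory for fewer model calls.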
OCRPipelineResult Fields

| Field | Type | Description |
|---|---|---|
| `detPolys` | `int[][]` | Detected text region polygon coordinates |
| `recResults` | `List<RecognitionResult>` | Recognized text and confidence scores |
| `docOrientationLabel` | `String` | Document orientation label (e.g., "0", "90", "180", "270") |
| `docOrientationScore` | `float` | Document orientation classification confidence |
| `textLineOrientationLabel` | `String` | Text line orientation label (e.g., "0_degree", "180_degree") |
| `textLineOrientationScore` | `float` | Text line orientation classification confidence |
| `layoutRegions` | `List<LayoutRegionResult>` | Layout analysis results (present when the layout model is enabled; otherwise null) |
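The two orientation label formats differ: document labels are plain degree strings such as "90", while text-line labels carry a `_degree` suffix such as "180_degree". When post-processing or logging results, a small normalizing helper can be convenient. `OrientationLabels` below is hypothetical (not part of the project's API), and the confidence threshold is a caller-chosen value rather than anything the pipeline defines.

```java
// Hypothetical helper (not part of the project's API): maps the two documented label
// formats ("90" for documents, "180_degree" for text lines) to integer degrees.
final class OrientationLabels {
    private static final String SUFFIX = "_degree";

    static int toDegrees(String label) {
        String digits = label.endsWith(SUFFIX)
                ? label.substring(0, label.length() - SUFFIX.length())
                : label;
        int degrees = Integer.parseInt(digits); // throws NumberFormatException for other formats
        if (degrees < 0 || degrees >= 360 || degrees % 90 != 0) {
            throw new IllegalArgumentException("Unexpected orientation label: " + label);
        }
        return degrees;
    }

    /** True when the label calls for a rotation and the score meets a caller-chosen threshold. */
    static boolean shouldRotate(String label, float score, float minScore) {
        return toDegrees(label) != 0 && score >= minScore;
    }
}
```

For example, `toDegrees(result.docOrientationLabel())` and `toDegrees(result.textLineOrientationLabel())` would both yield values from the same 0/90/180/270 range, assuming the accessors follow the field names listed above.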
LayoutRegionResult Fields
| Field | Type | Description |
|---|---|---|
| `layoutRegion` | `ObjectDetectionResult` | Layout region bounding box, label, and score |
| `regionType` | `String` | Region type: "text", "formula", "table", or "image" |
| `textResults` | `List<OCRPipelineResult>` | OCR results for text regions |
| `formulaResult` | `TextResult` | LaTeX formula text for formula regions |
| `tableResult` | `TableResult` | HTML table markup for table regions |
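When layout analysis is enabled, each `LayoutRegionResult` carries its content in a different field depending on `regionType`. Below is a minimal sketch of stitching regions into one Markdown-like string, assuming the content for each region has already been pulled from the matching field (`textResults`, `formulaResult`, or `tableResult`). `RegionAssembler` and its `Region` record are hypothetical stand-ins, not the project's types, and wrapping formulas in `$$` is only one possible rendering choice.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: renders layout regions as one Markdown-like string.
final class RegionAssembler {
    /** Simplified stand-in for LayoutRegionResult: region type plus its recognized content. */
    record Region(String regionType, String content) {}

    static String render(List<Region> regions) {
        return regions.stream()
                .map(RegionAssembler::renderOne)
                .filter(s -> !s.isEmpty())          // drop regions with no text (e.g. images)
                .collect(Collectors.joining("\n\n"));
    }

    static String renderOne(Region region) {
        switch (region.regionType()) {
            case "formula":
                return "$$" + region.content() + "$$"; // LaTeX from formulaResult
            case "table":
                return region.content();               // HTML from tableResult
            case "image":
                return "";                             // no recognition for image regions
            default:
                return region.content();               // "text": joined textResults
        }
    }
}
```

Regions are emitted in list order, so any reading-order sorting would need to happen before calling `render`. If a formula region fell back to text OCR, its content is plain text rather than LaTeX, and the `$$` wrapping may not be appropriate.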