
ningpp/flux


Flux

⚠️ WARNING: This project is in its early stages of development.

Do not use in production.


Special Thanks

  1. PaddlePaddle/PaddleX
  2. Topdu/OpenOCR
  3. breezedeus/Pix2Text
  4. NormXU/nougat-latex-ocr
  5. huggingface/transformers
  6. bytedance/Dolphin
  7. docling-project/docling
  8. huggingface/optimum
  9. OleehyO/TexTeller

Supported Model Categories

Model Category Status
Layout
Text Detection
Text Recognition
Formula Recognition
Table Recognition
Doc Orientation
Text Line Orientation
Multimodal OCR

Supported Models

Layout Model

Model CPU CUDA
docling-layout-egret-large
docling-layout-egret-medium
docling-layout-egret-xlarge
docling-layout-heron
docling-layout-heron-101
PP-DocLayoutV2
PP-DocLayoutV3
PP-DocLayout_plus-L
PP-DocLayout-L
PP-DocLayout-M
PP-DocLayout-S
PicoDet-S_layout_17cls
PicoDet-L_layout_17cls
RT-DETR-H_layout_17cls

Text Detection Model

Model CPU CUDA
PP-OCRv6_medium_det
PP-OCRv6_small_det
PP-OCRv6_tiny_det
PP-OCRv5_server_det
PP-OCRv5_mobile_det
PP-OCRv4_server_det
PP-OCRv4_mobile_det

Text Recognition Model

Model CPU CUDA
PP-OCRv6_medium_rec
PP-OCRv6_small_rec
PP-OCRv6_tiny_rec
PP-OCRv5_server_rec
PP-OCRv5_mobile_rec
PP-OCRv4_server_rec
PP-OCRv4_server_rec_doc
PP-OCRv4_mobile_rec

Formula Recognition Model

Model CPU CUDA
CodeFormulaV2
Dolphin
Dolphin-1.5
Falcon-OCR
GOT-OCR-2.0
granite-docling-258M
nougat-latex-base
pix2text-mfr
pix2text-mfr-1.5
PP-FormulaNet-S
PP-FormulaNet-L
PP-FormulaNet_plus-S
PP-FormulaNet_plus-M
PP-FormulaNet_plus-L
TexTeller
unirec-0.1b

Table Recognition Model

Model CPU CUDA
Dolphin
Dolphin-1.5
Falcon-OCR
unirec-0.1b

Doc Orientation Model

Model CPU CUDA
PP-LCNet_x1_0_doc_ori

Text Line Orientation Model

Model CPU CUDA
PP-LCNet_x1_0_textline_ori
PP-LCNet_x0_25_textline_ori

Multimodal OCR Model

Model CPU CUDA
GLM-OCR
LightOnOCR-2-1B
LightOnOCR-2-1B-ONNX
llava-onevision-qwen2-0.5b-ov-hf

OCR Pipeline

The OCR Pipeline combines multiple models to perform end-to-end text extraction from document images. It orchestrates the following steps:

  1. Document Orientation Classification (optional) — Detects and corrects the overall document rotation (0°/90°/180°/270°).
  2. Layout Analysis (optional) — Detects layout regions (text, formula, table, image) in the document.
  3. Text Detection — Locates text regions in the image and returns bounding polygons.
  4. Text Line Orientation Classification (optional) — Detects whether each text line is upside down (0° or 180°) and rotates it if needed.
  5. Text Recognition — Recognizes text content from each detected text line.
  6. Formula Recognition (optional) — Recognizes LaTeX formulas from formula regions detected by layout analysis.
  7. Table Recognition (optional) — Recognizes HTML tables from table regions detected by layout analysis.
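Steps 1 and 4 both reduce to rotating an image by a multiple of 90°. The following is a minimal sketch of that rotation using plain java.awt.image.BufferedImage. It assumes the classifier label has already been converted into a clockwise correction angle. Which direction a given label such as "90" maps to is not specified here, and rotateClockwise is a hypothetical helper, not part of Flux.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of Flux): rotates an image clockwise by 0, 90, 180 or 270 degrees.
final class RotateSketch {
    static BufferedImage rotateClockwise(BufferedImage img, int degrees) {
        if (degrees % 90 != 0) {
            throw new IllegalArgumentException("degrees must be a multiple of 90: " + degrees);
        }
        int turns = ((degrees % 360) + 360) % 360 / 90; // normalize to 0..3 quarter turns
        int w = img.getWidth();
        int h = img.getHeight();
        boolean swap = turns % 2 == 1; // 90 and 270 swap width and height
        // TYPE_INT_RGB keeps the sketch simple; any alpha channel is dropped.
        BufferedImage out = new BufferedImage(swap ? h : w, swap ? w : h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                switch (turns) {
                    case 0 -> out.setRGB(x, y, rgb);
                    case 1 -> out.setRGB(h - 1 - y, x, rgb);         // 90 degrees clockwise
                    case 2 -> out.setRGB(w - 1 - x, h - 1 - y, rgb); // 180 degrees
                    default -> out.setRGB(y, w - 1 - x, rgb);        // 270 degrees clockwise
                }
            }
        }
        return out;
    }
}
```

For an upside-down text line, rotateClockwise(line, 180) is the entire fix. The 90° and 270° cases swap width and height, so the output image is allocated with swapped dimensions.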

When layout analysis is enabled, the pipeline first detects layout regions and then routes each region to the appropriate model:

  • Text regions → text detection + text line orientation + text recognition
  • Formula regions → formula recognition model (if available, otherwise falls back to text OCR)
  • Table regions → table recognition model (if available, otherwise falls back to text OCR)
  • Image regions → no recognition; only the region info is returned
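The routing rules above, including the fallback to text OCR, can be sketched as a single dispatch function. Recognizer and route are hypothetical names that stand in for the real models, and the crop type is left generic. Flux's own classes may be organized differently.

```java
// Hypothetical sketch of region routing; Recognizer and route are not Flux APIs.
final class RegionRouter {
    /** Anything that turns a region crop into text (plain text, LaTeX or HTML). */
    interface Recognizer<I> {
        String recognize(I crop);
    }

    /** Returns the recognized content, or null for image regions. A null model means "not configured". */
    static <I> String route(String regionType, I crop,
                            Recognizer<I> textOcr, Recognizer<I> formulaModel, Recognizer<I> tableModel) {
        switch (regionType) {
            case "text":
                return textOcr.recognize(crop); // detection + line orientation + recognition
            case "formula":
                return formulaModel != null ? formulaModel.recognize(crop) : textOcr.recognize(crop);
            case "table":
                return tableModel != null ? tableModel.recognize(crop) : textOcr.recognize(crop);
            case "image":
                return null; // no recognition, region info only
            default:
                throw new IllegalArgumentException("unknown region type: " + regionType);
        }
    }
}
```

Passing null for an optional model mirrors the "pass null to skip" convention in the Usage example: the region still produces output, just through the generic text path.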

Performance Demo

File                  Pages  E2E (ms)  OCR (ms)  PDF (ms)
2606.13108.pdf           29   6630.00   6430.90    199.10
2606.13108_zh_CN.pdf     29   6954.10   6750.30    203.80
2606.13392.pdf           30   9279.00   9162.40    116.70
2606.13392_zh_CN.pdf     34   8424.00   8248.30    175.70
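In every row above, E2E equals the sum of the OCR and PDF columns, up to rounding. The following small sketch derives per-page figures from one row. The PerfRun record is an illustrative helper, not something the project provides.

```java
// Illustrative helper (not part of the project) for reading one row of the table above.
record PerfRun(String file, int pages, double e2eMs, double ocrMs, double pdfMs) {
    /** Average end-to-end latency per page in milliseconds. */
    double msPerPage() {
        return e2eMs / pages;
    }

    /** End-to-end throughput in pages per second. */
    double pagesPerSecond() {
        return pages * 1000.0 / e2eMs;
    }

    /** E2E time not covered by the OCR and PDF columns; about zero for every row above. */
    double unaccountedMs() {
        return e2eMs - (ocrMs + pdfMs);
    }
}
```

For the first row, this gives roughly 229 ms per page, or about 4.4 pages per second.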

Usage

import ai.onnxruntime.OrtEnvironment;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
// The Flux model and pipeline classes used below must be imported as well.

// Assumed to be defined elsewhere: modelDir (directory holding the models) and gpuIndex (GPU device index)
try (OrtEnvironment env = OrtEnvironment.getEnvironment()) {
    // Required models
    TextDetectionModel detModel = new TextDetectionModel(modelDir, "PP-OCRv6_medium_det", env, gpuIndex);
    TextRecognitionModel recModel = new TextRecognitionModel(modelDir, "PP-OCRv6_medium_rec", env, gpuIndex);

    // Optional models (pass null to skip)
    DocOrientationClassifyModel docOriModel = new DocOrientationClassifyModel(modelDir, "PP-LCNet_x1_0_doc_ori", env, gpuIndex);
    TextLineOrientationModel textLineOriModel = new TextLineOrientationModel(modelDir, "PP-LCNet_x1_0_textline_ori", env, gpuIndex);

    // Optional: layout, formula, and table models (pass null to skip)
    LayoutModel layoutModel = new LayoutModel(modelDir, "PP-DocLayoutV3", env, gpuIndex);
    FormulaRecognitionModel formulaModel = new FormulaRecognitionModel(modelDir, "PP-FormulaNet_plus-L", env, gpuIndex);
    TableModel tableModel = new TableModel(modelDir, "unirec-0.1b", env, gpuIndex);

    OCRPipeline pipeline = new OCRPipeline(detModel, recModel, docOriModel, textLineOriModel,
            layoutModel, formulaModel, tableModel);

    Map<String, Object> params = new HashMap<>();
    params.put("recognitionBatchSize", 1);

    List<OCRPipelineResult> results = pipeline.predictFile("image.png", params);

    for (OCRPipelineResult result : results) {
        if (result.layoutRegions() != null) {
            // Layout enabled: print each region with its type
            for (LayoutRegionResult region : result.layoutRegions()) {
                System.out.println(region.regionType() + ": " + region.getText());
            }
        } else {
            // No layout model: join the recognized text lines, one per row
            String text = result.recResults().stream()
                .map(r -> r.text())
                .collect(Collectors.joining("\n"));
            System.out.println(text);
        }
    }
}

Parameters

Parameter             Type     Default  Description
recognitionBatchSize  Integer  1        Batch size for text recognition inference
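The following sketch shows what a batch size means here. It assumes the cropped text lines are grouped into consecutive chunks before each recognition call. How Flux actually orders and pads the crops is not described in this README, and batches is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Flux): groups items into consecutive batches.
final class Batching {
    /** Splits items into consecutive batches of at most batchSize elements; the last batch may be smaller. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // Copy the slice so each batch stays valid even if items changes later
            out.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return out;
    }
}
```

With n text lines, the recognizer is called ceil(n / recognitionBatchSize) times. Larger batches therefore mean fewer calls, at the cost of more memory per call.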

OCRPipelineResult Fields

Field                     Type                      Description
detPolys                  int[][]                   Detected text region polygon coordinates
recResults                List<RecognitionResult>   Recognized text and confidence scores
docOrientationLabel       String                    Document orientation label (e.g., "0", "90", "180", "270")
docOrientationScore       float                     Document orientation classification confidence
textLineOrientationLabel  String                    Text line orientation label (e.g., "0_degree", "180_degree")
textLineOrientationScore  float                     Text line orientation classification confidence
layoutRegions             List<LayoutRegionResult>  Layout analysis results (when layout model is enabled)
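Each entry of detPolys can be reduced to an axis-aligned box, for example to crop or highlight a line. This sketch assumes each int[] stores one polygon flattened as x1, y1, x2, y2, and so on. That layout is an assumption and is not documented above. boundingBox is a hypothetical helper.

```java
// Hypothetical helper (not part of Flux).
// ASSUMPTION: a polygon is stored flat as x1, y1, x2, y2, ... (pairs of coordinates).
final class PolyBox {
    /** Returns {minX, minY, maxX, maxY} for one flattened polygon. */
    static int[] boundingBox(int[] poly) {
        if (poly.length < 2 || poly.length % 2 != 0) {
            throw new IllegalArgumentException("expected a non-empty list of x,y pairs");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int i = 0; i < poly.length; i += 2) {
            minX = Math.min(minX, poly[i]);     // even index: x
            maxX = Math.max(maxX, poly[i]);
            minY = Math.min(minY, poly[i + 1]); // odd index: y
            maxY = Math.max(maxY, poly[i + 1]);
        }
        return new int[] {minX, minY, maxX, maxY};
    }
}
```

A slightly rotated quadrilateral such as {10, 20, 50, 22, 48, 40, 12, 38} becomes the box {10, 20, 50, 40}.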

LayoutRegionResult Fields

Field          Type                     Description
layoutRegion   ObjectDetectionResult    Layout region bounding box, label, and score
regionType     String                   Region type: "text", "formula", "table", or "image"
textResults    List<OCRPipelineResult>  OCR results for text regions
formulaResult  TextResult               LaTeX formula text for formula regions
tableResult    TableResult              HTML table text for table regions
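The following sketch turns a list of layout regions into a single Markdown-style string, using only the region types listed above. Region is a hypothetical flattened stand-in for LayoutRegionResult. Its content field plays the role of the OCR text, formulaResult, or tableResult. Wrapping formulas in $$...$$ is an illustrative choice.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch (not a Flux API) of assembling region results into one document string.
final class RegionAssembler {
    /** Flattened stand-in for LayoutRegionResult: the region type plus its recognized content. */
    record Region(String regionType, String content) {}

    static String toMarkdown(List<Region> regions) {
        return regions.stream()
                .filter(r -> !r.regionType().equals("image")) // image regions carry no text
                .map(r -> switch (r.regionType()) {
                    case "formula" -> "$$" + r.content() + "$$"; // LaTeX as a display-math block
                    default -> r.content();                      // text and table HTML pass through
                })
                .collect(Collectors.joining("\n\n"));
    }
}
```

Keeping the regions in the order the layout model returns them preserves whatever reading order that model produces.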
