Digit OCR (0–9) using OpenCV and K-Nearest Neighbors. Label your own images, train a classifier, and recognize digits in new images.
- Python 3.10 or newer
- A graphical display: `collect` and `predict` open OpenCV windows for interactive labeling and visualization
- macOS, Linux, or Windows with GUI support (headless/SSH-only environments will not work for interactive commands)

pip install -r requirements.txt

Use an image containing clear digit characters (printed text, screenshots, etc.).
Label digits interactively:
python main.py collect my-image.png
# or label all images in a folder:
python main.py collect ./images/

Controls:
| Key | Action |
|---|---|
| `0`–`9` | Label current contour as that digit |
| `SPACE` | Skip contour (not a digit) |
| `BACKSPACE` | Undo last label |
| `ESC` | Quit and save progress |
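
The controls above amount to a small key-to-action mapping. The following is a minimal sketch of how such a dispatch could look, assuming key codes arrive as ASCII values (as `cv2.waitKey(0) & 0xFF` typically reports them). The function and action names are hypothetical and are not taken from `src/collect.py`.

```python
from typing import Optional, Tuple

# Assumed key codes: ASCII values as typically returned by cv2.waitKey(0) & 0xFF.
KEY_ESC = 27
KEY_SPACE = 32
KEYS_BACKSPACE = {8, 127}  # Backspace arrives as 8 or 127 depending on the platform


def action_for_key(code: int) -> Tuple[str, Optional[int]]:
    """Map a key code to (action, digit); digit is set only for 'label'.

    Hypothetical helper for illustration only.
    """
    if ord("0") <= code <= ord("9"):
        return ("label", code - ord("0"))  # label the current contour
    if code == KEY_SPACE:
        return ("skip", None)              # contour is not a digit
    if code in KEYS_BACKSPACE:
        return ("undo", None)              # drop the last label
    if code == KEY_ESC:
        return ("quit", None)              # save progress and exit
    return ("ignore", None)                # any other key does nothing
```

Keeping the mapping in a pure function like this makes the labeling logic testable without opening an OpenCV window.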
Output files (default):
- `models/generalsamples.data`
- `models/generalresponses.data`

python main.py train

Output file (default): `models/model.yml`
Options:
- `--samples PATH`: input samples file (default: `models/generalsamples.data`)
- `--responses PATH`: input labels file (default: `models/generalresponses.data`)
- `--model PATH`: output model file (default: `models/model.yml`)

python main.py predict my-image.png

Options:
- `--threshold FLOAT`: confidence threshold, 0.0–1.0 (default: `0.0`)
- `--no-display`: skip OpenCV visualization windows
- `--model PATH`: model file (default: `models/model.yml`)
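
For reference, the three commands and their documented options could be declared with `argparse` roughly as follows. This is a sketch under the assumption that the CLI uses `argparse` subcommands; the actual `src/cli.py` may be organized differently, and `build_parser` is a hypothetical name. The flags and defaults are the ones listed above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented commands and defaults."""
    parser = argparse.ArgumentParser(prog="main.py", description="Digit OCR with KNN")
    sub = parser.add_subparsers(dest="command", required=True)

    # collect: takes a single image or a folder of images
    collect = sub.add_parser("collect", help="label digits interactively")
    collect.add_argument("path", help="image file or folder of images")

    # train: reads labeled samples, writes the model file
    train = sub.add_parser("train", help="train the KNN model")
    train.add_argument("--samples", default="models/generalsamples.data")
    train.add_argument("--responses", default="models/generalresponses.data")
    train.add_argument("--model", default="models/model.yml")

    # predict: recognizes digits in one image
    predict = sub.add_parser("predict", help="recognize digits in an image")
    predict.add_argument("image", help="image file to recognize")
    predict.add_argument("--threshold", type=float, default=0.0)
    predict.add_argument("--no-display", action="store_true")
    predict.add_argument("--model", default="models/model.yml")
    return parser
```

For example, `build_parser().parse_args(["predict", "my-image.png", "--no-display"])` yields a namespace with `no_display=True` and the default model path.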
collect → train → predict
All generated files are saved in models/ by default (gitignored except .gitkeep).
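
To make the train and predict steps concrete, here is a minimal pure-Python sketch of K-Nearest Neighbors classification with a confidence threshold. It is illustrative only: the project itself uses OpenCV's KNN on image-derived features, and this README does not say how confidence is computed. The sketch assumes confidence is the fraction of the `k` nearest neighbors that agree with the winning label. The names `knn_predict` and `classify` are hypothetical.

```python
from collections import Counter
from math import dist
from typing import Optional, Sequence, Tuple

Vector = Sequence[float]


def knn_predict(samples: Sequence[Vector], labels: Sequence[int],
                query: Vector, k: int = 3) -> Tuple[int, float]:
    """Return (label, confidence) by majority vote among the k nearest samples."""
    # Rank training samples by Euclidean distance to the query, keep the k closest
    nearest = sorted(zip(samples, labels), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # Assumed confidence measure: share of neighbors voting for the winner
    return label, count / len(nearest)


def classify(samples: Sequence[Vector], labels: Sequence[int], query: Vector,
             k: int = 3, threshold: float = 0.0) -> Optional[int]:
    """Return the predicted label, or None when confidence is below threshold."""
    label, confidence = knn_predict(samples, labels, query, k)
    return label if confidence >= threshold else None


# Toy 2-D "features": two samples labeled 0, three labeled 1
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
labels = [0, 0, 1, 1, 1]
```

With this toy data, a query near the origin has two of its three nearest neighbors labeled 0, so a low threshold accepts the prediction while a threshold close to 1.0 rejects it. With the default threshold of `0.0`, every prediction is kept.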
kocr/
├── main.py # Entry point
├── src/
│ ├── cli.py # Command-line interface
│ ├── collect.py # Interactive data labeling
│ ├── train.py # KNN model training
│ ├── predict.py # Digit recognition
│ └── constants.py # Shared configuration
├── models/ # Generated data and models (gitignored)
├── docs/assets/ # README screenshots
├── tests/ # Automated tests
├── pytest.ini
└── requirements.txt
- GUI required: `collect` always needs a display; `predict` needs one unless `--no-display` is set
- No pre-trained model: you must label and train on your own images
- Image-dependent accuracy: low contrast, noise, or unusual fonts may require tuning parameters in `src/constants.py`
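
Because of the GUI requirement, a script that wraps `predict` may want to guess whether a display is available before deciding to pass `--no-display`. The following is a rough heuristic sketch, not part of this project: on Linux it checks the `DISPLAY` and `WAYLAND_DISPLAY` environment variables, and on other platforms it simply assumes a desktop session exists, which can be wrong over SSH. The function name is hypothetical.

```python
import os
import sys
from typing import Mapping, Optional


def display_available(platform: str = sys.platform,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic guess at whether GUI windows can be opened (hypothetical helper)."""
    env = os.environ if env is None else env
    if platform.startswith("linux"):
        # X11 sets DISPLAY; Wayland sessions set WAYLAND_DISPLAY
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    # macOS and Windows: no reliable environment variable, so assume a desktop
    return True


# Build extra arguments for `python main.py predict ...` based on the guess
extra_args = [] if display_available() else ["--no-display"]
```

This only helps `predict`; `collect` is interactive and cannot run without a display.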
pip install -r requirements.txt
python -m ruff check src/ main.py tests/
pytest

MIT; see LICENSE.

