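Python bindings for a high-performance image template matching library that uses SIMD optimizations for fast matching.

- High performance: convolution is accelerated with SIMD instruction sets (SSE2, AVX2, NEON)
- Multi-scale matching: supports image pyramid matching
- Rotation invariance: supports matching rotated templates
- Subpixel precision: supports subpixel localization
- Easy to use: concise Python API
- OpenCV compatible: integrates seamlessly with OpenCV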

Install the system dependencies:

    # Ubuntu/Debian
    sudo apt-get install build-essential cmake pkg-config libopencv-dev python3-dev
    # CentOS/RHEL/Fedora
    sudo yum install gcc-c++ cmake pkgconfig opencv-devel python3-devel
    # macOS
    brew install cmake pkg-config opencv python3

Create a virtual environment (recommended):

    python3 -m venv MatchTool_venv
    source MatchTool_venv/bin/activate

Install the Python dependencies:

    pip install -r requirements.txt

Build and install:

    pip install -e .

Basic usage:

    import numpy as np
    import cv2
    from MatchTool_python.wrapper import TemplateMatcher, match_template

    # Load the images
    image = cv2.imread('source.jpg', cv2.IMREAD_GRAYSCALE)
    template = cv2.imread('template.jpg', cv2.IMREAD_GRAYSCALE)

    # Method 1: use the convenience function
    results = match_template(image, template, threshold=0.8, max_matches=10)
    for result in results:
        print(f"Position: {result.position}, score: {result.score:.3f}")

    # Method 2: use the class interface
    matcher = TemplateMatcher()
    matcher.set_template(template)
    results = matcher.match(image)

    # Visualize the results
    vis_image = matcher.visualize_matches(cv2.cvtColor(image, cv2.COLOR_GRAY2RGB))
    cv2.imwrite('result.jpg', vis_image)

Custom parameters:

    from MatchTool_python.wrapper import TemplateMatcher, MatchParameters

    # Create custom parameters
    params = MatchParameters(
        score_threshold=0.7,     # similarity threshold
        max_positions=20,        # maximum number of matches
        use_simd=True,           # use SIMD optimization
        enable_subpixel=True,    # enable subpixel precision
        tolerance_angle=10.0     # angle tolerance (degrees)
    )

    # Use the custom parameters
    matcher = TemplateMatcher(params)
    matcher.set_template(template)
    results = matcher.match(image)

Match parameter configuration class:

    class MatchParameters:
        max_positions: int = 10         # maximum number of matches
        max_overlap: float = 0.1        # maximum overlap ratio
        score_threshold: float = 0.8    # similarity threshold
        tolerance_angle: float = 5.0    # angle tolerance (degrees)
        min_reduce_area: int = 1000     # minimum reduced area
        use_simd: bool = True           # use SIMD optimization
        enable_subpixel: bool = False   # subpixel precision
        fast_mode: bool = False         # fast mode

Main template matching class:

    class TemplateMatcher:
        def __init__(self, parameters: Optional[MatchParameters] = None):
            """Initialize the matcher"""

        def set_template(self, template: np.ndarray, min_reduce_area: int = 1000) -> None:
            """Set the template image"""

        def match(self, image: np.ndarray) -> List[MatchResult]:
            """Run template matching"""

        def visualize_matches(self, image: np.ndarray) -> np.ndarray:
            """Visualize the match results"""

Match result class:

    class MatchResult:
        position: Tuple[float, float]   # match position (x, y)
        score: float                    # similarity score (0.0 - 1.0)
        angle: float                    # rotation angle (degrees)

Convenience functions:

    def match_template(
        image: np.ndarray,
        template: np.ndarray,
        threshold: float = 0.8,
        max_matches: int = 10,
        use_simd: bool = True
    ) -> List[MatchResult]:
        """Convenience function for quick template matching"""

    def check_simd_support() -> dict:
        """Check SIMD support"""

Basic matching example:

    import numpy as np
    import cv2
    from MatchTool_python.wrapper import TemplateMatcher

    # Create test images
    template = np.zeros((50, 50), dtype=np.uint8)
    template[10:40, 10:40] = 255          # white square template
    image = np.zeros((200, 200), dtype=np.uint8)
    image[30:80, 50:100] = template       # place the template in the image
    image[100:150, 120:170] = template    # place a second instance

    # Run the matching
    matcher = TemplateMatcher()
    matcher.set_template(template)
    results = matcher.match(image)
    print(f"Found {len(results)} matches:")
    for i, result in enumerate(results):
        print(f"Match {i+1}: position={result.position}, score={result.score:.3f}")

    # Visualize
    vis = matcher.visualize_matches(cv2.cvtColor(image, cv2.COLOR_GRAY2RGB))
    cv2.imwrite('matches.jpg', vis)

Performance comparison:

    import time
    import numpy as np
    from MatchTool_python.wrapper import TemplateMatcher, MatchParameters

    # Create test data
    template = np.random.randint(0, 256, (30, 30), dtype=np.uint8)
    image = np.random.randint(0, 256, (500, 500), dtype=np.uint8)

    # Test with SIMD optimization
    matcher_simd = TemplateMatcher(MatchParameters(use_simd=True))
    matcher_simd.set_template(template)
    start = time.perf_counter()
    results_simd = matcher_simd.match(image)
    simd_time = time.perf_counter() - start

    # Test without SIMD
    matcher_no_simd = TemplateMatcher(MatchParameters(use_simd=False))
    matcher_no_simd.set_template(template)
    start = time.perf_counter()
    results_no_simd = matcher_no_simd.match(image)
    no_simd_time = time.perf_counter() - start

    print(f"SIMD: {simd_time:.3f}s, found {len(results_simd)} matches")
    print(f"No SIMD: {no_simd_time:.3f}s, found {len(results_no_simd)} matches")
    print(f"Speedup: {no_simd_time/simd_time:.2f}x")

Run the example:

    python3 MatchTool_python/example.py

This project is licensed under the MIT License. See the LICENSE file for details.

Using C++/MFC/OpenCV to build a Normalized Cross Correlation (NCC)-based image alignment algorithm.
The result expresses the similarity of two images, and the formula is as follows:
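
For reference, one widely used form of the normalized cross-correlation score is the zero-mean variant below. Which exact variant this project uses is an assumption here. In this notation, $T$ is the template, $I$ the inspection image, $\bar{T}$ the template mean, and $\bar{I}_{x,y}$ the mean of the image patch whose top-left corner is at $(x, y)$:

```latex
R(x, y) =
\frac{\sum_{x', y'} \bigl(T(x', y') - \bar{T}\bigr)\,\bigl(I(x + x', y + y') - \bar{I}_{x,y}\bigr)}
     {\sqrt{\sum_{x', y'} \bigl(T(x', y') - \bar{T}\bigr)^{2} \;
            \sum_{x', y'} \bigl(I(x + x', y + y') - \bar{I}_{x,y}\bigr)^{2}}}
```

The score lies in $[-1, 1]$, and a value of 1 means the patch matches the template up to a positive gain and an offset in brightness.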

- C++ shared object (.so) with Neon SIMD for Python, runnable on macOS (Ventura 13.3) and Linux (Ubuntu 22.04.2). Very fast when compiled with -O3
- C++ .so with Pybind11 for Python

1. Rotation invariant, with rotation precision as high as possible.
2. Uses an image pyramid as the search strategy, which speeds up the original NCC method by 4 to 128 times (depending on template size) by minimizing the inspection area at the top level of the image pyramid.
3. Reduces the time OpenCV spends on rotation by setting only the needed "size" and modifying the rotation matrix.
4. SIMD version of image convolution (especially useful for large templates).
   4.1 Update: Neon SIMD in the macOS version of the .so, which is very fast.
5. Optimizes the function GetNextMaxLoc() with struct s_BlockMax, for special cases where the template is much smaller than the source image, and for a large TargetNumber. The gain is large:
   Test case: Src10 (3648 x 3648) and Dst10 (54 x 54)
   Effect: execution time drops from 534 ms to 100 ms, a 434% speed-up.
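
The pyramid strategy in item 2 can be sketched in a few lines of pure Python. This is only an illustration under simplifying assumptions, not the project's implementation: it handles translation only (no rotation), uses 2x2 averaging for downsampling, and the helper names `ncc`, `downsample`, `best_match`, and `pyramid_match` are hypothetical. The exhaustive search runs only on the smallest level, and each finer level checks a small window around the position found above it.

```python
import math

def ncc(image, template, top, left):
    """Zero-mean NCC between the template and the image patch at (top, left)."""
    h, w = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(h) for c in range(w)]
    tmpl = [v for row in template for v in row]
    mp = sum(patch) / len(patch)
    mt = sum(tmpl) / len(tmpl)
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in tmpl))
    return num / den if den > 0 else 0.0  # flat patches get score 0

def downsample(img):
    """Build the next pyramid level by averaging 2x2 blocks."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]

def best_match(image, template, rows, cols):
    """Exhaustive NCC search restricted to the given candidate rows and columns."""
    best = (-2.0, 0, 0)
    for top in rows:
        for left in cols:
            score = ncc(image, template, top, left)
            if score > best[0]:
                best = (score, top, left)
    return best

def pyramid_match(image, template, levels=2):
    """Coarse-to-fine search: full scan at the top level, local refinement below."""
    imgs, tmps = [image], [template]
    for _ in range(levels):
        imgs.append(downsample(imgs[-1]))
        tmps.append(downsample(tmps[-1]))
    # Full search only on the smallest (top) level.
    rows = range(len(imgs[-1]) - len(tmps[-1]) + 1)
    cols = range(len(imgs[-1][0]) - len(tmps[-1][0]) + 1)
    score, top, left = best_match(imgs[-1], tmps[-1], rows, cols)
    # Walk down the pyramid, searching a +/-2 pixel window around the scaled position.
    for lvl in range(levels - 1, -1, -1):
        img, tmp = imgs[lvl], tmps[lvl]
        max_top = len(img) - len(tmp)
        max_left = len(img[0]) - len(tmp[0])
        rows = range(max(0, 2 * top - 2), min(max_top, 2 * top + 2) + 1)
        cols = range(max(0, 2 * left - 2), min(max_left, 2 * left + 2) + 1)
        score, top, left = best_match(img, tmp, rows, cols)
    return score, top, left

# Synthetic demo: an 8x8 ramp template pasted into a 32x32 black image.
template = [[1 + 8 * r + c for c in range(8)] for r in range(8)]
image = [[0] * 32 for _ in range(32)]
for r in range(8):
    for c in range(8):
        image[12 + r][16 + c] = template[r][c]

score, top, left = pyramid_match(image, template, levels=2)
print(f"best match at row={top}, col={left}, score={score:.3f}")
```

In this demo the full scan covers 49 positions on the 8x8 top level, followed by 25 positions on each of the two finer levels, instead of 625 positions at full resolution.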
Inspection image: 4024 x 3036; template image: 762 x 521.

| Library | Index | Score | Angle | PosX | PosY | Execution Time |
|---|---|---|---|---|---|---|
| My Tool | 0 | 1 | 0.046 | 1725.857 | 1045.433 | 76ms 🎖️ |
| My Tool | 1 | 0.998 | -119.979 | 2662.869 | 1537.446 | |
| My Tool | 2 | 0.991 | 120.150 | 1768.936 | 2098.494 | |
| Cognex | 0 | 1 | 0.030 | 1725.960 | 1045.470 | 125ms |
| Cognex | 1 | 0.989 | -119.960 | 2663.750 | 1538.040 | |
| Cognex | 2 | 0.983 | 120.090 | 1769.250 | 2099.410 | |
| Aisys | 0 | 1 | 0 | 1726.000 | 1045.500 | 202ms |
| Aisys | 1 | 0.990 | -119.935 | 2663.630 | 1539.060 | |
| Aisys | 2 | 0.979 | 120.000 | 1769.63 | 2099.780 | |
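
The table can be summarized with a little arithmetic. The sketch below uses only the values listed above and computes how far the My Tool positions are from the Cognex positions for each match, plus the execution-time ratios relative to My Tool. The variable names are illustrative.

```python
import math

# (PosX, PosY) per match index and execution times, copied from the table above.
my_tool = [(1725.857, 1045.433), (2662.869, 1537.446), (1768.936, 2098.494)]
cognex = [(1725.960, 1045.470), (2663.750, 1538.040), (1769.250, 2099.410)]
times_ms = {"My Tool": 76, "Cognex": 125, "Aisys": 202}

# Euclidean distance in pixels between the two tools' reported positions.
deviations = [math.dist(a, b) for a, b in zip(my_tool, cognex)]

# How many times longer each library takes compared with My Tool.
time_ratios = {name: t / times_ms["My Tool"] for name, t in times_ms.items()}

for index, d in enumerate(deviations):
    print(f"match {index}: position difference = {d:.3f} px")
for name, ratio in time_ratios.items():
    print(f"{name}: {ratio:.2f}x the execution time of My Tool")
```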

Note: for the best performance, make sure you are using the Release version of both this project and the OpenCV DLLs. Optimization settings such as O2 significantly affect speed, and the difference between Debug and Release can be up to 7 times in some cases.

    # 1. Install dependencies automatically
    ./install_dependencies.sh
    # 2. Build with one command
    ./build.sh
    # 3. Run the test (Linux/macOS)
    cd build && ./MatchTool_test

- Download Visual Studio 2017 or newer
- Check "x86 and x64 version of C++ MFC" during installation
- Open MatchTool.vcxproj
- Configure the OpenCV paths in the project properties
- Build and run

Manual build (Linux/macOS):

    # Install dependencies
    # Linux: sudo apt install libopencv-dev
    # macOS: brew install opencv
    # Build
    mkdir build && cd build
    cmake ..
    make -j$(nproc)    # if nproc is unavailable on macOS: make -j$(sysctl -n hw.ncpu)
    # Test
    ./MatchTool_test

- ✅ Windows: Full MFC GUI application
- ✅ Linux/macOS: Core library + command-line test
- ✅ Auto-install: Detects and installs OpenCV automatically
- ✅ Cross-platform: Same codebase, different outputs per platform
- ✅ SIMD Optimized: SSE2 acceleration for image processing
1. Select Debug_4.X or Release_4.X in "Solution Configuration"

2. Do steps 10~12 of the previous section
- Select the language you want
- Drag the source image to the left area
- Drag the Dst image to the top-right area
- Push the "Execute" button
- Target Number: the maximum number of objects you want to find in the inspection image
- Max Overlap Ratio: (the overlap area between two findings) / (area of the golden sample)
- Score (Similarity): minimum accepted similarity of findings (0~1); a lower score increases execution time
- Tolerance Angle: possible rotation of targets in the inspection image (180 means the search range is -180~180); a larger angle increases execution time. Alternatively, push the "↓" button to select 2 angle ranges
- Min Reduced Area: the minimum area of the topmost level of the image pyramid (training stage)
- Results are sorted by score (in decreasing order)
- Angles: detected rotation of the findings
- PosX, PosY: pixel position of the findings
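
How Target Number, Max Overlap Ratio, and Score can interact is sketched below in pure Python. It is a simplified illustration, not the tool's code: boxes are axis-aligned (the real findings may be rotated), the function names `overlap_ratio` and `select_matches` are hypothetical, and the default values are borrowed from typical settings rather than read from the GUI.

```python
def overlap_ratio(a, b, template_area):
    """Overlap area of two axis-aligned boxes (x, y, w, h) divided by the template area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_h = min(ay + ah, by + bh) - max(ay, by)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0  # the boxes do not intersect
    return (overlap_w * overlap_h) / template_area

def select_matches(candidates, template_size,
                   score_threshold=0.8, max_overlap=0.1, target_number=10):
    """Filter (score, x, y) candidates, where (x, y) is the top-left corner.

    Candidates are visited in decreasing score order; one is kept if its score
    reaches the threshold and it overlaps every kept finding by at most max_overlap.
    """
    w, h = template_size
    kept = []
    for score, x, y in sorted(candidates, reverse=True):
        if score < score_threshold or len(kept) >= target_number:
            break  # remaining candidates score lower, or enough targets were found
        box = (x, y, w, h)
        if all(overlap_ratio(box, (kx, ky, w, h), w * h) <= max_overlap
               for _, kx, ky in kept):
            kept.append((score, x, y))
    return kept

candidates = [(0.85, 100, 40), (0.95, 10, 10), (0.60, 200, 200), (0.90, 12, 11)]
findings = select_matches(candidates, template_size=(50, 50))
for score, x, y in findings:
    print(f"score={score:.2f} at ({x}, {y})")
```

The candidate at (12, 11) is dropped because it overlaps the stronger finding at (10, 10) almost completely, and the candidate with score 0.60 is dropped by the score threshold.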
contact information: dennisliu1993@gmail.com
- C++ shared library (.so) for Python (Unix-ARM64, Ubuntu 22.04.2-ARM64, Linux)
- C++/MFC DLL for .NET Framework (Windows)
- Pure C++ DLL for Python (Windows)
- Template Matching using Fast Normalized Cross Correlation
- Computers and Electrical Engineering: An accelerating CPU-based correlation-based image alignment
If the constructor of the OpenCV class "RotatedRect" throws an error (exception), modify its implementation in types.cpp, shown below. The error might be caused by Windows updates.

    RotatedRect::RotatedRect(const Point2f& _point1, const Point2f& _point2, const Point2f& _point3)
    {
        Point2f _center = 0.5f * (_point1 + _point3);
        Vec2f vecs[2];
        vecs[0] = Vec2f(_point1 - _point2);
        vecs[1] = Vec2f(_point2 - _point3);
        double x = std::max(norm(_point1), std::max(norm(_point2), norm(_point3)));
        double a = std::min(norm(vecs[0]), norm(vecs[1]));
        // check that given sides are perpendicular
        // this is the line you need to modify
        CV_Assert( std::fabs(vecs[0].ddot(vecs[1])) * a <= FLT_EPSILON * 9 * x * (norm(vecs[0]) * norm(vecs[1])) );
        // wd_i stores which vector (0,1) or (1,2) will make the width
        // One of them will definitely have slope within -1 to 1
        int wd_i = 0;
        if( std::fabs(vecs[1][1]) < std::fabs(vecs[1][0]) ) wd_i = 1;
        int ht_i = (wd_i + 1) % 2;
        float _angle = std::atan(vecs[wd_i][1] / vecs[wd_i][0]) * 180.0f / (float) CV_PI;
        float _width = (float) norm(vecs[wd_i]);
        float _height = (float) norm(vecs[ht_i]);
        center = _center;
        size = Size2f(_width, _height);
        angle = _angle;
    }

Increase the threshold value in the CV_Assert line, then recompile the source code.
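
To see why a larger threshold helps, the inequality from the CV_Assert line can be reproduced in plain Python. This is only a sketch: it uses Python's double-precision arithmetic rather than OpenCV's types, the function name `sides_pass_check` is hypothetical, and the factor 1000 is an arbitrary illustrative value, not a recommendation.

```python
import math

FLT_EPSILON = 1.1920929e-07  # machine epsilon of a 32-bit float, as in C's <cfloat>

def sides_pass_check(p1, p2, p3, factor=9.0):
    """Evaluate the perpendicularity condition from the CV_Assert line.

    p1, p2, p3 are three consecutive rectangle corners given as (x, y) tuples;
    factor is the constant that is 9 in the code shown above.
    """
    v0 = (p1[0] - p2[0], p1[1] - p2[1])
    v1 = (p2[0] - p3[0], p2[1] - p3[1])
    n0, n1 = math.hypot(*v0), math.hypot(*v1)
    x = max(math.hypot(*p1), math.hypot(*p2), math.hypot(*p3))
    a = min(n0, n1)
    dot = v0[0] * v1[0] + v0[1] * v1[1]
    return abs(dot) * a <= FLT_EPSILON * factor * x * (n0 * n1)

exact = sides_pass_check((0.0, 0.0), (100.0, 0.0), (100.0, 50.0))
skewed = sides_pass_check((0.0, 0.0), (100.0, 0.0), (100.01, 50.0))
skewed_relaxed = sides_pass_check((0.0, 0.0), (100.0, 0.0), (100.01, 50.0), factor=1000.0)
print(exact, skewed, skewed_relaxed)
```

A corner that is off by a hundredth of a pixel already violates the condition with the factor 9, while a larger factor accepts it. Corner points that carry small rounding errors can trip the assertion in the same way.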













