A high-performance text search engine implemented in C++, built on a trie vocabulary and an inverted index with TF-IDF ranking.
- Inverted Index & Trie: Efficiently maps words to documents.
- TF-IDF Ranking: Ranks search results by relevance.
- Stopword Removal: Filters out common words (e.g., "the", "is") for better accuracy.
- Multi-word Support: Handles queries with multiple terms.
- Modular Design: Clean separation of concerns (Indexer, Tokenizer, Storage).
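
The stopword filtering, TF-IDF ranking, and multi-word support listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `tokenize` and `rankDocuments` are hypothetical, and the weighting (tf = count / document length, idf = log(N / df), scores summed over query terms) is one common TF-IDF variant assumed here. For brevity it scans tokenized documents directly rather than consulting an inverted index.

```cpp
#include <algorithm>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: lowercases the text, splits on non-alphanumeric
// characters, and drops every token found in the stopword set.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::unordered_set<std::string>& stopwords) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&]() {
        if (!current.empty() && stopwords.count(current) == 0) tokens.push_back(current);
        current.clear();
    };
    for (unsigned char c : text) {
        if (std::isalnum(c)) current += static_cast<char>(std::tolower(c));
        else flush();
    }
    flush();
    return tokens;
}

// Hypothetical ranking (assumed formula): score(d) = sum over query terms of
// tf(t, d) * idf(t), with tf = count / document length and idf = log(N / df).
// Returns (DocID, score) pairs, highest score first; DocID = index into docs.
std::vector<std::pair<int, double>> rankDocuments(
        const std::vector<std::vector<std::string>>& docs,
        const std::vector<std::string>& queryTerms) {
    const double n = static_cast<double>(docs.size());
    std::unordered_map<int, double> scores;
    for (const std::string& term : queryTerms) {
        std::vector<int> counts(docs.size(), 0);
        int df = 0;  // number of documents containing the term
        for (std::size_t d = 0; d < docs.size(); ++d) {
            counts[d] = static_cast<int>(std::count(docs[d].begin(), docs[d].end(), term));
            if (counts[d] > 0) ++df;
        }
        if (df == 0) continue;  // term absent from the whole collection
        const double idf = std::log(n / df);
        for (std::size_t d = 0; d < docs.size(); ++d) {
            if (counts[d] > 0) {
                scores[static_cast<int>(d)] +=
                    (static_cast<double>(counts[d]) / docs[d].size()) * idf;
            }
        }
    }
    std::vector<std::pair<int, double>> ranked(scores.begin(), scores.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;  // break ties by DocID for stable output
    });
    return ranked;
}
```

With this weighting, a term that appears in every document contributes a score of zero, so rare query terms dominate the ranking.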
- Trie: Stores the vocabulary. Inserting a word or looking up its internal WordID takes O(L) time, where L is the length of the word.
- Inverted Index: Maps WordID -> list of {DocID, Frequency}.
- Hash Map (DocumentStore): Maps DocID -> Metadata (File Path, Total Word Count).
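
The three structures described above could fit together roughly as shown below. This is a sketch under assumptions, not the project's implementation: the type names (`TrieNode`, `Trie`, `Posting`, `DocumentInfo`, `Index`), the use of `-1` for "no word ends here", and the sequential WordID assignment are all hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical trie node: children keyed by character; wordId stays -1
// unless a vocabulary word ends at this node.
struct TrieNode {
    std::unordered_map<char, std::unique_ptr<TrieNode>> children;
    int wordId = -1;
};

class Trie {
public:
    // Returns the existing WordID or assigns the next free one. O(L).
    int insert(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        if (node->wordId == -1) node->wordId = nextId_++;
        return node->wordId;
    }

    // Returns the WordID, or -1 if the word is not in the vocabulary. O(L).
    int find(const std::string& word) const {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return -1;
            node = it->second.get();
        }
        return node->wordId;
    }

private:
    TrieNode root_;
    int nextId_ = 0;
};

struct Posting { int docId; int frequency; };
struct DocumentInfo { std::string filePath; int totalWordCount = 0; };

struct Index {
    Trie vocabulary;
    std::vector<std::vector<Posting>> postings;           // index = WordID
    std::unordered_map<int, DocumentInfo> documentStore;  // DocID -> metadata

    // Assumes all tokens of one document arrive in a single call, so the
    // last posting of a list belongs to the current document if it exists.
    void addDocument(int docId, const std::string& path,
                     const std::vector<std::string>& tokens) {
        documentStore[docId] = DocumentInfo{path, static_cast<int>(tokens.size())};
        for (const std::string& token : tokens) {
            const int wordId = vocabulary.insert(token);
            if (static_cast<std::size_t>(wordId) >= postings.size()) {
                postings.resize(static_cast<std::size_t>(wordId) + 1);
            }
            auto& list = postings[wordId];
            if (!list.empty() && list.back().docId == docId) ++list.back().frequency;
            else list.push_back(Posting{docId, 1});
        }
    }
};
```

Keeping WordIDs dense and sequential lets the inverted index be a plain vector indexed by WordID, so a query costs one O(L) trie walk followed by a constant-time posting-list lookup.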
- C++17 compliant compiler (GCC, Clang, MSVC)
- CMake (3.10+)
mkdir build
cd build
cmake ..
cmake --build .

Ensure the data directory is in the same folder as the executable or in the parent directory.
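
One way such a lookup could work is sketched below using C++17 `std::filesystem`. The helper name `findDataDir` is hypothetical, and how the application actually determines its base directory (working directory or executable location) is not stated here, so the base directory is passed in as a parameter.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns the first existing "data" directory, checking
// baseDir/data first and then the parent directory (baseDir/../data).
std::optional<fs::path> findDataDir(const fs::path& baseDir) {
    for (const fs::path& candidate : {baseDir / "data", baseDir / ".." / "data"}) {
        std::error_code ec;  // non-throwing overload: errors count as "not found"
        if (fs::is_directory(candidate, ec)) return candidate;
    }
    return std::nullopt;
}
```

Checking the parent directory covers the common case where the executable is run from inside `build/` while `data/` sits next to it in the project root.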
./MiniSearchEngine
# On Windows
MiniSearchEngine.exe

- Place text files to be indexed in the data/ folder.
- Run the application.
- Enter queries when prompted.
  - Example: search engine
  - Example: fast performance
- Type exit to quit.
- src/: Source code (.cpp)
- include/: Header files (.h)
- data/: Sample datasets and stopwords