
Custom Documentation: Intelligent Search with BM25

Import your manufacturer manuals, datasheets, and technical docs. T-IA Connect indexes everything and lets you find the right information in seconds, using BM25, the ranking function that Elasticsearch and Apache Lucene use by default.

T-IA Connect Team
10 min read
Updated Mar 25, 2026

Stop Scrolling Through 500-Page PDFs

When you're configuring a SEW drive, a Siemens VFD, or integrating a third-party device, you often need to find one specific parameter buried in hundreds of pages of documentation. The traditional approach — open the PDF, Ctrl+F, scroll — is slow and unreliable.

T-IA Connect provides a dedicated folder where you can drop your technical documents. The software automatically indexes them using BM25 (Best Matching 25), the full-text ranking function that Elasticsearch and Apache Lucene use by default. You can then search across all your docs and get ranked results in milliseconds.

Prerequisites

  • T-IA Connect PRO license with Custom Docs enabled
  • Technical documents in PDF, DOCX, TXT, HTML or Markdown format
  • A project open in TIA Portal (optional, for AI-assisted queries)

Step 1: How Indexing Works

When you place a document in the custom docs folder, T-IA Connect processes it through a full indexing pipeline:

Architecture

// Pipeline

1. Document parsing — text extraction from PDF/DOCX/TXT/HTML/MD

2. Chunking — split into overlapping segments (~500 tokens each)

3. Tokenization — word splitting, stop word removal, stemming

4. BM25 index — each chunk is scored and stored for instant retrieval

// Supported: PDF, DOCX, TXT, HTML, Markdown
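The chunking step can be sketched as follows. This is a minimal illustration, not T-IA Connect's actual code: the function name `chunk_tokens` and the 50-token overlap are assumptions, and only the ~500-token chunk size comes from the pipeline above.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into overlapping windows.

    size=500 mirrors the ~500-token figure above; overlap=50 is an assumed value.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    tokens = ("word " * 1200).split()
    print([len(c) for c in chunk_tokens(tokens)])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.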

Step 2: BM25 — How the Search Engine Ranks Results

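BM25 (Best Matching 25) is a ranking function developed in the 1990s for the Okapi retrieval system and still widely used by search engines today. Unlike a simple "contains" search, it calculates a relevance score for each chunk of text from term statistics: how often each query word occurs in the chunk, how rare it is across the corpus, and how long the chunk is.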

The algorithm considers three key factors:

TF — Term Frequency

A word that appears more often in a chunk makes it more relevant — but with diminishing returns. 10 occurrences is not 10× better than 1.

IDF — Inverse Doc Frequency

Words that are rare across the whole corpus score higher. A product name such as "MOVITRAC" is worth more than "the" or "and".

Document Length

A match in a short chunk scores higher than the same match in a long one. This prevents long chunks from dominating results simply because they contain more words.
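The three factors combine into a single score. The formula below is the standard Okapi BM25 formulation with the IDF variant used by Lucene; whether T-IA Connect uses exactly this variant is an assumption. Here f(t, D) is the number of times term t occurs in chunk D, |D| is the chunk length in tokens, avgdl is the average chunk length, N is the number of chunks, n(t) is the number of chunks containing t, and k_1 and b are tuning parameters.

```latex
\[
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
\frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\]
\[
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n(t) + 0.5}{n(t) + 0.5} + 1\right)
\]
```

As a worked example of the diminishing returns: for a chunk of average length (|D| = avgdl) and the common default k_1 = 1.2, the term-frequency fraction is 1 × 2.2 / (1 + 1.2) = 1.00 for one occurrence and 10 × 2.2 / (10 + 1.2) ≈ 1.96 for ten occurrences. Ten times the occurrences yields roughly twice the score, not ten times.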

BM25 Parameters

k1 = 1.2 // Term frequency saturation (higher = more weight to frequency)

b = 0.75 // Length normalization weight (0 = ignore length, 1 = full normalization)
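To show how these two parameters enter the score, here is a minimal BM25 scorer in Python. It is a sketch under assumptions, not the product's implementation: the function name `bm25_scores` is hypothetical, chunks are assumed to be pre-tokenized lists of words, and the IDF uses the Lucene-style variant.

```python
import math
from collections import Counter

K1 = 1.2   # term frequency saturation, as listed above
B = 0.75   # length normalization weight, as listed above


def bm25_scores(query_terms, chunks, k1=K1, b=B):
    """Return one BM25 score per chunk; each chunk is a list of tokens."""
    n_chunks = len(chunks)
    if n_chunks == 0:
        return []
    avg_len = sum(len(c) for c in chunks) / n_chunks
    # Document frequency: number of chunks containing each term at least once.
    doc_freq = Counter(term for c in chunks for term in set(c))
    scores = []
    for chunk in chunks:
        tf = Counter(chunk)
        # Length normalization: 1.0 for an average-length chunk.
        norm = 1 - b + b * (len(chunk) / avg_len if avg_len else 0)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_chunks - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "maximum output frequency parameter".split(),
        "the motor cable length and the frequency range of the drive".split(),
        "mounting and installation notes".split(),
    ]
    print(bm25_scores(["maximum", "frequency"], corpus))
```

Setting `b=0` makes `norm` equal to 1 for every chunk, so length is ignored; raising `k1` lets repeated occurrences keep adding to the score for longer before it saturates.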

Step 3: Beyond BM25 — Extra Intelligence

On top of the BM25 core, T-IA Connect adds several enhancements to improve search quality:

🛑

Stop Words Filtering

~150 common words in English, French, and German ("the", "le", "de", "und"...) are automatically ignored so the search focuses on meaningful terms.
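Stop word filtering can be sketched in a few lines of Python. The word set below is a tiny illustrative subset chosen for this example, not the actual ~150-word list, and the function name `tokenize` is an assumption.

```python
import re

# Illustrative subset only; the real list covers ~150 English, French, and German words.
STOP_WORDS = {"the", "a", "of", "and", "le", "la", "de", "et", "und", "der", "die"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


if __name__ == "__main__":
    print(tokenize("The maximum frequency of the drive"))
```

Applying the same function to both documents and queries keeps the two sides consistent: a word removed from the index is also removed from the query.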

🔤

Basic Stemming

Word variants are matched together. Searching for "alimentation" (French for "power supply") will also match "alimenté" and "alimenter", increasing recall at little cost to precision.
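One simple way to get this behavior is suffix stripping. The sketch below is an assumption about what "basic stemming" could look like, not the product's actual rules: the suffix list and the minimum stem length are made up for illustration.

```python
import unicodedata

# Hypothetical suffix list, checked in order (longest first).
SUFFIXES = ("ation", "er", "é", "s")


def basic_stem(word, min_stem=4):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = unicodedata.normalize("NFC", word.lower())
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("alimentation", "alimenté", "alimenter"):
        print(w, "->", basic_stem(w))
```

All three variants reduce to the same stem, so a query for any one of them matches chunks containing the others. The minimum stem length avoids collapsing short words into meaningless fragments.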

📍

Proximity Boost

When your search terms appear close together in the text, the relevance score is doubled (×2). This rewards exact phrase matches and adjacent concepts.
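The boost can be illustrated with a short Python sketch. The ×2 factor comes from the description above; the 5-token window used to decide what counts as "close together" and the function name `proximity_boost` are assumptions.

```python
def proximity_boost(score, tokens, query_terms, window=5, factor=2.0):
    """Multiply `score` by `factor` if two different query terms occur
    within `window` tokens of each other; otherwise return it unchanged."""
    wanted = set(query_terms)
    hits = [(i, t) for i, t in enumerate(tokens) if t in wanted]
    # Comparing consecutive hits is enough: if any two different terms are
    # within the window, some adjacent pair of hits between them is too.
    for (i, a), (j, b) in zip(hits, hits[1:]):
        if a != b and j - i <= window:
            return score * factor
    return score


if __name__ == "__main__":
    chunk = "set the maximum output frequency in parameter group".split()
    print(proximity_boost(10.0, chunk, ["maximum", "frequency"]))
```

A single-term query never triggers the boost in this sketch, since proximity only makes sense between two different terms.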

Step 4: Real-World Example

Imagine you have indexed the documentation for a SEW MOVITRAC LTP-B frequency inverter (350 pages). You search for:

Search Example

// User query:

"MOVITRAC LTP-B maximum output frequency parameter"

// BM25 result:

Chapter 8.3 — Parameter P100: Max Output Frequency [score: 12.4]

Chapter 5.1 — Frequency Range and Motor Settings [score: 8.7]

Chapter 12 — Parameter Reference Table [score: 6.2]

AI-Powered: Ask Questions in Natural Language

When combined with the AI Copilot, the BM25 search becomes even more powerful. Instead of searching for keywords, you can ask questions like "What is the maximum cable length for the MOVITRAC LTP-B?" — the AI retrieves the most relevant chunks via BM25, reads them, and gives you a precise answer with the source reference. This is called RAG (Retrieval-Augmented Generation).
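The retrieval half of RAG can be sketched as follows: take the top-ranked chunks and place them in the prompt sent to the language model. This is a generic illustration, not the AI Copilot's actual code: the function name `build_rag_prompt`, the tuple layout, the prompt wording, and the placeholder excerpt text are all assumptions.

```python
def build_rag_prompt(question, ranked_chunks, top_k=3):
    """Build a prompt from the best-scoring chunks.

    ranked_chunks: list of (score, source, text) tuples from the BM25 search.
    """
    best = sorted(ranked_chunks, key=lambda c: c[0], reverse=True)[:top_k]
    context = "\n\n".join(f"[{source}]\n{text}" for _, source, text in best)
    return (
        "Answer the question using only the excerpts below "
        "and cite the source in brackets.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Placeholder excerpts; a real run would use text returned by the index.
    results = [
        (8.7, "Chapter 5.1", "(excerpt text)"),
        (12.4, "Chapter 8.3", "(excerpt text)"),
    ]
    print(build_rag_prompt("What is the maximum output frequency?", results))
```

Keeping the source label next to each excerpt is what lets the model return an answer together with its reference.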

Your Documentation, Instantly Searchable

Stop wasting time scrolling through PDFs. Import your docs, let BM25 index them, and find any information in seconds — whether through keyword search or AI-assisted natural language queries.