Multi-Model Architecture

The Engine Behind Invoxa.

Our proprietary multi-model ensemble is not a single engine. It is a multi-layer consensus architecture designed for the highest possible accuracy on real-world, imperfect documents.

Model Stack

Four models. One answer.

Each document is processed in parallel by three specialized OCR engines. Their outputs are reconciled through weighted confidence voting and then refined by a fourth, language-model layer.
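The dispatch-and-vote step can be sketched in a few lines of Python. This is a minimal illustration, not Invoxa's implementation. It assumes each engine returns a list of `(text, confidence)` pairs that are already aligned token by token. The per-engine weights, the `run_engines`/`vote_token`/`ensemble` helpers and the stub engines are all hypothetical. Tesseract's reading wins an exact tie, mirroring its tiebreaker role in the model stack.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-engine weights; the real weights are not published.
ENGINE_WEIGHTS = {"paddleocr": 1.0, "easyocr": 0.9, "tesseract": 0.8}
TIEBREAKER = "tesseract"  # engine whose reading wins an exact tie


def vote_token(candidates):
    """Pick one reading from (engine, text, confidence) tuples for a single token."""
    scores = defaultdict(float)
    for engine, text, conf in candidates:
        scores[text] += ENGINE_WEIGHTS[engine] * conf  # weighted confidence vote
    best = max(scores.values())
    tied = [text for text, score in scores.items() if abs(score - best) < 1e-9]
    # The tiebreaker only decides among readings that share the top score.
    for engine, text, _ in candidates:
        if engine == TIEBREAKER and text in tied:
            return text
    return tied[0]


def run_engines(document, engines):
    """Run every engine on the same document concurrently; returns {name: tokens}."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, document) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}


def ensemble(document, engines):
    outputs = run_engines(document, engines)
    lengths = {len(tokens) for tokens in outputs.values()}
    if len(lengths) != 1:
        # Real engine outputs need sequence alignment first; this sketch assumes it is done.
        raise ValueError("engine outputs are not token-aligned")
    result = []
    for i in range(lengths.pop()):
        candidates = [(name, tokens[i][0], tokens[i][1]) for name, tokens in outputs.items()]
        result.append(vote_token(candidates))
    return result


# Stub engines standing in for real OCR calls: each returns (text, confidence) pairs.
engines = {
    "paddleocr": lambda doc: [("Invoice", 0.98), ("Total", 0.95), ("$1O0.00", 0.60)],
    "easyocr": lambda doc: [("Invoice", 0.97), ("Tota1", 0.55), ("$100.00", 0.90)],
    "tesseract": lambda doc: [("lnvoice", 0.70), ("Total", 0.92), ("$100.00", 0.85)],
}
print(ensemble(b"raw document bytes", engines))  # → ['Invoice', 'Total', '$100.00']
```

Threads are used here for simplicity. Real OCR engines are CPU- or GPU-bound, so a production dispatcher might instead use separate processes or worker services, depending on whether the engines release Python's GIL.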

Deep Learning Layer

PaddleOCR

Baidu's production-grade OCR system. Excels at multi-language detection and skewed/rotated document recovery. Primary extraction engine for high-fidelity character recognition.

Multi-Language Layer

EasyOCR

PyTorch-based architecture trained on 80+ languages. Provides strong coverage for handwritten text and low-contrast documents where PaddleOCR may struggle.

Classical Reference Layer

Tesseract 5

Tesseract 5, the open-source engine with an LSTM-based recognizer, serves as the classical baseline. Particularly accurate on clean, machine-printed documents. Used as a tiebreaker in the voting consensus.

Semantic Refinement Layer

LLM Correction Pass

After the three OCR engines vote, a large language model performs a context-aware correction sweep, fixing domain-specific terminology, numbers, and formatting artifacts.

Performance

Ensemble vs. Single Model

Why run four models when one is faster? Because the engines fail on different inputs, so combining them corrects errors that any single engine would make on its own.

PaddleOCR (alone): 94%
EasyOCR (alone): 91%
Tesseract 5 (alone): 88%
Invoxa Multi-Model Ensemble: 99%
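Here is a back-of-the-envelope calculation that shows why voting can beat every single engine. It is not a reproduction of Invoxa's benchmark. It makes an unrealistic assumption: the three engines err independently on each token, at the single-engine rates listed above. It also counts a token as correct only when at least two engines read it correctly. The weights and the LLM pass are ignored.

```python
from itertools import product


def majority_accuracy(accuracies):
    """Probability that a strict majority of independent engines is correct."""
    total = 0.0
    # Enumerate every correct/incorrect combination across the engines.
    for outcome in product([True, False], repeat=len(accuracies)):
        p = 1.0
        for correct, acc in zip(outcome, accuracies):
            p *= acc if correct else 1 - acc
        if sum(outcome) > len(accuracies) / 2:
            total += p
    return total


print(round(majority_accuracy([0.94, 0.91, 0.88]), 4))  # → 0.9779
```

Under these assumptions, the vote alone lifts per-token accuracy to about 97.8%, above any single engine. In practice, engine errors are correlated (the same smudge can fool several engines), which reduces the gain. Conversely, a plurality vote can still recover a token when two engines are wrong in different ways, a case this count leaves out.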

Processing Pipeline

From bytes to insight.

1. Document Ingestion: A PDF or image is uploaded via the browser or the API. Format normalization and resolution scaling are applied automatically.
2. Parallel Engine Dispatch: The document is dispatched simultaneously to PaddleOCR, EasyOCR, and Tesseract 5.
3. Confidence Voting: The engine outputs are aligned token by token, and a weighted vote selects the highest-confidence reading for each token.
4. LLM Semantic Sweep: The combined draft is sent to the LLM correction layer for context-aware normalization.
5. Structured Export: Output is delivered as raw text, formatted text, and structured JSON, available via download or the REST API.
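The JSON export schema isn't specified on this page. As a purely hypothetical illustration, a result record could carry the three output forms plus per-token confidences. Every class and field name below (`ExtractionResult`, `Token`, `source_engine`, and so on) is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Token:
    text: str
    confidence: float
    source_engine: str  # engine whose reading won the vote (hypothetical field)


@dataclass
class ExtractionResult:
    document_id: str
    raw_text: str
    formatted_text: str
    tokens: list[Token] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested Token dataclasses to dicts.
        return json.dumps(asdict(self), indent=2)


result = ExtractionResult(
    document_id="doc_123",
    raw_text="Invoice Total $100.00",
    formatted_text="Invoice\nTotal: $100.00",
    tokens=[Token("Invoice", 0.98, "paddleocr"), Token("$100.00", 0.90, "easyocr")],
)
print(result.to_json())
```

Keeping the raw, formatted and structured forms in one record lets a download and a REST response share a single serializer.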

Security Architecture

Encryption: AES-256 at rest, TLS 1.3 in transit
Auth Model: Zero-trust session tokens, 15-minute TTL
Data Isolation: Per-tenant namespace partitioning
Audit Logging: Immutable append-only log per extraction
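The page does not say how the audit log's immutability is enforced. One common technique is hash chaining: each entry stores the SHA-256 hash of the entry before it, so editing an earlier entry breaks verification. Here is a minimal sketch of that technique. The `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so the same body always hashes the same way.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, tenant_id, extraction_id, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "tenant_id": tenant_id,
            "extraction_id": extraction_id,
            "event": event,
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; any edited entry or broken link returns False."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("tenant_a", "ext_001", "upload")
log.append("tenant_a", "ext_001", "export")
print(log.verify())  # → True
log.entries[0]["event"] = "deleted"  # simulate tampering with an earlier entry
print(log.verify())  # → False
```

Hash chaining makes tampering detectable rather than impossible. Truncating the newest entries, or rewriting the whole chain, can only be caught by checking against an externally stored head hash. Preventing modification outright usually also requires write-once storage.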

Infrastructure

Runtime: FastAPI (Python 3.11) + Uvicorn
Frontend: Next.js 15 (App Router) + TypeScript
Deployment: Vercel Edge + Serverless API
Supported Formats: PDF, JPEG, PNG, TIFF, BMP, WebP
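Uploads arrive as raw bytes, so one common way to enforce the supported-format list during ingestion is to check file signatures ("magic bytes") instead of trusting the file extension or the client's MIME type. Here is a minimal sketch with a hypothetical `detect_format` helper. The page does not document how Invoxa actually validates uploads.

```python
from typing import Optional


def detect_format(data: bytes) -> Optional[str]:
    """Return the format name for a supported upload, or None if unrecognized."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:4] in (b"II*\x00", b"MM\x00*"):  # little- and big-endian TIFF
        return "tiff"
    if data.startswith(b"BM"):
        return "bmp"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # RIFF container holding WebP
        return "webp"
    return None


print(detect_format(b"%PDF-1.7\n"))  # → pdf
print(detect_format(b"GIF89a"))  # → None
```

BMP's two-byte signature is weak, so a production validator would typically also attempt a full decode before passing the file to the OCR engines.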

See the engine live.

Upload any document and watch all four models work in real time.