Multi-Model Architecture

The Engine
Behind Invoxa.

Our proprietary multi-model ensemble is not a single engine. It is a multi-layer consensus architecture: three OCR engines vote, and a language model refines the result, so that accuracy holds up on real-world, imperfect documents.

Model Stack

Four models. One answer.

Each document is processed in parallel by three specialized OCR engines. Their outputs are reconciled through weighted confidence voting, and a fourth model, an LLM, then corrects the merged result.

Deep Learning Layer

PaddleOCR

Baidu's production-grade OCR system. Excels at multi-language detection and skewed/rotated document recovery. Primary extraction engine for high-fidelity character recognition.

Multi-Language Layer

EasyOCR

PyTorch-based engine with support for 80+ languages. Provides strong coverage for handwritten text and low-contrast documents where PaddleOCR may struggle.

Classical Reference Layer

Tesseract 5

Tesseract 5, the open-source engine long sponsored by Google, uses an LSTM recognizer and serves as the classical baseline. Particularly accurate on clean, machine-printed documents. Used as a tiebreaker in the voting consensus.

Semantic Refinement Layer

LLM Correction Pass

After the three OCR engines vote, a large language model performs a context-aware correction sweep, fixing domain-specific terminology, numbers, and formatting artefacts.

Performance

Ensemble vs. Single Model

Why use four models when one is faster? Because accuracy compounds.

PaddleOCR (alone)

94%

EasyOCR (alone)

91%

Tesseract 5 (alone)

88%

Invoxa Multi-Model Ensemble

99%
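One way to read "accuracy compounds" is as a toy calculation. The sketch below takes the three single-engine figures above and assumes, purely for illustration, that the engines err independently and that the ensemble uses an unweighted majority vote. It does not model weighted voting or the LLM pass, so it is not a derivation of the ensemble figure; the function name is hypothetical.

```python
from itertools import product

# Single-engine accuracy figures quoted above.
accuracy = {"paddleocr": 0.94, "easyocr": 0.91, "tesseract": 0.88}

def majority_correct(probs):
    """Probability that a strict majority of independent engines is correct."""
    total = 0.0
    # Enumerate every right/wrong combination across the engines.
    for outcome in product([True, False], repeat=len(probs)):
        if sum(outcome) * 2 > len(probs):
            p = 1.0
            for ok, q in zip(outcome, probs):
                p *= q if ok else 1.0 - q
            total += p
    return total

p_majority = majority_correct(list(accuracy.values()))
print(f"{p_majority:.4f}")  # prints 0.9779
```

Even this simplified model lands above the best single engine (0.94), which is the intuition behind running several engines instead of one.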

Processing Pipeline

From bytes to insight.

Document Ingestion

A PDF or image is uploaded via browser or API. Format normalization and resolution scaling are applied automatically.

Parallel Engine Dispatch

The document is dispatched simultaneously to PaddleOCR, EasyOCR, and Tesseract 5.
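A minimal sketch of the dispatch step, assuming each engine is wrapped in a function that takes the document bytes and returns text. The three `run_*` functions are hypothetical stubs standing in for the real PaddleOCR, EasyOCR, and Tesseract calls, and the thread pool is one possible concurrency choice, not necessarily the one Invoxa uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-ins for the three OCR engines. A real wrapper would call
# the engine's own API here; these stubs just return canned text.
def run_paddle(doc: bytes) -> str:
    return "Invoice 1042"

def run_easyocr(doc: bytes) -> str:
    return "Invoice 1O42"

def run_tesseract(doc: bytes) -> str:
    return "Invoice 1042"

ENGINES: Dict[str, Callable[[bytes], str]] = {
    "paddleocr": run_paddle,
    "easyocr": run_easyocr,
    "tesseract": run_tesseract,
}

def dispatch(doc: bytes) -> Dict[str, str]:
    """Run every engine on the same document concurrently and collect outputs."""
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        futures = {name: pool.submit(fn, doc) for name, fn in ENGINES.items()}
        # result() blocks until that engine finishes and re-raises its errors.
        return {name: fut.result() for name, fut in futures.items()}

results = dispatch(b"%PDF-1.7 ...")
```

With this shape, total latency is roughly that of the slowest engine rather than the sum of all three.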

Confidence Voting

Results are aligned token by token. A weighted vote selects the highest-confidence output per token.
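The voting step could look roughly like the sketch below. It assumes the tokens are already aligned, that each engine reports a confidence between 0 and 1, and that each engine has a fixed weight. The weights, names, and data shape are illustrative assumptions; the page does not publish the real values. Tesseract is used to break exact ties, as described in the model stack.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-engine weights (not published by Invoxa).
ENGINE_WEIGHTS: Dict[str, float] = {"paddleocr": 1.0, "easyocr": 0.9, "tesseract": 0.8}
TIEBREAKER = "tesseract"

# One aligned position: engine name -> (token, confidence in [0, 1]).
Position = Dict[str, Tuple[str, float]]

def vote(position: Position) -> str:
    """Return the candidate token with the highest summed weight * confidence."""
    scores: Dict[str, float] = defaultdict(float)
    for engine, (token, conf) in position.items():
        scores[token] += ENGINE_WEIGHTS[engine] * conf
    best = max(scores.values())
    winners = [tok for tok, s in scores.items() if abs(s - best) < 1e-9]
    # On an exact tie, prefer the tiebreaker engine's token if it is a winner.
    if len(winners) > 1 and TIEBREAKER in position:
        tie_token = position[TIEBREAKER][0]
        if tie_token in winners:
            return tie_token
    return winners[0]

def merge(aligned: List[Position]) -> str:
    """Vote at every aligned position and join the winning tokens."""
    return " ".join(vote(p) for p in aligned)

aligned = [
    {"paddleocr": ("Total", 0.98), "easyocr": ("Total", 0.95), "tesseract": ("Total", 0.97)},
    {"paddleocr": ("$1,042", 0.90), "easyocr": ("$1,O42", 0.60), "tesseract": ("$1,042", 0.85)},
]
draft = merge(aligned)
print(draft)  # prints Total $1,042
```

In the second position, the two engines that agree on "$1,042" outscore the single low-confidence misread, so the letter O never reaches the draft.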

LLM Semantic Sweep

The combined draft is sent to the LLM correction layer for context-aware normalization.

Structured Export

Output is delivered as raw text, formatted text, and structured JSON, available via download or REST API.
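To make the three output forms concrete, here is a sketch of what a structured payload might contain. The field names (`raw_text`, `formatted_text`, `tokens`) are hypothetical; the actual JSON schema is not described on this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class Token:
    text: str
    confidence: float  # winning confidence from the voting step

@dataclass
class ExtractionResult:
    raw_text: str        # text as voted, before any cleanup
    formatted_text: str  # text after the LLM correction pass
    tokens: List[Token] = field(default_factory=list)

result = ExtractionResult(
    raw_text="Total $1,042",
    formatted_text="Total: $1,042",
    tokens=[Token("Total", 0.98), Token("$1,042", 0.90)],
)

# asdict() converts nested dataclasses recursively, so the result is JSON-ready.
payload = json.dumps(asdict(result), indent=2)
print(payload)
```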

Security Architecture

Encryption: AES-256 at rest, TLS 1.3 in transit

Auth Model: Zero-trust session tokens, 15-minute TTL

Data Isolation: Per-tenant namespace partitioning

Audit Logging: Immutable append-only log per extraction
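As an illustration of the 15-minute token lifetime, the sketch below issues and verifies a short-lived, tenant-scoped token using only the Python standard library. The HMAC signing scheme, the claim names, and the hard-coded secret are assumptions made for the example; the page does not say how Invoxa's tokens are actually built.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret-change-me"  # placeholder; a real key comes from a secret store
TTL_SECONDS = 15 * 60              # the 15-minute TTL listed above

def _sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def issue_token(tenant_id, now=None):
    """Create a signed token that expires TTL_SECONDS after issue time."""
    issued = time.time() if now is None else now
    body = json.dumps({"tenant": tenant_id, "exp": issued + TTL_SECONDS}).encode()
    encoded = base64.urlsafe_b64encode(body).decode()
    return f"{encoded}.{_sign(body)}"

def verify_token(token, now=None):
    """Return the claims if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        encoded, signature = token.split(".", 1)
        body = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        return None  # malformed token
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, _sign(body)):
        return None
    claims = json.loads(body)
    if current >= claims["exp"]:
        return None  # past the 15-minute window
    return claims

token = issue_token("tenant-a", now=1000.0)
```

Every request re-verifies the token rather than trusting an earlier check, and the tenant claim gives downstream code a value to scope storage access by.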

Infrastructure

Runtime: FastAPI (Python 3.11) + Uvicorn

Frontend: Next.js 15 (App Router) + TypeScript

Deployment: Vercel Edge + Serverless API

Supported Formats: PDF, JPEG, PNG, TIFF, BMP, WebP

See the engine live.

Upload any document and watch all four models work in real time.
