From Visitor ID to Device Identity: The Architecture of LRDefender

Most fingerprinting products hand you a visitor ID and leave the rest to your imagination. LRDefender is a full device intelligence platform — signal collection, cross-browser matching, bot detection, threat enrichment, and tenant-scoped analytics — designed for production fraud and security teams.

This post walks through the end-to-end architecture: what happens from the moment the SDK loads on a page to the moment a risk score appears in your dashboard.

System overview

LRDefender spans four products in a single monorepo:

| Component | Role |
|---|---|
| Browser SDK (packages/sdk) | Collects 90+ signals in under 200ms |
| API backend (apps/api) | Normalizes, matches, persists, and analyzes fingerprints |
| Web dashboard (apps/web) | Tenant analytics, threat monitoring, and configuration |
| VPN API (apps/vpn-api) | Network intelligence for proxy, VPN, and TOR detection |

The core pipeline follows four contract-asserted stages: normalize → match → persist → analyze. Each stage has defined inputs, outputs, and failure modes. If a stage cannot complete, the pipeline degrades explicitly rather than silently returning garbage.
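To make "contract-asserted" concrete, here is a minimal sketch of how a stage runner could check inputs and outputs and report which stage failed. The names `Stage`, `PipelineResult`, and `runPipeline` are hypothetical and are not taken from the LRDefender codebase.

```typescript
// Hypothetical sketch: these type and function names are illustrative only.
type StageName = "normalize" | "match" | "persist" | "analyze";

interface Stage {
  name: StageName;
  // Contract checks return an error message, or null when the contract holds.
  checkInput(value: unknown): string | null;
  checkOutput(value: unknown): string | null;
  run(value: unknown): unknown;
}

type PipelineResult =
  | { ok: true; value: unknown }
  | { ok: false; failedStage: StageName; reason: string };

function runPipeline(stages: Stage[], input: unknown): PipelineResult {
  let current: unknown = input;
  for (const stage of stages) {
    const inputError = stage.checkInput(current);
    if (inputError !== null) {
      return { ok: false, failedStage: stage.name, reason: "input: " + inputError };
    }
    let output: unknown;
    try {
      output = stage.run(current);
    } catch (err) {
      return { ok: false, failedStage: stage.name, reason: "threw: " + String(err) };
    }
    const outputError = stage.checkOutput(output);
    if (outputError !== null) {
      return { ok: false, failedStage: stage.name, reason: "output: " + outputError };
    }
    current = output;
  }
  return { ok: true, value: current };
}
```

The caller always receives either a value or the name of the stage that broke its contract, so a failure cannot pass downstream as if it were a valid result.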

Stage 0: Signal collection (SDK)

The LRDefender SDK loads via a thin collector package or direct script inclusion. On initialization, it gathers signals across six categories:

Hardware probes — WebGL GPU task rendering, WebGPU compute timing, AudioContext oscillator signatures, canvas rendering, and screen/display characteristics.

Environment signals — navigator properties, installed fonts, timezone, locale, plugin enumeration, and media device lists.

Behavioral biometrics — mouse kinematics, scroll patterns, keystroke timing, and touch pressure (mobile).

Bot detection — headless browser markers, lie detection (API consistency checks), automation framework artifacts, and developer tools detection.

Privacy and integrity — incognito detection, tampering analysis, anti-detect browser identification, and virtual machine markers.

Consent management — configurable signal gating based on user consent state.

The SDK serializes collected signals into an encrypted payload and transmits it to the API via the fingerprint ingest endpoint. Total collection time targets under 200ms on median hardware.
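The consent gating described in the category list could look roughly like the sketch below. The `Probe` shape, the category names, and `collectWithConsent` are assumptions made for illustration. The real SDK's consent API is not shown in this post, and skipping a failed probe is also an assumption.

```typescript
// Hypothetical names; the SDK's real consent API is not shown in this post.
type SignalCategory = "hardware" | "environment" | "behavioral" | "bot" | "integrity";

interface Probe {
  id: string;
  category: SignalCategory;
  collect(): unknown;
}

// Collect only the probes whose category is allowed by the current consent state.
function collectWithConsent(
  probes: Probe[],
  allowed: SignalCategory[]
): { [id: string]: unknown } {
  const signals: { [id: string]: unknown } = {};
  for (const probe of probes) {
    if (allowed.indexOf(probe.category) === -1) continue; // gated by consent
    try {
      signals[probe.id] = probe.collect();
    } catch (err) {
      // Assumption: a failing probe is omitted rather than aborting collection.
    }
  }
  return signals;
}
```

Gating by category rather than by individual probe keeps the consent configuration small, at the cost of coarser control.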

Stage 1: Normalize

The API receives the raw payload and normalizes it into typed feature vectors:

Raw canvas pixel data becomes a quantized hash.
WebGL task outputs are stripped of browser-specific noise using trained masks (an approach inspired by the cross-browser fingerprinting work published at NDSS 2017).
Font lists are normalized against per-browser trained masks.
Behavioral signals are typed, scaled, and mapped to a fixed feature schema.
Each signal receives a stability weight based on historical drift measurements.

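Normalization is deterministic: the same raw input always produces the same normalized output. This is critical for matching. Quantization deliberately discards small differences, but it does so the same way every time. If normalization were nondeterministic, two readings of the same device could not be compared across sessions.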
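As an illustration of deterministic normalization, the sketch below quantizes raw values and hashes the result. The quantization step of 8 and the choice of 32-bit FNV-1a are assumptions picked for brevity, not details taken from LRDefender.

```typescript
// Assumption: a step of 8 and FNV-1a are illustrative choices only.
function quantize(values: number[], step: number): number[] {
  // Snap each value down to the nearest multiple of `step` to absorb small noise.
  return values.map((v) => Math.floor(v / step) * step);
}

function fnv1a32(bytes: number[]): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const b of bytes) {
    hash ^= b & 0xff;
    // Multiply by the FNV prime 16777619 using shifts, reduced modulo 2^32.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical helper: turn raw canvas channel values (0-255) into a stable hash.
function normalizeCanvas(pixels: number[]): string {
  return fnv1a32(quantize(pixels, 8));
}
```

With this sketch, `normalizeCanvas([10, 20, 30])` and `normalizeCanvas([11, 21, 31])` return the same hash because both inputs quantize to `[8, 16, 24]`. Repeated calls with identical input always agree.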

Stage 2: Match

The matcher compares normalized signals against stored device records for the tenant. LRDefender uses a heuristic similarity engine (not a trained neural network by default) that:

1. Computes weighted cosine similarity across signal vectors.
2. Applies per-signal stability weights (GPU tasks weighted higher than canvas).
3. Compares against candidate devices in a three-tier cache: local LRU → Redis → MSSQL.
4. Returns a match confidence score and the best-matching device ID (if above threshold).
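The scoring steps above can be sketched as a weighted cosine similarity followed by a threshold check. The vector layout, weights, and threshold are placeholders to be supplied by the caller, and `bestMatch` is a hypothetical name. The production matcher is certainly more involved.

```typescript
interface Candidate {
  deviceId: string;
  vector: number[];
}

// Weighted cosine similarity: each dimension contributes in proportion to its weight.
function weightedCosine(a: number[], b: number[], weights: number[]): number {
  if (a.length !== b.length || a.length !== weights.length) {
    throw new Error("vectors and weights must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((ai, i) => {
    const bi = b[i] ?? 0;
    const w = weights[i] ?? 0;
    dot += w * ai * bi;
    normA += w * ai * ai;
    normB += w * bi * bi;
  });
  if (normA === 0 || normB === 0) return 0; // no usable signal on one side
  return dot / Math.sqrt(normA * normB);
}

// Return the best-scoring candidate, or a null device ID when below the threshold.
function bestMatch(
  probe: number[],
  candidates: Candidate[],
  weights: number[],
  threshold: number
): { deviceId: string | null; confidence: number } {
  let bestId: string | null = null;
  let bestScore = 0;
  for (const candidate of candidates) {
    const score = weightedCosine(probe, candidate.vector, weights);
    if (score > bestScore) {
      bestScore = score;
      bestId = candidate.deviceId;
    }
  }
  return { deviceId: bestScore >= threshold ? bestId : null, confidence: bestScore };
}
```

For example, with weights `[3, 3, 1]` for hypothetical GPU, audio, and canvas dimensions, a probe of `[0.9, 0.8, 0.7]` still scores about 0.96 against a stored `[0.9, 0.8, 0.1]`. The large canvas difference carries little weight.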

Cross-browser matching is the differentiator. The matcher does not require identical hashes — it requires sufficient similarity across hardware-anchor signals. Chrome and Firefox on the same MacBook typically score above the linkage threshold because WebGL task outputs and AudioContext signatures correlate.

When ML_MODEL_ENABLED is set, an enhanced attention-based neural model augments heuristic scoring for borderline matches.

Distributed match locks (Redis-backed) prevent race conditions when multiple API pods process concurrent requests for the same device.
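A common way to build such a lock on Redis is `SET key token NX PX ttl` to acquire, plus a compare-and-delete to release. The sketch below models those two operations behind a hypothetical `LockStore` interface with an in-memory stand-in so it runs without a server. The key format and all names are assumptions, not LRDefender's actual implementation.

```typescript
// Models the two operations the pattern needs from Redis:
// SET key token NX PX ttl (acquire) and a compare-and-delete (release).
interface LockStore {
  setIfAbsent(key: string, token: string, ttlMs: number, nowMs: number): boolean;
  deleteIfEquals(key: string, token: string): boolean;
}

// In-memory stand-in for illustration; a real store would call Redis.
class InMemoryLockStore implements LockStore {
  private entries: { [key: string]: { token: string; expiresAt: number } | undefined } = {};

  setIfAbsent(key: string, token: string, ttlMs: number, nowMs: number): boolean {
    const held = this.entries[key];
    if (held !== undefined && held.expiresAt > nowMs) return false; // still locked
    this.entries[key] = { token: token, expiresAt: nowMs + ttlMs };
    return true;
  }

  deleteIfEquals(key: string, token: string): boolean {
    const held = this.entries[key];
    if (held === undefined || held.token !== token) return false; // not the owner
    delete this.entries[key];
    return true;
  }
}

// Hypothetical key format: scoped by tenant so tenants never contend with each other.
function matchLockKey(tenantId: string, deviceKey: string): string {
  return "match-lock:" + tenantId + ":" + deviceKey;
}
```

The per-holder token matters: without it, a pod whose lock already expired could release a lock that another pod has since acquired. The TTL bounds how long a crashed pod can block matching for a device.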

Stage 3: Persist

Matched or newly created device records are persisted through the three-tier cache:

1. Local LRU: in-process cache for hot devices (microsecond access).
2. Redis: distributed cache shared across API pods (millisecond access).
3. MSSQL: durable storage with tenant-scoped queries (millisecond-to-second access).
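A read through these tiers might look like the sketch below: try the fastest tier first, fall through on a miss or an outage, and backfill the faster tiers on a hit. The `Tier` interface, `MemoryTier`, and `readThrough` are hypothetical. The tiers are synchronous here to keep the example short, whereas real Redis and MSSQL clients are asynchronous.

```typescript
interface Tier<V> {
  name: string;
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

// Simple in-memory tier used as a stand-in for LRU, Redis, or MSSQL in this sketch.
class MemoryTier<V> implements Tier<V> {
  private data: { [key: string]: V | undefined } = {};
  constructor(public name: string) {}
  get(key: string): V | undefined {
    return this.data[key];
  }
  set(key: string, value: V): void {
    this.data[key] = value;
  }
}

// Hypothetical key format: the tenant ID is part of every cache key.
function deviceCacheKey(tenantId: string, deviceId: string): string {
  return tenantId + ":" + deviceId;
}

function readThrough<V>(
  tiers: Tier<V>[],
  key: string
): { value: V; servedBy: string } | undefined {
  const missed: Tier<V>[] = [];
  for (const tier of tiers) {
    let value: V | undefined;
    try {
      value = tier.get(key);
    } catch (err) {
      value = undefined; // tier unavailable: treat as a miss and fall through
    }
    if (value !== undefined) {
      for (const faster of missed) {
        try {
          faster.set(key, value); // backfill so the next read is served sooner
        } catch (err) {
          // An unavailable tier cannot be backfilled; skip it.
        }
      }
      return { value: value, servedBy: tier.name };
    }
    missed.push(tier);
  }
  return undefined;
}
```

Treating a tier error as a miss is what lets a Redis outage degrade to slower MSSQL reads instead of failed requests.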

Every database query includes a TenantId WHERE clause — tenant isolation is enforced at the repository layer, not left to application logic.
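One way to enforce this at the repository layer is to make the repository the only place where SQL text is assembled, and to have it append the tenant predicate itself. The sketch below is an assumption about how that could look. The class name, the `Devices` table, and the `DeviceId` column are hypothetical; only `TenantId` comes from the description above.

```typescript
interface ParameterizedQuery {
  text: string;
  params: { [name: string]: string };
}

// Hypothetical repository: callers never write the TenantId predicate themselves.
class TenantScopedDeviceRepository {
  constructor(private tenantId: string) {
    if (tenantId.length === 0) throw new Error("tenantId is required");
  }

  // Every query goes through here, so the tenant filter cannot be forgotten.
  private scoped(
    selectFrom: string,
    extraWhere: string,
    params: { [name: string]: string }
  ): ParameterizedQuery {
    const where = " WHERE TenantId = @tenantId" + (extraWhere ? " AND " + extraWhere : "");
    // tenantId is written last so a caller-supplied param cannot override it.
    return { text: selectFrom + where, params: { ...params, tenantId: this.tenantId } };
  }

  findById(deviceId: string): ParameterizedQuery {
    return this.scoped("SELECT * FROM Devices", "DeviceId = @deviceId", { deviceId: deviceId });
  }

  listAll(): ParameterizedQuery {
    return this.scoped("SELECT * FROM Devices", "", {});
  }
}
```

The sketch returns query descriptions rather than executing them, which keeps it independent of any particular SQL client.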

Device records store: stable device ID, signal vectors (hashed), first-seen and last-seen timestamps, match confidence history, and linkage graph edges for cross-browser connections.

Stage 4: Analyze

After matching and persistence, the analyze stage enriches the device record with intelligence:

Smart signals — bot detection score, VPN/proxy/TOR detection (via LR Shield), tampering analysis, incognito status, developer tools detection, virtual machine markers, and federated threat context.

Behavioral analysis — session-level behavioral biometrics compared against stored profiles. Anomaly detection flags sessions where mouse kinematics or interaction patterns diverge from the device's historical baseline.
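As a simple stand-in for baseline comparison, the sketch below scores a single session metric (for example, mean pointer speed) as a z-score against the device's history. The z-score approach and the threshold of 3 are assumptions chosen for illustration. The post does not say which anomaly detection method LRDefender uses.

```typescript
// Z-score of `value` against a history of past observations (population std dev).
function zScore(value: number, history: number[]): number {
  if (history.length === 0) return 0; // no baseline yet
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance =
    history.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / history.length;
  const std = Math.sqrt(variance);
  // A zero-variance baseline cannot be scored this way; report no deviation.
  return std === 0 ? 0 : (value - mean) / std;
}

// Hypothetical helper: flag a session metric that is far from the baseline.
function isAnomalous(value: number, history: number[], threshold: number = 3): boolean {
  return Math.abs(zScore(value, history)) >= threshold;
}
```

For the history `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5 and the standard deviation is 2, so a new value of 11 has a z-score of 3 and is flagged, while 6 is not.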

Threat enrichment — suspect score computation (composite risk from all signals), fraud scoring, anomaly detection, and federated threat intelligence lookups.

Adaptive scoring — per-tenant ML models (when trained) adjust suspect score weights based on labeled fraud data.
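A composite suspect score with per-tenant weight adjustments could be sketched as a clamped weighted sum. The signal names, default weights, and 0 to 100 scale below are invented for illustration. The post does not give LRDefender's actual formula or weights.

```typescript
type Weights = { [signal: string]: number };

// Assumed default weights; they sum to 100 so the score lands on a 0-100 scale.
const DEFAULT_WEIGHTS: Weights = {
  bot: 40,
  tampering: 25,
  vpn: 15,
  virtualMachine: 15,
  incognito: 5,
};

// signals: each value is a confidence in [0, 1]; missing signals count as 0.
// tenantOverrides: hypothetical per-tenant weights, e.g. learned from labeled fraud.
function suspectScore(
  signals: { [signal: string]: number },
  tenantOverrides: Weights = {}
): number {
  let score = 0;
  for (const name of Object.keys(DEFAULT_WEIGHTS)) {
    const weight = tenantOverrides[name] ?? DEFAULT_WEIGHTS[name] ?? 0;
    const raw = signals[name] ?? 0;
    const value = Math.min(1, Math.max(0, raw)); // clamp to [0, 1]
    score += weight * value;
  }
  return Math.min(100, Math.round(score));
}
```

With the defaults, `suspectScore({ bot: 1, vpn: 1 })` is 55. A tenant that sees heavy VPN-driven fraud could pass `{ vpn: 30 }` as an override and get 70 for the same signals.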

The analyze stage output is returned synchronously in the fingerprint response and persisted asynchronously for dashboard analytics.

Data flows to the dashboard

Tenant administrators interact with device intelligence through the Next.js dashboard:

Devices — searchable device registry with signal detail, block controls, and cross-browser linkage graph.
Smart Signals — per-device signal cards showing bot, VPN, tampering, and federated threat status.
Federated Intel — anonymized threat signals from the cross-tenant network.
Bot Detections — historical bot scoring events with filterable severity.
Threat Logs — VPN, proxy, TOR, and malware detections.
Analytics — device, browser, behavioral, and security trend dashboards.

All dashboard API calls proxy through the Next.js BFF layer to tenant-scoped backend endpoints. Session cookies authenticate requests; API keys never reach the browser.

Deployment topology

Production LRDefender runs on AWS us-east-1:

API servers on EC2 (c7i-flex.large) behind nginx with Let's Encrypt TLS.
Web dashboard on EC2 (t3.small) with PM2 process management.
MSSQL and Redis on a private Tailscale network.
SDK artifacts distributed via CDN (CloudFront).

CI/CD triggers on push to main. Deploy workflows use Ansible for configuration management.

Design principles

Several architectural decisions define LRDefender's behavior:

1. Contract-asserted pipeline stages: each stage validates its inputs and outputs. Failures are explicit, not silent.
2. Similarity over equality: device matching uses weighted vector comparison, not exact hash matching.
3. Tenant isolation everywhere: every query, cache key, and webhook is tenant-scoped.
4. Graceful degradation: if Redis is unavailable, the system falls back to MSSQL. If the database is unavailable in production, readiness checks fail loudly.
5. Signal stability as a first-class concern: probes are weighted by drift rate, not just entropy.