Hardened Pattern Matcher

Semantic Similarity + Adversarial Attack Detection

Overview

The Hardened Pattern Matcher extends traditional regex-based validation with embedding-based semantic similarity, so that adversarial inputs which bypass simple pattern matching are still caught.

The Problem

Traditional regex patterns can be easily bypassed with:

Spacing attacks: d i a g n o s e instead of diagnose
Special character insertion: d!i@a#g$n%o^s&e instead of diagnose
Misspellings: diagnoz or tratment instead of diagnose or treatment
Leetspeak: d3pr3ss10n instead of depression

These attacks can slip past regex patterns while remaining semantically identical to forbidden medical advice.

The Solution

The hardened pattern matcher uses a two-stage validation approach:

Stage 1: Fast Regex Patterns (under 10ms)

Quick keyword and pattern matching to catch obvious violations.

Stage 2: Semantic Similarity (100-300ms)

Uses embedding vectors to compare text against a database of forbidden medical concepts, catching obfuscated attacks.

import { createValidator } from '@the-governor-hq/constitution-core';
 
const validator = createValidator({
  domain: 'wearables',
  useSemanticSimilarity: true,  // Enable hardened checks
  semanticThreshold: 0.75,       // Similarity threshold (0-1)
});
 
const result = await validator.validate('You have d i a g n o s e d insomnia');
// ✗ Blocked - semantic similarity detected medical diagnosis despite spacing

How It Works

Text Normalization

Before checking patterns, text is normalized to remove obfuscation:

import { normalizeText } from '@the-governor-hq/constitution-core';
 
normalizeText('d!i@a#g$n%o^s&e');  // → 'diagnose'
normalizeText('d i a g n o s e');   // → 'diagnose'
normalizeText('diagnoz');           // → 'diagnose'

Normalization steps:

Remove special characters
Collapse spacing
Convert to lowercase
Fix common misspellings
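
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `sketchNormalize` and the `MISSPELLINGS` table are hypothetical names, and the real `normalizeText` may order or implement the steps differently.

```typescript
// Hypothetical misspelling table for illustration; the library's real list is not shown here.
const MISSPELLINGS = new Map<string, string>([
  ['diagnoz', 'diagnose'],
  ['tratment', 'treatment'],
  ['anxeity', 'anxiety'],
]);

// Sketch of the four normalization steps (assumed order, ASCII-only).
function sketchNormalize(input: string): string {
  // Lowercase, then drop every character that is not a letter, digit, or whitespace.
  let text = input.toLowerCase().replace(/[^a-z0-9\s]/g, '');
  // Join runs of three or more single letters separated by single spaces ("d i a g" becomes "diag").
  text = text.replace(/\b(?:[a-z] ){2,}[a-z]\b/g, (run) => run.replace(/ /g, ''));
  // Collapse whatever whitespace remains.
  text = text.replace(/\s+/g, ' ').trim();
  // Replace known misspellings word by word.
  return text
    .split(' ')
    .map((word) => MISSPELLINGS.get(word) ?? word)
    .join(' ');
}

console.log(sketchNormalize('You have d i a g n o s e d insomnia'));
// → 'you have diagnosed insomnia'
```

A sketch this simple also rewrites benign text (for example, it strips apostrophes and hyphens from ordinary words), which is one reason a normalization diff on its own is a weak signal.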

Adversarial Attack Detection

Detects when text has been manipulated. Since v3.3.3, adversarial detection is a signal (metadata plus a confidence penalty), not an automatic critical violation. It escalates to a violation only when the manipulation coincides with a forbidden pattern or semantic hit, i.e., when the obfuscation was hiding something.

import { detectAdversarialAttack } from '@the-governor-hq/constitution-core';
 
const result = detectAdversarialAttack('You have d i a g n o s e d insomnia');
 
console.log(result);
// {
//   normalized: 'you have diagnosed insomnia',
//   manipulationDetected: true,
//   manipulationType: 'spacing',
//   confidencePenalty: 0.15
// }

Attack types detected (with confidence penalties):

spacing - Extra spaces between characters (penalty: 0.15)
special-chars - Special characters inserted (penalty: 0.12)
misspelling - Intentional misspellings (penalty: 0.05)

Note: Benign text that triggers normalization diffs (emoji, symbols, formatting, copy/paste artefacts) will record a signal and apply a small confidence penalty, but will not generate a violation unless a forbidden hit is also found in the normalized form.
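
The signal-versus-violation rule can be illustrated with a small decision function. This is a sketch under assumptions: `sketchEscalate` and the `Decision` shape are hypothetical, and subtracting the penalty from a base confidence is one plausible reading of "confidence penalty", not a confirmed detail of the library.

```typescript
type ManipulationType = 'spacing' | 'special-chars' | 'misspelling';

// Penalties as listed above.
const PENALTIES: Record<ManipulationType, number> = {
  'spacing': 0.15,
  'special-chars': 0.12,
  'misspelling': 0.05,
};

interface Decision {
  adversarialViolation: boolean; // true only when obfuscation hid a forbidden hit
  confidence: number;            // base confidence minus the manipulation penalty (assumed)
}

// Hypothetical decision step.
// `manipulation` is the detected type, or null when the text was not manipulated.
// `forbiddenHit` says whether the normalized text matched a forbidden pattern or concept.
function sketchEscalate(
  manipulation: ManipulationType | null,
  forbiddenHit: boolean,
  baseConfidence = 1.0,
): Decision {
  const penalty = manipulation === null ? 0 : PENALTIES[manipulation];
  return {
    // A manipulation signal alone never escalates; it needs a forbidden hit as well.
    adversarialViolation: manipulation !== null && forbiddenHit,
    confidence: Math.max(0, baseConfidence - penalty),
  };
}

// Emoji-laden but harmless text: penalty applied, no violation.
console.log(sketchEscalate('special-chars', false).adversarialViolation); // false
// Spaced-out text that normalizes to a forbidden phrase: escalates.
console.log(sketchEscalate('spacing', true).adversarialViolation); // true
```

Forbidden hits in unmanipulated text are outside this sketch; those would be reported by the regular pattern and semantic checks.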

Semantic Similarity Matching

Compares text embeddings against a vector database of forbidden medical concepts:

import { checkSemanticSimilarity } from '@the-governor-hq/constitution-core';
 
const result = await checkSemanticSimilarity('Take 5mg mel@tonin');
 
console.log(result);
// {
//   violations: [
//     {
//       concept: 'medication-dosage',
//       category: 'treatment',
//       severity: 'critical',
//       similarity: 0.89,
//       example: 'Take 5mg of supplement before bed'
//     }
//   ],
//   maxSimilarity: 0.89,
//   latencyMs: 156
// }

Vector Database

The system maintains a database of forbidden medical concept embeddings:

Forbidden Concepts

Category	Examples
Medical Diagnosis	"You have insomnia", "This indicates sleep apnea"
Treatment Prescription	"Take melatonin for sleep", "5mg dosage recommended"
Medical Scope	"Symptoms indicate a condition", "Clinical markers show problems"
Emergency/Alarming	"Medical emergency", "Serious health danger"
Prescriptive Commands	"You must see a doctor", "You need medication"

Each concept has a pre-computed embedding vector for fast similarity comparison.
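
To make the comparison step concrete, here is a minimal sketch of matching a query vector against pre-computed concept vectors using cosine similarity and a threshold. The names `sketchCosine`, `sketchFindMatches`, and `SketchConcept` are hypothetical, and the 3-dimensional vectors are toy values chosen for readability; real embeddings are far higher-dimensional.

```typescript
interface SketchConcept {
  concept: string;
  embedding: number[]; // pre-computed once, reused for every query
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function sketchCosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return every concept whose similarity reaches the threshold, most similar first.
function sketchFindMatches(query: number[], db: SketchConcept[], threshold = 0.75) {
  return db
    .map((c) => ({ concept: c.concept, similarity: sketchCosine(query, c.embedding) }))
    .filter((m) => m.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity);
}

// Toy database with made-up vectors.
const toyDb: SketchConcept[] = [
  { concept: 'medication-dosage', embedding: [1, 0, 0] },
  { concept: 'medical-diagnosis', embedding: [0, 1, 0] },
];

const matches = sketchFindMatches([0.9, 0.1, 0], toyDb);
console.log(matches.map((m) => m.concept)); // [ 'medication-dosage' ]
```

Because the concept vectors are computed ahead of time, each validation only needs one embedding call for the input text plus a linear scan like the one above.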

Configuration

Enable Semantic Similarity

const validator = createValidator({
  domain: 'wearables',
  useSemanticSimilarity: true,
  semanticThreshold: 0.75,  // 0.75 = 75% similarity required to flag
});

Threshold Guidelines

0.90+: Very strict - flags only near-exact matches
0.75-0.89: Recommended - catches paraphrases
0.60-0.74: Loose - flags more text, with more false positives
Below 0.60: Not recommended - too many false positives

Performance

Latency Comparison

Validation Type	Latency	Use Case
Regex only	under 10ms	Fast real-time checks
+ Semantic similarity (first use)	2-5s	Model initialization
+ Semantic similarity (cached)	100-300ms	Production use

Optimization Tips

Initialize once at startup:

import { initializeVectorDatabase } from '@the-governor-hq/constitution-core';
 
// During app startup
await initializeVectorDatabase();

Async initialization is automatic:

When creating a validator with useSemanticSimilarity: true, initialization starts in the background. The first validate() call will automatically wait for initialization to complete:

// Initialization starts in background
const validator = createValidator({ useSemanticSimilarity: true });
 
// This waits for initialization if needed
const result = await validator.validate(text);
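
One common way to get this "start in the background, wait on first use" behavior is to cache the initialization promise. The sketch below illustrates that pattern only; `LazyInitializer` and `SketchValidator` are hypothetical names and do not describe the library's internals.

```typescript
// Hypothetical wrapper: caches the initialization promise so the loader runs once.
class LazyInitializer {
  private ready: Promise<void> | null = null;
  private readonly load: () => Promise<void>;

  constructor(load: () => Promise<void>) {
    this.load = load;
  }

  // Start loading on the first call; later calls reuse the same promise.
  init(): Promise<void> {
    if (this.ready === null) {
      this.ready = this.load();
    }
    return this.ready;
  }
}

// Hypothetical validator that starts loading at construction and awaits it on use.
class SketchValidator {
  private readonly initializer: LazyInitializer;

  constructor(load: () => Promise<void>) {
    this.initializer = new LazyInitializer(load);
    void this.initializer.init(); // background start; deliberately not awaited here
  }

  async validate(text: string): Promise<boolean> {
    await this.initializer.init(); // waits only if loading is still in progress
    return text.trim().length > 0; // placeholder for the real checks
  }
}
```

A production version would also need to handle a failed load, for example by clearing the cached promise so a later call can retry.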

Use regex as first pass:

Regex patterns catch most violations quickly. Semantic similarity only runs when enabled.
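
The two-stage flow can be sketched as below. The patterns and the names `regexStage`, `sketchValidate`, and `SemanticCheck` are hypothetical, and skipping stage 2 after a stage-1 hit is an assumption made for this illustration; this document does not state whether the library short-circuits.

```typescript
type SemanticCheck = (text: string) => Promise<boolean>;

// Hypothetical stage-1 patterns; the library's real pattern set is not shown here.
const SKETCH_PATTERNS: RegExp[] = [/\bdiagnos(?:e|ed|is)\b/i, /\b\d+\s?mg\b/i];

// Stage 1: synchronous regex scan.
function regexStage(text: string): boolean {
  return SKETCH_PATTERNS.some((pattern) => pattern.test(text));
}

interface SketchResult {
  safe: boolean;
  stage: 'regex' | 'semantic' | 'none'; // which stage flagged the text, if any
}

// Stage 2 runs only when a semantic checker was supplied and stage 1 found nothing.
async function sketchValidate(text: string, semantic?: SemanticCheck): Promise<SketchResult> {
  if (regexStage(text)) {
    return { safe: false, stage: 'regex' };
  }
  if (semantic !== undefined && (await semantic(text))) {
    return { safe: false, stage: 'semantic' };
  }
  return { safe: true, stage: 'none' };
}
```

With this shape, obvious violations pay only the regex cost, and the slower embedding comparison is reserved for text the patterns did not catch.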

Batch processing:

import { batchCheckSemantic } from '@the-governor-hq/constitution-core';
 
const results = await batchCheckSemantic([
  'Text 1',
  'Text 2',
  'Text 3'
]);

Real-World Attack Examples

The hardened matcher catches these real-world obfuscation attempts:

// ❌ All blocked by semantic similarity
 
await validator.validate('You have d i a g n o s e d depression');
// Spacing attack
 
await validator.validate('Take mel@tonin 5mg before bed');
// Special character + medical dosage
 
await validator.validate('You have diagnoz of anxeity');
// Misspellings
 
await validator.validate('T A K E  s u p p l e m e n t s');
// Spacing attack on prescription language
 
await validator.validate('you have i-n-s-o-m-n-i-a');
// Hyphen spacing attack

Testing

Test Adversarial Attacks

import { createValidator, detectAdversarialAttack, normalizeText } from '@the-governor-hq/constitution-core';
 
describe('Adversarial Attack Tests', () => {
  test('Detects spacing attacks with confidence penalty', () => {
    const result = detectAdversarialAttack('d i a g n o s e');
    expect(result.manipulationDetected).toBe(true);
    expect(result.manipulationType).toBe('spacing');
    expect(result.confidencePenalty).toBe(0.15);
  });
  
  test('Normalizes obfuscated text', () => {
    const normalized = normalizeText('d!i@a#g$n%o^s&e');
    expect(normalized).toBe('diagnose');
  });
 
  test('Signal only — no auto-violation for benign text', async () => {
    const validator = createValidator({ useSemanticSimilarity: false });
    // Emoji/symbol text triggers normalization but contains no forbidden content
    const result = validator.validateSync('Great job today! 💪🔥');
    expect(result.safe).toBe(true);
    expect(result.metadata.adversarialSignal?.detected).toBeDefined();
  });
});

Test Semantic Similarity

import { checkSemanticSimilarity } from '@the-governor-hq/constitution-core';
 
describe('Semantic Similarity Tests', () => {
  test('Catches obfuscated medical advice', async () => {
    const result = await checkSemanticSimilarity('Take m e l a t o n i n');
    expect(result.violations.length).toBeGreaterThan(0);
  });
});

Run Tests

# Test adversarial detection
node tests/adversarial-attacks.test.js
 
# Test semantic similarity (requires model download)
node tests/semantic-similarity.test.js

Migration Guide

From Regex-Only to Hardened

Before:

const validator = createValidator({ domain: 'wearables' });
const result = validator.validateSync(text);  // Regex only

After:

const validator = createValidator({
  domain: 'wearables',
  useSemanticSimilarity: true,  // Enable hardened checks
});
const result = await validator.validate(text);  // Note: async now

Breaking changes:

Semantic validation is async (returns Promise)
First initialization downloads embedding model (~80MB)
Slightly higher latency (100-300ms vs under 10ms)

Advanced Usage

Custom Forbidden Concepts

Add your own forbidden concepts to the vector database:

import { FORBIDDEN_MEDICAL_CONCEPTS, generateEmbedding } from '@the-governor-hq/constitution-core';
 
// Add custom forbidden concept
FORBIDDEN_MEDICAL_CONCEPTS.push({
  concept: 'custom-violation',
  category: 'diagnosis',
  severity: 'critical',
  example: 'Your specific forbidden phrase',
  embedding: await generateEmbedding('Your specific forbidden phrase'),
});

Direct Similarity Calculation

import { generateEmbedding, cosineSimilarity } from '@the-governor-hq/constitution-core';
 
const emb1 = await generateEmbedding('You have insomnia');
const emb2 = await generateEmbedding('You have sleep problems');
 
const similarity = cosineSimilarity(emb1, emb2);
console.log(similarity);  // e.g., 0.83

FAQ

Q: Does this slow down validation significantly?

A: The first use takes 2-5 seconds while the model initializes, and the very first run also downloads the ~80MB model. After that, validation adds roughly 100-300ms of overhead. Use regex-only mode for latency-sensitive, real-time use cases.

Q: Can users still bypass this?

A: Bypassing is harder, but sophisticated attacks (e.g., complete paraphrasing) may still slip through. Semantic similarity is one layer of defense. Combine it with:

LLM judge for ambiguous cases
User reporting mechanisms
Regular pattern updates

Q: How much storage does the model require?

A: The embedding model is ~80MB. It's cached locally after first download.

Q: Can I use this offline?

A: Yes, after the first download. The model runs locally via @xenova/transformers (ONNX runtime).

Next Steps

Runtime Validation - See the full validation pipeline
Validators - Understand all validation layers
Testing Guide - Comprehensive testing strategies
