
Kowalski
Market Intelligence & Structural Monitoring Engine
Kowalski is the proprietary data intelligence system developed by RUC on Rails to acquire, structure and monitor publicly available information across the RUC ecosystem.
It powers the research, comparison and monitoring infrastructure behind RUC Hub and RUC Compare, transforming fragmented provider information into structured, queryable datasets.
End-to-End Data Pipeline
From source discovery through to structured API delivery, every stage is purpose-built for the complexities of RUC market data.
Engineered for the RUC Ecosystem
Generic scraping tools struggle in environments where data formats are inconsistent and unpredictable. Kowalski was engineered specifically to handle these conditions.
Variable Pricing Tables
Pricing structures differ across providers with no standard schema.
Shifting Formats
Layouts and page structures change without notice or versioning.
Embedded Documents
Key disclosures appear as embedded PDFs requiring specialised parsing.
Irregular Updates
Disclosures are updated at irregular intervals across providers.
The system maintains structured source profiles for each monitored provider, allowing extraction logic and validation rules to be tuned to real-world RUC publishing patterns rather than relying on fragile one-size-fits-all rules.
Source Discovery & Structural Mapping
Kowalski maintains a continuously updated structural model of monitored sources. This structural awareness enables efficient crawling, minimises unnecessary requests and improves resilience when layouts evolve.
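One way to picture structural awareness is a fingerprint of a page's layout that ignores its text. The sketch below is illustrative only and is not Kowalski's implementation: `structural_fingerprint` and `_TagPathCollector` are hypothetical names, and hashing tag paths is an assumed technique. Two pages with the same markup skeleton but different prices yield the same fingerprint, so a changed fingerprint signals that the layout itself has moved.

```python
import hashlib
from html.parser import HTMLParser


class _TagPathCollector(HTMLParser):
    """Records the tag path of every opening tag, ignoring text and attributes."""

    # Void elements never get a closing tag, so they are not kept on the stack.
    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[str] = []
        self.paths: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self._stack + [tag]))
        if tag not in self.VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched closing tags are ignored.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass


def structural_fingerprint(html: str) -> str:
    """Hash the sequence of tag paths so that only layout changes alter the result."""
    collector = _TagPathCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256("\n".join(collector.paths).encode("utf-8")).hexdigest()


# Same layout, different price: the fingerprints match.
before = "<table><tr><td>$10</td></tr></table>"
after = "<table><tr><td>$12</td></tr></table>"
print(structural_fingerprint(before) == structural_fingerprint(after))
```

A crawler could compare the stored fingerprint with a fresh one to decide whether existing extraction rules are still safe to apply.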
Proprietary Extraction Framework
Rather than relying solely on static selectors, the system evaluates structural context before applying extraction strategies. Extraction workflows are versioned and source-aware, enabling continuous refinement.
Deterministic Rules
Fixed-rule extraction for stable, well-structured content with consistent layouts.
- Stable selectors
- Fixed schemas
- Consistent formats
Pattern-Based Logic
Flexible heuristics for semi-structured data where layouts shift but retain identifiable patterns.
- Table recognition
- Structural hints
- Layout analysis
AI-Assisted Interpretation
Machine learning models for ambiguous tables, PDF documents and complex pricing structures.
- PDF parsing
- Ambiguous tables
- Natural-language text
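The three tiers above can be sketched as a fallback chain that records which strategy produced each value. This is a minimal illustration under stated assumptions: the function names, the "Admin fee" label and the try-in-order policy are all hypothetical, and the AI-assisted tier is a stub that always declines rather than a real model call.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Extraction:
    value: float
    strategy: str  # which tier produced the value, kept for later review


def deterministic(text: str) -> Optional[float]:
    """Strict rule: a known label followed directly by a dollar amount."""
    match = re.search(r"Admin fee:\s*\$(\d+(?:\.\d{2})?)", text)
    return float(match.group(1)) if match else None


def pattern_based(text: str) -> Optional[float]:
    """Looser heuristic: the first dollar amount on any line that mentions a fee."""
    for line in text.splitlines():
        if "fee" in line.lower():
            match = re.search(r"\$\s*(\d+(?:\.\d+)?)", line)
            if match:
                return float(match.group(1))
    return None


def ai_assisted(text: str) -> Optional[float]:
    """Stub standing in for a model call; it always declines in this sketch."""
    return None


STRATEGIES: List[Tuple[str, Callable[[str], Optional[float]]]] = [
    ("deterministic", deterministic),
    ("pattern", pattern_based),
    ("ai_assisted", ai_assisted),
]


def extract_fee(text: str) -> Optional[Extraction]:
    """Try each tier in order and return the first value found, if any."""
    for name, strategy in STRATEGIES:
        value = strategy(text)
        if value is not None:
            return Extraction(value, name)
    return None


print(extract_fee("Admin fee: $12.50"))
print(extract_fee("Service fee (per licence) $ 9.9"))
```

Recording the strategy name alongside the value lets downstream validation treat heuristic or model-derived values more cautiously than rule-derived ones.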
Validation & Quality Controls
Structured data passes through layered validation controls before appearing in public-facing tools. Where outputs fall outside expected tolerances, records are withheld or queued for review.
- Records verified against defined RUC data models
- Pricing values checked against expected tolerances
- Related fields validated for logical consistency
- Structural outliers flagged for manual review
- Non-conforming records withheld before publication
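The controls listed above could be layered along the following lines. The sketch is illustrative only: the `Record` fields, the `FEE_RANGE` tolerance and the rule that base price plus fee equals the total are hypothetical examples, not Kowalski's actual data model or thresholds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    provider: str
    admin_fee: float
    base_price: float
    total_price: float


# Hypothetical tolerance band for the admin fee, in dollars.
FEE_RANGE = (0.0, 50.0)


def validate(record: Record) -> List[str]:
    """Return a list of problems; an empty list means the record passed every check."""
    problems = []
    # Model check: required fields must be populated.
    if not record.provider.strip():
        problems.append("missing provider")
    # Tolerance check: the value must fall inside the expected band.
    low, high = FEE_RANGE
    if not low <= record.admin_fee <= high:
        problems.append("admin_fee outside tolerance")
    # Consistency check: related fields must agree with each other.
    if abs(record.base_price + record.admin_fee - record.total_price) > 0.01:
        problems.append("total_price inconsistent with base_price + admin_fee")
    return problems


def partition(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into publishable ones and withheld ones with their reasons."""
    publishable, withheld = [], []
    for record in records:
        problems = validate(record)
        if problems:
            withheld.append((record, problems))
        else:
            publishable.append(record)
    return publishable, withheld


ok, held = partition([
    Record("Provider A", 5.0, 76.0, 81.0),
    Record("Provider B", 500.0, 76.0, 81.0),
])
print(len(ok), len(held))
```

Keeping the reasons with each withheld record gives a reviewer the context needed to decide whether the source changed or the extraction failed.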
Distributed Processing & Orchestration
Kowalski operates through coordinated processing services designed for horizontal scaling as monitored sources expand.
Scheduled Crawling
Automated refresh cycles tuned to each provider's update frequency.
Parallel Ingestion
Concurrent processing across multiple providers, with each provider's workload isolated from the others.
Rate Management
Source-aware pacing and intelligent rate limiting to respect external systems.
Fault Isolation
Retry logic and error containment prevent cascading failures across services.
Operational Logging
Full traceability and monitoring across all acquisition and processing stages.
Horizontal Scaling
Architecture supports growth in monitored sources without re-architecture.
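Source-aware pacing and retry backoff, as described under Rate Management and Fault Isolation, might look like the sketch below. It is an assumption-laden illustration: `SourcePacer`, `backoff_delays` and every interval shown are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, List


class SourcePacer:
    """Enforces a minimum interval between requests to the same source."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Dict[str, float] = {}

    def wait(self, source: str) -> float:
        """Pause until `source` may be contacted again; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed.get(source, now) - now)
        if delay:
            self._sleep(delay)
        # The next request to this source is allowed one interval after this one.
        self._next_allowed[source] = now + delay + self.min_interval
        return delay


def backoff_delays(base: float, factor: float, retries: int, cap: float) -> List[float]:
    """Exponential backoff schedule, capped so that retries never wait unboundedly."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


pacer = SourcePacer(min_interval=0.05)
pacer.wait("provider-a")          # first contact: no delay
print(pacer.wait("provider-b"))   # a different source is paced independently
print(backoff_delays(1.0, 2.0, 4, 5.0))
```

Tracking the schedule per source means a slow or failing provider delays only its own queue, which is one simple way to keep faults from spreading.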
Change Detection & Revision Tracking
Kowalski continuously monitors tracked surfaces for meaningful change. Change detection combines content comparison, structural awareness and document version tracking.
Revision history is preserved where available, supporting temporal comparison rather than simple snapshot replacement.
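A minimal sketch of content comparison with preserved revisions follows. It assumes a SHA-256 digest over whitespace-normalised text as the comparison method and uses hypothetical names (`TrackedSurface`, `Revision`, `observe`); the real system may compare content quite differently.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def content_hash(text: str) -> str:
    """Digest of the text with whitespace collapsed, so cosmetic reflows are ignored."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass
class Revision:
    captured_at: str  # ISO 8601 date or timestamp supplied by the caller
    digest: str
    text: str


@dataclass
class TrackedSurface:
    url: str
    revisions: List[Revision] = field(default_factory=list)

    def observe(self, text: str, captured_at: str) -> bool:
        """Append a revision only when the content differs from the latest one.

        Returns True if a new revision was stored (including the first capture).
        """
        digest = content_hash(text)
        if self.revisions and self.revisions[-1].digest == digest:
            return False
        self.revisions.append(Revision(captured_at, digest, text))
        return True


surface = TrackedSurface("https://example.com/pricing")  # placeholder URL
surface.observe("Fee: $10", "2024-01-01")
surface.observe("Fee:   $10\n", "2024-01-02")  # whitespace only: no new revision
surface.observe("Fee: $12", "2024-01-03")
print([r.captured_at for r in surface.revisions])
```

Because earlier revisions are appended rather than overwritten, any two points in time can be compared later instead of only the latest snapshot.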
Structured Storage & Data Models
Extracted information is normalised into standardised data models designed specifically for RUC market comparison. Each record retains source provenance and acquisition metadata for auditability.
Cross-Provider Comparison
Standardised schemas enable direct pricing and service comparisons.
Provider Profiles
Comprehensive structured profiles generated from aggregated public data.
Revision History
Temporal tracking enables historical comparison and trend analysis.
Search & Filter
Structured datasets power advanced search and filtering across tools.
API Delivery
Structured data available via API for platform integration where applicable.
Source Provenance
Every record retains full lineage back to its originating public source.
Provenance & Audit Controls
Every published data point can be traced end to end back to its originating public source.
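As an illustration of how a record might carry its lineage, the sketch below attaches provenance metadata to each price point so that a comparison result still points back to its source. All field names, URLs and values are hypothetical placeholders and do not reflect Kowalski's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Provenance:
    source_url: str         # public page or document the value came from
    retrieved_at: str       # ISO 8601 acquisition timestamp
    extractor_version: str  # version of the extraction workflow used
    content_digest: str     # digest of the captured source content


@dataclass(frozen=True)
class PricePoint:
    provider: str
    item: str
    amount: float
    currency: str
    provenance: Provenance


def cheapest(points: List[PricePoint]) -> PricePoint:
    """Cross-provider comparison: the lowest amount, with its lineage still attached."""
    return min(points, key=lambda p: p.amount)


def audit_trail(point: PricePoint) -> str:
    """Serialise a record and its provenance for an audit log."""
    return json.dumps(asdict(point), sort_keys=True)


points = [
    PricePoint("Provider A", "admin_fee", 5.0, "NZD",
               Provenance("https://a.example/fees", "2024-01-01T00:00:00Z", "v1", "abc123")),
    PricePoint("Provider B", "admin_fee", 4.0, "NZD",
               Provenance("https://b.example/fees", "2024-01-01T00:05:00Z", "v1", "def456")),
]
best = cheapest(points)
print(best.provider, best.provenance.source_url)
```

Making the records immutable (`frozen=True`) is a design choice in this sketch: a published value and its lineage cannot drift apart after the fact.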
How We Operate
Kowalski is designed to comply with applicable New Zealand legislation. Responsible operation is built into the architecture, not bolted on after the fact.
Public Data Only
The system engages only with publicly accessible commercial information: no authenticated access, no personal data, no circumvention of access controls.
Respectful Access
Rate limiting, request pacing and automated backoff controls are built into every ingestion workflow. We assess and respect applicable access terms as part of our source approval process.
Facts, Not Content
Kowalski extracts factual commercial data: pricing figures, fee schedules, service parameters. It does not reproduce editorial or copyrighted content.
Data Minimisation
Only information relevant to defined analytical objectives is collected. Minimisation principles are applied throughout the acquisition pipeline.
Source Governance
Every monitored source goes through an internal approval process covering permissible data classes, monitoring frequency and publication review.
Default to Caution
Where any uncertainty exists regarding a source or data class, engagement is withheld until the position is clear. We err on the side of not acting.
The Intelligence Backbone of RUC on Rails
By combining purpose-built extraction logic, validation controls and continuous monitoring within a dedicated RUC-focused architecture, Kowalski converts fragmented public information into reliable, structured market visibility.