NeorunBase

OLTP · Vector DB · Full-Text · Graph DB in One SQL Engine

NeorunBase Key Features

A PostgreSQL-compatible, multi-modal Lakebase that handles relational, JSONB, vector, full-text search, graph, and spatial data in a single distributed ACID engine. There is no separate Pinecone, Elasticsearch, or Neo4j to run: all signals combine in one SQL statement, and multi-site DR, NIC silent-fail safety, and topology-aware placement are built in for enterprise operations.

PostgreSQL Wire Protocol

psql, JDBC, pgAdmin, LangChain, and pgvector clients connect with zero code changes. Move from PostgreSQL to a distributed engine without rewriting applications.

Distributed ACID + Self-Healing

Hash-based sharding with online rebalancing, ZooKeeper sticky leader election, automatic shard / disk recovery, and 2PC distributed transactions — all on a RocksDB-backed ACID core.
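To make "online rebalancing" concrete, here is a minimal Python sketch of hash-based routing over a fixed set of logical shards whose ownership moves between nodes. The shard count, the BLAKE2b hash, and the greedy move rule are illustrative assumptions, not NeorunBase's actual scheme, and the sketch skips both the data copy and the ZooKeeper-coordinated ownership handoff.

```python
import hashlib
from typing import Dict, List

NUM_SHARDS = 16  # hypothetical fixed number of logical shards


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a row key to a shard with a stable hash.

    Python's built-in hash() is salted per process, so a digest is used instead.
    """
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


def initial_assignment(nodes: List[str], num_shards: int = NUM_SHARDS) -> Dict[int, str]:
    """Spread shards round-robin over the starting nodes."""
    return {s: nodes[s % len(nodes)] for s in range(num_shards)}


def rebalance(assignment: Dict[int, str], nodes: List[str]) -> Dict[int, str]:
    """Move whole shards (never individual keys) until node loads differ by at most one."""
    new = dict(assignment)
    load = {n: 0 for n in nodes}
    orphans = []
    for shard, node in sorted(new.items()):
        if node in load:
            load[node] += 1
        else:
            orphans.append(shard)  # its owner left the cluster
    for shard in orphans:
        target = min(nodes, key=lambda n: load[n])
        new[shard] = target
        load[target] += 1
    while True:
        busiest = max(nodes, key=lambda n: load[n])
        idlest = min(nodes, key=lambda n: load[n])
        if load[busiest] - load[idlest] <= 1:
            return new
        shard = next(s for s, n in sorted(new.items()) if n == busiest)
        new[shard] = idlest  # live system: copy shard, catch up, then switch owner
        load[busiest] -= 1
        load[idlest] += 1
```

Because keys hash to logical shards rather than directly to nodes, adding a fourth node to a three-node cluster moves only a handful of shards, and no key is ever rehashed.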

Multi-Modal in One SQL

Combine relational, vector, full-text, graph, and spatial data inside a single ACID SQL statement. A hard filter (WHERE), semantic/keyword hybrid retrieval, and graph-aware re-ranking are all expressed in one SELECT, with no separate stores and no glue ETL.
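The filter, retrieve, and re-rank pipeline can be illustrated outside SQL with a small Python sketch. NeorunBase's actual fusion and graph re-ranking formulas are not described here, so this assumes Reciprocal Rank Fusion (k = 60) for the hybrid step and a hypothetical boost of `1 + graph_weight / (1 + hops)` for documents near the query entity; `hybrid_retrieve` and its parameters are illustrative names only.

```python
from typing import Callable, Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def hybrid_retrieve(
    rows: Dict[str, dict],
    where: Callable[[dict], bool],
    vector_ranking: List[str],
    keyword_ranking: List[str],
    hops: Dict[str, int],
    top_k: int = 3,
    graph_weight: float = 0.5,  # hypothetical re-ranking strength
) -> List[str]:
    # 1. Hard filter: the SQL WHERE removes ineligible rows before any ranking.
    allowed = {doc_id for doc_id, row in rows.items() if where(row)}
    vec = [d for d in vector_ranking if d in allowed]
    kw = [d for d in keyword_ranking if d in allowed]
    # 2. Hybrid retrieval: fuse the semantic (vector) and keyword (BM25) rankings.
    fused = rrf_fuse([vec, kw])
    # 3. Graph-aware re-ranking: boost documents close to the query entity.
    for doc_id, distance in hops.items():
        if doc_id in fused:
            fused[doc_id] *= 1.0 + graph_weight / (1 + distance)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


rows = {
    "d1": {"tenant": "acme"},
    "d2": {"tenant": "acme"},
    "d3": {"tenant": "other"},
    "d4": {"tenant": "acme"},
}
vector_ranking = ["d3", "d1", "d2", "d4"]
keyword_ranking = ["d2", "d4", "d1"]
```

Applying the hard filter first means ineligible rows (here, another tenant's `d3`) never occupy a rank slot, however similar they are to the query.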

Vector Database (pgvector compatible)

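pgvector-compatible VECTOR type with the distance operators <-> (Euclidean distance), <#> (negative inner product), and <=> (cosine distance), plus distributed HNSW approximate-nearest-neighbor (ANN) indexes. Rows, metadata, and embeddings live in the same ACID transaction, so the engine is ready as a RAG retrieval backend out of the box.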
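For reference, the three operators follow pgvector's semantics, and an ANN index such as HNSW approximates the exact nearest-neighbor ordering that the brute-force Python sketch below computes. The helper names and the `knn` function are illustrative, not part of any NeorunBase API.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """<-> : Euclidean distance."""
    return math.dist(a, b)


def negative_inner_product(a: Vector, b: Vector) -> float:
    """<#> : negated dot product, so that smaller still means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """<=> : 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


OPERATORS: Dict[str, Callable[[Vector, Vector], float]] = {
    "<->": l2_distance,
    "<#>": negative_inner_product,
    "<=>": cosine_distance,
}


def knn(rows: List[Tuple[str, Vector]], query: Vector, op: str, k: int) -> List[str]:
    """Exact ORDER BY embedding <op> query LIMIT k; an HNSW index approximates this."""
    dist = OPERATORS[op]
    return [row_id for row_id, _ in sorted(rows, key=lambda r: dist(r[1], query))[:k]]
```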

Full-Text (Lucene BM25)

Lucene-backed BM25 inverted index with PostgreSQL-compatible full-text search syntax (@@, ts_rank) and multi-language tokenization. Hybrid retrieval that blends BM25 with vector ANN is a single HYBRID_SEARCH(...) table-valued function (TVF) call.
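A minimal Python sketch of Okapi BM25 scoring with the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF form that Lucene's BM25 similarity uses (Lucene also drops the constant (k1 + 1) factor, which does not change the ranking). Whitespace tokenization stands in for the real multi-language analyzers, and the function illustrates the formula rather than NeorunBase's scorer.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(docs: List[str], query: str, k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # stand-in for a real analyzer
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter()  # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```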

Graph Database + Analytics

Multi-hop BFS (GRAPH_NEIGHBORS), PageRank and Personalized PageRank, and reachability fact-checks (GRAPH_PATH_EXISTS) are exposed as SQL TVFs over edge tables. A CSR (compressed sparse row) acceleration layer cuts single-hop latency by more than 100×, and a single SELECT can re-rank hybrid-search results by graph proximity: the Graph RAG pattern in one query.
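To show why a CSR layout helps, this Python sketch packs an edge table into offset and target arrays so each hop reads one contiguous slice, then runs the bounded BFS behind a multi-hop neighbors query and a reachability check. Integer node ids and the function names are assumptions for illustration; they are not the GRAPH_NEIGHBORS or GRAPH_PATH_EXISTS implementations.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Compressed sparse row: node v's out-neighbors are targets[offsets[v]:offsets[v + 1]]."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1  # count out-degree
    for v in range(num_nodes):
        offsets[v + 1] += offsets[v]  # prefix sum -> start position of each node
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # next free slot per node (slicing makes a copy)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors_within(offsets: List[int], targets: List[int], start: int, max_hops: int) -> Dict[int, int]:
    """Multi-hop BFS: every node reachable from start in <= max_hops, with its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if hops[v] == max_hops:
            continue
        for w in targets[offsets[v]:offsets[v + 1]]:  # one contiguous slice per expansion
            if w not in hops:
                hops[w] = hops[v] + 1
                queue.append(w)
    return hops


def path_exists(offsets: List[int], targets: List[int], src: int, dst: int, max_hops: int) -> bool:
    """Reachability fact-check: is dst within max_hops of src?"""
    return dst in neighbors_within(offsets, targets, src, max_hops)
```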

Geospatial (PostGIS compatible)

PostGIS-compatible spatial functions (ST_Distance, ST_Contains, …) with Z-order spatial indexing. Location-based services, GIS, and geo-aware RAG, all from one engine.
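Z-order indexing maps a 2-D point to a single sortable key by interleaving the bits of its quantized coordinates, so an ordinary ordered index can serve spatial lookups. The Python sketch below assumes 16 bits per axis over longitude and latitude in degrees; NeorunBase's actual resolution and coordinate handling are not specified here.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a coordinate in [lo, hi] onto the integer grid 0 .. 2**bits - 1."""
    cells = (1 << bits) - 1
    q = round((value - lo) / (hi - lo) * cells)
    return max(0, min(cells, q))  # clamp out-of-range input onto the grid


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """One sortable key per point; nearby points tend to receive nearby keys."""
    return interleave_bits(
        quantize(lon, -180.0, 180.0, bits), quantize(lat, -90.0, 90.0, bits), bits
    )
```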

Iceberg CDC + Kafka Ingest

OLTP table changes sync automatically to Iceberg/Parquet, and Ontul reads the same data through the Iceberg catalog with no separate ETL. NeorunBase reads both Iceberg v2 and v3 tables (including v3 deletion vectors) and supports direct Kafka consumer integration, REST bulk insert, and MERGE / upsert. With Write-Audit-Publish (WAP), writes from CTAS, MERGE, and CDC sync can land on a non-main branch for isolated auditing, then be published to main via REST (/admin/api/iceberg/wap/publish).
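The MERGE / upsert semantics behind change sync can be sketched as folding an ordered batch of change events into the next table state. The event tuple shape and `merge_changes` are hypothetical, chosen for illustration; they are not NeorunBase's CDC record format or its Iceberg writer.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical event shape: (commit_sequence, operation, primary_key, row)
ChangeEvent = Tuple[int, str, int, Optional[dict]]


def merge_changes(table: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Fold a batch of change events into the target table, MERGE/upsert style."""
    result = dict(table)  # leave the current snapshot untouched; build the next one
    for _seq, op, pk, row in sorted(events, key=lambda e: e[0]):
        if op in ("INSERT", "UPDATE"):
            result[pk] = row  # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
        elif op == "DELETE":
            result.pop(pk, None)  # WHEN MATCHED THEN DELETE
        else:
            raise ValueError(f"unknown operation {op!r}")
    return result
```

Ordering by commit sequence matters: the same events applied out of order could resurrect a deleted row.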

Iceberg Serving (LakeBase) — PK point/range index

Serve the open lakehouse (Iceberg) at low latency and high QPS without copying data. A primary-key predicate (=, IN, BETWEEN, or a range comparison) is answered from a per-snapshot pk → (file, row) index that reads only the matching rows instead of scanning files, reaching thousands of QPS at 1-2 ms, faster than a native RocksDB read for point lookups. The index needs no DDL: it is built automatically, invalidated per snapshot, persisted to local RocksDB (and restored on restart), and extended incrementally on append. Declare secondary indexes on non-PK columns with CREATE INDEX, and scope auto-indexing across thousands of tables with per-catalog index.patterns. External applications and dashboards connect directly over the PostgreSQL wire protocol, making this a true serving layer.

Enterprise Security & Admin Console

AES-256-GCM envelope encryption (data, WAL, metadata), a built-in KMS with key rotation, IAM/RBAC + STS, and TLS for the pgwire endpoint. Column masking, column deny, and row filters are enforced on the Iceberg serving path as well. When deployed alongside Ontul, IAM federation makes Ontul the policy authority: NeorunBase pulls its policies and enforces them for direct (JDBC) access, with ontul-token SSO. A React-based console and Prometheus metrics cover cluster, shards, IAM, Iceberg, and Kafka in one place.

Multi-Site Disaster Recovery (Site Replication)

Every change on the primary, including INSERT, UPDATE, DELETE, and even CREATE TABLE, streams to the standby site over WAL-level asynchronous replication. Vector (HNSW) and FTS (Lucene) indexes reconcile automatically. Run a DR site with a recovery point objective (RPO) of minutes, without Debezium or GoldenGate.
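The standby side of WAL shipping can be sketched in Python as replaying records in log-sequence-number (LSN) order, skipping duplicates and refusing gaps. The record shape, dense LSNs, and operation names are hypothetical, and index reconciliation for HNSW and Lucene is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WalRecord:
    lsn: int          # log sequence number, assumed dense (1, 2, 3, ...)
    op: str           # hypothetical ops: "CREATE_TABLE", "PUT", or "DELETE"
    table: str
    key: Any = None
    value: Any = None


@dataclass
class Standby:
    applied_lsn: int = 0
    tables: Dict[str, Dict[Any, Any]] = field(default_factory=dict)

    def apply(self, batch: List[WalRecord]) -> None:
        """Replay shipped WAL records in LSN order, DDL included."""
        for rec in sorted(batch, key=lambda r: r.lsn):
            if rec.lsn <= self.applied_lsn:
                continue  # already applied (e.g. resent after a reconnect)
            if rec.lsn != self.applied_lsn + 1:
                raise RuntimeError(f"WAL gap: have {self.applied_lsn}, got {rec.lsn}")
            if rec.op == "CREATE_TABLE":
                self.tables.setdefault(rec.table, {})
            elif rec.op == "PUT":
                self.tables[rec.table][rec.key] = rec.value
            elif rec.op == "DELETE":
                self.tables[rec.table].pop(rec.key, None)
            else:
                raise ValueError(f"unknown WAL op {rec.op!r}")
            self.applied_lsn = rec.lsn

    def lag(self, primary_lsn: int) -> int:
        """Records not yet applied: roughly what a failover right now would lose (RPO)."""
        return primary_lsn - self.applied_lsn
```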

NIC Silent-Fail Safety

Closes off one of the trickiest distributed-systems failure modes: ZooKeeper reports a node as alive while its NIC is dead. Selector-driven, non-blocking connect, read, and write calls with 10-second deadlines and SO_KEEPALIVE ensure that one sick node can never wedge the whole cluster.
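A minimal Python analogue of deadline-bounded, selector-driven I/O using the standard `selectors` and `socket` modules: a non-blocking connect and an exact-length read that each give up after a deadline instead of blocking forever on a dead NIC. The function names are illustrative, and the write path (which would wait on `EVENT_WRITE` the same way) is omitted for brevity.

```python
import errno
import os
import selectors
import socket
import time

DEADLINE_S = 10.0  # the 10-second deadline described above


def connect_with_deadline(host: str, port: int, deadline_s: float = DEADLINE_S) -> socket.socket:
    """Non-blocking connect that gives up after deadline_s instead of hanging."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)  # kernel probes idle peers
    sock.setblocking(False)
    rc = sock.connect_ex((host, port))
    if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EAGAIN):
        sock.close()
        raise OSError(rc, os.strerror(rc))
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_WRITE)  # writable == connect finished or failed
        if not sel.select(timeout=deadline_s):
            sock.close()
            raise TimeoutError(f"connect to {host}:{port} exceeded {deadline_s}s")
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


def recv_exact(sock: socket.socket, n: int, deadline_s: float = DEADLINE_S) -> bytes:
    """Read exactly n bytes, or fail once the overall deadline passes."""
    deadline = time.monotonic() + deadline_s
    buf = bytearray()
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while len(buf) < n:
            remaining = deadline - time.monotonic()
            if remaining <= 0 or not sel.select(timeout=remaining):
                raise TimeoutError(f"read of {n} bytes exceeded {deadline_s}s")
            try:
                chunk = sock.recv(n - len(buf))
            except BlockingIOError:
                continue  # spurious wakeup: nothing to read yet
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf.extend(chunk)
    return bytes(buf)
```

The deadline covers the whole operation rather than each individual wait, so a peer that trickles one byte at a time still cannot hold the caller past the limit.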

Topology-Aware Placement & Auto-Recovery

A 6-pass cascade spreads shard replicas across Zone → Rack → Host → Node so a rack or zone outage loses at most one copy. Reactive Shard Repair scans every 60 s and re-replicates diverging replicas automatically — self-healing without operator intervention.
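A simplified Python sketch of failure-domain-aware placement: each new replica goes to the widest domain (zone, then rack, then host, then node) not already holding a copy. This four-level greedy rule is an assumption standing in for the 6-pass cascade, whose exact passes are not described here, and it ignores node load.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    rack: str
    host: str


# Failure domains from widest to narrowest (simplified stand-in for the cascade).
DOMAIN_KEYS: List[Callable[[Node], object]] = [
    lambda n: n.zone,
    lambda n: (n.zone, n.rack),
    lambda n: (n.zone, n.rack, n.host),
    lambda n: n.name,
]


def place_replicas(nodes: List[Node], replicas: int) -> List[Node]:
    """Greedy placement: each replica goes to the widest failure domain not yet used."""
    chosen: List[Node] = []
    while len(chosen) < replicas:
        for key in DOMAIN_KEYS:
            used = {key(n) for n in chosen}
            candidates = [n for n in nodes if n not in chosen and key(n) not in used]
            if candidates:
                chosen.append(candidates[0])
                break
        else:
            raise ValueError("not enough nodes for the requested replica count")
    return chosen
```

With only two zones, a third replica necessarily shares a zone with another copy and the rule falls back to a distinct rack, so the at-most-one-copy guarantee for a zone outage needs at least as many zones as replicas.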

Use Cases

Graph RAG & Agent Retrieval

Hard Filter (SQL WHERE) + semantic/keyword hybrid retrieval + graph-based re-ranking + fact-check, all in a single SELECT. Skip the four-system stack (Postgres + Pinecone + Elasticsearch + Neo4j) and run RAG / agent retrieval from one database.

Knowledge Graph & Ontology Store

Model entities and relationships as ordinary tables; traverse with BFS, score with PageRank / Personalized PageRank. RBAC, transactions, and backup come for free — no separate graph DB to operate.
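PageRank and Personalized PageRank can both be computed by power iteration; the only difference is the teleport vector, which is uniform for PageRank and concentrated on seed entities for the personalized variant. This Python sketch uses the conventional damping factor 0.85 and redistributes dangling-node mass through the teleport vector; these are common choices, not documented NeorunBase defaults.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pagerank(
    nodes: List[str],
    edges: Iterable[Tuple[str, str]],
    damping: float = 0.85,
    personalization: Optional[Dict[str, float]] = None,
    max_iter: int = 200,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Power-iteration PageRank; pass personalization for Personalized PageRank."""
    if personalization is None:
        teleport = {v: 1.0 / len(nodes) for v in nodes}
    else:
        total = sum(personalization.values())
        teleport = {v: personalization.get(v, 0.0) / total for v in nodes}
    out_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = dict(teleport)
    for _ in range(max_iter):
        # Rank held by nodes with no out-edges is redistributed via the teleport vector.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping + damping * dangling) * teleport[v] for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```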

Unified OLTP & Analytics

OLTP transactions auto-sync to Iceberg via CDC, and Ontul / Trino / Spark analyse the same data. No bespoke ETL or CDC pipeline to maintain.

Large-Scale Distributed DB & Real-Time Ingest

Horizontal sharding with automatic rebalancing plus direct Kafka ingestion handles petabyte-scale transactional workloads.

Location-Aware & Geo RAG

Combine PostGIS-compatible spatial functions with vector embeddings for location-based recommendation, GIS, and geo-aware retrieval — all in one engine.

Considering NeorunBase for your data platform?

OLTP · Vector DB · FTS · Graph DB · Spatial · DR — One ACID SQL Engine.

Multi-modal data — vector DB, graph DB, and multi-site DR — in one PostgreSQL-compatible engine.