
Ontul Key Features
Discover the core features of Ontul, a distributed data engine that unifies batch processing, stream processing, and interactive SQL.
Unified Data Engine
Run batch processing, stream processing, and interactive SQL queries in a single cluster. Consolidate all data workloads without separate systems.
Arrow-Native Execution Engine
Process all data in Apache Arrow columnar format. Iceberg Parquet data files are decoded column-at-a-time directly into Arrow vectors, with Iceberg file pruning and Parquet row-group skipping to minimize bytes read. Columnar aggregation and bounded-heap Top-N with zero-copy execution keep analytical queries fast.
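To make "bounded-heap Top-N" concrete, here is a minimal Python sketch of the general technique. It is not Ontul's implementation, and the `top_n` function and the sample rows are hypothetical. The point it illustrates is that memory stays proportional to N regardless of how many rows stream through.

```python
import heapq

def top_n(rows, n, key):
    """Return the n largest rows by key while holding at most n rows."""
    if n <= 0:
        return []
    heap = []  # min-heap of (key, seq, row); the smallest kept row is heap[0]
    for seq, row in enumerate(rows):
        item = (key(row), seq, row)  # seq breaks ties so rows are never compared
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            heapq.heapreplace(heap, item)  # evict the current minimum
    return [row for _, _, row in sorted(heap, reverse=True)]

# Hypothetical sample data.
rows = [
    {"region": "EU", "revenue": 120},
    {"region": "US", "revenue": 300},
    {"region": "APAC", "revenue": 210},
    {"region": "LATAM", "revenue": 90},
]
top2 = top_n(rows, 2, key=lambda r: r["revenue"])
```

A full sort would need every row in memory; the heap only ever keeps the current N candidates.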
Interactive SQL
Connect JDBC clients (DBeaver, DataGrip) via Arrow Flight SQL and run multi-catalog federation queries. Full standard SQL support includes JOINs, window functions, and CTEs, plus a compiled-plan cache and a snapshot-keyed result cache. The result cache skips execution entirely for repeated reads over unchanged data, lifting interactive QPS for BI and AI-agent workloads. The MCP server services JSON-RPC batch requests concurrently, so an agent's multiple tool calls take one round-trip.
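The idea behind a snapshot-keyed result cache can be sketched in a few lines of Python. This is an illustration under assumptions, not Ontul's code: the class name, the `execute` callback, and the integer snapshot ids are all hypothetical. Because the table snapshot id is part of the cache key, a repeated read over unchanged data is a cache hit, and any new commit naturally misses.

```python
class SnapshotResultCache:
    """Result cache keyed by (SQL text, table snapshot id)."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times a query really ran

    def query(self, sql, snapshot_id, execute):
        key = (sql, snapshot_id)
        if key not in self._results:
            self.executions += 1
            self._results[key] = execute(sql)
        return self._results[key]


def run(sql):
    # Stand-in for real query execution (hypothetical data).
    return [("EU", 120), ("US", 300)]

cache = SnapshotResultCache()
sql = "SELECT region, revenue FROM sales"
first = cache.query(sql, 41, run)
second = cache.query(sql, 41, run)  # same snapshot: served from cache
third = cache.query(sql, 42, run)   # new snapshot: executes again
```

No explicit invalidation step is needed in this design, since stale entries are simply never looked up once the snapshot id advances.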
Flink-style Streaming
Continuous processing — events are processed as they arrive, not in Spark-style micro-batches. Supports TUMBLING, SLIDING, and SESSION windows with multi-worker hash shuffle.
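For readers new to the three window types, the following Python sketch shows how an event timestamp maps to TUMBLING, SLIDING, and SESSION windows. It is a simplified, single-process illustration with hypothetical function names and integer timestamps, not Ontul's streaming runtime. Windows are half-open intervals `[start, end)`, and the session sketch starts a new session once the gap between events reaches `gap`.

```python
def tumbling_window(ts, size):
    """The single fixed-size window that contains ts."""
    start = ts - ts % size
    return (start, start + size)

def sliding_windows(ts, size, slide):
    """Every overlapping window of length size, advancing by slide, that contains ts."""
    last_start = ts - ts % slide
    return [(s, s + size) for s in range(last_start, ts - size, -slide)]

def session_windows(timestamps, gap):
    """Group events into sessions separated by at least gap of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] < gap:
            sessions[-1][1] = ts          # extend the open session
        else:
            sessions.append([ts, ts])     # start a new session
    return [(start, last + gap) for start, last in sessions]

tumbling = tumbling_window(7, size=5)
sliding = sliding_windows(7, size=10, slide=5)
sessions = session_windows([1, 3, 20, 22, 40], gap=5)
```

In a multi-worker setup, a hash shuffle on the grouping key would route all events for one key to the same worker before this assignment step runs.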
Exchange Manager
Unified fault-tolerance infrastructure for Query, Batch, and Streaming. Handles data spill on memory pressure and streaming checkpoint state — all through a single system with KMS envelope encryption.
Exactly-Once Semantics
Master-coordinated barrier checkpoint guarantees exactly-once delivery for transactional sinks (Iceberg, JDBC, NeorunBase, Kafka Transactions). Sink commit before offset commit ensures data consistency.
Connector Architecture
Access diverse data sources through plugin-based connectors. Dynamically register and unregister Iceberg, NeorunBase, JDBC, Kafka, Elasticsearch, and more at runtime.
Federation Queries
Execute cross-catalog joins across multiple data sources in a single SQL query. Combine Iceberg, NeorunBase, and JDBC tables seamlessly.
Semantic Layer
Define metrics, dimensions, multilingual synonyms, governance, conformed-dimension joins, derived metrics, and multi-tenant mandatory filters once. Ontul rewrites SELECT revenue FROM sales into the full aggregation, JOIN, GROUP BY, and row filter server-side — clients never duplicate the formula.
Agentic AI Ready
Built-in MCP server gives LLM agents metric discovery, natural-language search (Korean 매출 ↔ revenue), and certification metadata. The semantic layer handles aggregation, JOINs, and RBAC server-side, so agents only need column names — multi-tenant policies follow the authenticated user automatically.
Native Apache Iceberg v2 & v3
Native support for both Iceberg v2 and v3 — distributed INSERT/CTAS plus merge-on-read DELETE/UPDATE/MERGE, hidden partitioning, schema evolution, time travel, branches and tags. On v3, deletes are written and read as deletion vectors (Puffin) instead of position-delete files. All operational capabilities in a single engine. Write-Audit-Publish (WAP) is supported too: stage INSERT/UPDATE/DELETE/MERGE on a non-main branch via SET, audit in isolation, then publish to main with ALTER TABLE EXECUTE fast_forward / cherrypick. It also ships Spark-style table-maintenance procedures (optimize, expire_snapshots, rewrite_manifests, remove_orphan_files, rollback) via ALTER TABLE EXECUTE, with fine-grained parameters such as retain_last, min_input_files, dry_run, and window_hours (window_hours does incremental compaction of just the last N hours of small files — ideal for streaming churn). From the Admin UI you can run per-table auto-maintenance with per-operation toggles, those parameters, and a CRON schedule.
Security (IAM & KMS)
AES-256-GCM envelope encryption, built-in KMS, Exchange Manager data encryption, catalog/table/column/row-level IAM policies, and STS temporary credentials.
BI Integration (Tableau · Power BI · Looker)
Tableau, Power BI, Looker, and DBeaver connect live via Arrow Flight SQL JDBC. Semantic views expose measures and dimensions with the right classification, and /api/v1/bi/connection-info returns driver coordinates plus per-tool setup hints in one call.
Semantic Layer — Single Source of Truth for Agentic AI
One definition of truth per metric — enforced server-side.
Ontul's semantic layer gives agents two things. ① Numbers (metrics) — define an analytics measure like revenue or margin once, and LLM agents, Tableau, and analysts all see the same number. ② Relevant context (retrievers) — multi-modal search that finds related documents by text or image. If a metric answers "how much revenue?", a retriever answers "find the related documents." Agents get both through one interface.
Core Capabilities
Server-Side Query Rewriting
SELECT revenue, customer.region FROM sales becomes SUM(amount * (1-discount)) with LEFT JOIN customer ON ... and GROUP BY customer.region — automatically. Clients never have to memorize the formula.
MCP-Native Metric Discovery
LLM agents use ontul_search_metrics and ontul_describe_semantic_view to find metrics across multilingual synonyms (매출 · revenue · net_revenue · sales_amount) and read their definitions. One definition is shared by every agent.
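As a rough mental model of synonym-based discovery, the Python sketch below matches a natural-language question against a metric's synonym list. It is deliberately naive (plain substring matching) and is not how `ontul_search_metrics` is implemented; the `search_metrics` function and the `SYNONYMS` table are hypothetical, with the synonym list taken from the example above.

```python
# Synonyms for one metric, as listed in the text above.
SYNONYMS = {
    "revenue": ["매출", "revenue", "net_revenue", "sales_amount"],
}

def search_metrics(question, synonyms):
    """Return metric names whose synonyms appear in the question."""
    text = question.casefold()
    return [
        metric
        for metric, terms in synonyms.items()
        if any(term.casefold() in text for term in terms)
    ]

korean = search_metrics("매출 어떻게 돼?", SYNONYMS)
english = search_metrics("What was net_revenue last week?", SYNONYMS)
unrelated = search_metrics("How many users signed up?", SYNONYMS)
```

Whichever term the question uses, the lookup resolves to the same metric name, so every caller lands on the same definition.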
Derived Metrics
profit = revenue - cost, profit_margin = (revenue - cost) / revenue — define metrics in terms of other metrics. Ontul resolves them recursively at plan time, with cycle detection.
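Recursive resolution with cycle detection can be sketched as follows. This is an assumption-laden illustration rather than Ontul's planner: the `METRICS` table and `expand` function are hypothetical, base formulas are borrowed from the sample query on this page, and identifier substitution is done with a regular expression instead of a real SQL parser (so it assumes no column shares a name with a metric).

```python
import re

# Hypothetical metric definitions; derived metrics reference other metrics by name.
METRICS = {
    "revenue": "SUM(amount)",
    "cost": "SUM(unit_cost * quantity)",
    "profit": "revenue - cost",
    "profit_margin": "(revenue - cost) / revenue",
}

def expand(name, metrics, _path=()):
    """Inline metric references recursively; raise ValueError on a cycle."""
    if name in _path:
        raise ValueError("cycle: " + " -> ".join(_path + (name,)))

    def substitute(match):
        token = match.group(0)
        if token in metrics:
            return "(" + expand(token, metrics, _path + (name,)) + ")"
        return token  # plain column or function name: leave untouched

    return re.sub(r"[A-Za-z_]\w*", substitute, metrics[name])

profit_sql = expand("profit", METRICS)
margin_sql = expand("profit_margin", METRICS)
```

Tracking the path of metrics currently being expanded is what turns an accidental self-reference into a clear error instead of infinite recursion.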
Conformed-Dimension Joins
Declare a JOIN once; Ontul injects it only when its columns are referenced. SELECT customer.region, revenue auto-adds LEFT JOIN customer ON ..., while unused joins stay out of the plan — declared joins cost nothing until used.
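The "inject only when referenced" rule can be illustrated with a short Python sketch. It assumes joins are declared per table alias and that a join is needed exactly when the select list contains an `alias.column` reference. The `DECLARED_JOINS` table, the `product` join, and the `joins_for` function are hypothetical, and a real planner would inspect a parsed query rather than use a regular expression.

```python
import re

# Hypothetical declared joins, keyed by the alias used in queries.
DECLARED_JOINS = {
    "customer": "LEFT JOIN saas.core.customer customer ON sales.customer_id = customer.id",
    "product": "LEFT JOIN saas.core.product product ON sales.product_id = product.id",
}

def joins_for(select_list, declared):
    """Return only the declared joins whose alias is referenced as alias.column."""
    referenced = set(re.findall(r"\b([A-Za-z_]\w*)\.", select_list))
    return [sql for alias, sql in declared.items() if alias in referenced]

needed = joins_for("customer.region, revenue", DECLARED_JOINS)
none_needed = joins_for("revenue", DECLARED_JOINS)
```

Here the `product` join is declared but never enters the plan, because nothing in the select list refers to it.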
Multi-Tenant Mandatory Filters
Declare row-scoping predicates like tenant_id = ${user.attr.tenant_id} at the view or per-metric level. Substituted from the authenticated user context, so the same RLS policy applies whether the caller is a BI dashboard or an LLM agent.
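The substitution step might look like the Python sketch below. It is an illustration only: `bind_filter` and the attribute dictionary are hypothetical, and the sketch renders values as quoted SQL string literals, whereas a production engine would more likely bind typed parameters. The one design point worth noting is that a missing attribute fails closed rather than silently dropping the filter.

```python
import re

def bind_filter(template, user_attrs):
    """Replace ${user.attr.NAME} placeholders with the caller's attribute values."""

    def render(match):
        attr = match.group(1)
        if attr not in user_attrs:
            # Fail closed: never run the query without its mandatory filter.
            raise PermissionError(f"missing user attribute: {attr}")
        value = str(user_attrs[attr]).replace("'", "''")  # escape single quotes
        return f"'{value}'"

    return re.sub(r"\$\{user\.attr\.(\w+)\}", render, template)

predicate = bind_filter(
    "tenant_id = ${user.attr.tenant_id}",
    {"tenant_id": "acme-co"},
)
```

Because the value comes from the authenticated session and not from the query text, a dashboard and an agent acting for the same user end up with the same predicate.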
Governance & RBAC
Per-metric allowedRoles for access control, DRAFT → CERTIFIED → DEPRECATED lifecycle, certifier audit, free-form tags. Enforced at rewrite time — unauthorized users never see the formula in error messages.
Retrievers — Multi-Modal Search in One Call
A retriever is the search object an agent uses to find "what's related" by text or image. It runs vector (meaning), keyword (BM25), and graph (relationships) search together on NeorunBase, protected by the same IAM and permissions as metrics. The agent writes no SQL: it just fills in the values and gets back the context it needs for RAG. (HYBRID_SEARCH / GRAPH_NEIGHBORS are defined as governed retriever objects and pushed down through Ontul.)
One line is enough
SELECT customer.region, profit_margin
FROM saas.core.sales
WHERE ship_date >= DATE '2024-01-01';

SELECT customer.region,
(SUM(amount) - SUM(unit_cost * quantity)) / SUM(amount)
AS profit_margin
FROM saas.core.sales
LEFT JOIN saas.core.customer customer
ON sales.customer_id = customer.id
WHERE ship_date >= DATE '2024-01-01'
AND tenant_id = 'acme-co' -- auto RLS
AND status = 'COMPLETED' -- per-metric filter
GROUP BY customer.region;

What this means for Agentic AI
No Hallucinated Metrics
Formulas live once, server-side. Even if an LLM would have guessed AVG instead of SUM, the correct aggregation runs every time as long as the metric name is right.
IAM Auto-Propagation
The metrics and rows an agent can see are exactly what the user's IAM policy allows. No prompt-level permission logic, no bypass.
Multilingual by Default
"매출 어떻게 돼?" finds the revenue metric via synonym matching. Business terminology varies by team — the semantic layer bridges that gap.
BI · AI Consistency
The revenue Tableau shows and the revenue an LLM agent answers are computed by the same SQL. The two channels never disagree on the number.
Use Cases
Unified Data Processing
Handle all data workloads — batch, streaming, and SQL — with a single Ontul cluster instead of separate systems.
AI Agent Analytics
LLM agents discover metrics through MCP tools and translate natural-language questions into Ontul SQL. The semantic layer handles aggregation, joins, and IAM — so agents answer with certified business definitions, not hallucinated formulas.
Real-Time Data Pipelines
Ingest data from Kafka, process in Ontul, and load into Iceberg tables for real-time ETL pipelines.
Data Lake Analytics
Run federation queries across Iceberg, JDBC, and other sources for unified analytics.
Analytics + RAG in One Backend
Run metrics (analytics) and retrievers (multi-modal search) on one engine under one governance model, with no separate semantic-analytics tool or vector/graph search stack to operate. An agent pulls "the numbers" and "the supporting context" together in a single MCP session.
Considering Ontul for your data platform?
Unified. Arrow-Native. Agentic AI-Ready.
A distributed data engine that unifies batch, streaming, SQL, and a production semantic layer — so BI dashboards and AI agents answer with the same truth.
