Chango is a true unified data lakehouse platform that solves the problems that arise across your data landscape.

True Unified Data Lakehouse Platform

Chango provides popular open source engines such as Spark, Trino, and Kafka, with Iceberg as the lakehouse table format, plus Chango-specific components.

Chango is a true unified data lakehouse platform that supports the features necessary to build your data lakehouse.

Storage Security

Storage security is a first-class requirement in modern data lakehouses. Chango provides fine-grained, role-based access control (RBAC) over Chango storage. All data access is controlled at a fine-grained level: catalog, schema, and table.

Data Catalog

Chango REST Catalog is an Iceberg REST catalog that serves as the data catalog in Chango.

Security-first Data Catalog

Storage security is a first-class requirement in modern data lakehouses. Chango REST Catalog works tightly with Chango Authorizer, which controls all data access in Chango with strong storage security at the catalog, schema, and table level. That is, multiple Iceberg-compatible engines such as Spark and Trino can work with Chango REST Catalog seamlessly while strong storage security is enforced on Iceberg data in Chango, as the sketch below illustrates.
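
For illustration, here is a minimal PySpark sketch of attaching an Iceberg REST catalog such as Chango REST Catalog, using Iceberg's standard REST catalog properties. The catalog name, endpoint URI, and query are placeholders, and Chango may require additional authentication properties; consult the Chango documentation for the actual connection settings.

from pyspark.sql import SparkSession

# Requires the Iceberg Spark runtime jar on the classpath.
spark = (
    SparkSession.builder
        .appName("chango-rest-catalog-example")
        # Standard Iceberg REST catalog wiring for Spark.
        .config("spark.sql.catalog.chango", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.chango.type", "rest")
        .config("spark.sql.catalog.chango.uri", "https://chango-rest-catalog.example.com")  # placeholder URI
        .getOrCreate()
)

# Any Iceberg-compatible engine pointed at the same REST catalog sees the
# same tables, with access mediated by Chango Authorizer.
spark.sql("SELECT * FROM chango.test_db.test LIMIT 10").show()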

Automatic Iceberg Table Maintenance

Every time data is committed to an Iceberg table, many files are created, such as data files, snapshots, and metadata files, all of which otherwise have to be maintained manually later. Chango REST Catalog maintains Iceberg tables for you, performing the following tasks automatically (the sketch after this list shows the equivalent manual operations):

  • Compacts small files.
  • Expires snapshots.
  • Removes old metadata files.
  • Removes orphan files.
  • Rewrites manifest files.
  • Rewrites position delete files.
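
For context, these correspond to the standard Iceberg maintenance operations that otherwise have to be run by hand. A sketch in PySpark using Iceberg's built-in Spark procedures; catalog and table names are illustrative, and old metadata file cleanup is typically configured through table properties rather than a procedure:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes an Iceberg-enabled session as configured above

spark.sql("CALL chango.system.rewrite_data_files(table => 'test_db.test')")             # compact small files
spark.sql("CALL chango.system.expire_snapshots(table => 'test_db.test')")               # expire snapshots
spark.sql("CALL chango.system.remove_orphan_files(table => 'test_db.test')")            # remove orphan files
spark.sql("CALL chango.system.rewrite_manifests(table => 'test_db.test')")              # rewrite manifest files
spark.sql("CALL chango.system.rewrite_position_delete_files(table => 'test_db.test')")  # rewrite position delete files

# Old metadata files are usually cleaned up via table properties rather than a procedure:
spark.sql("""
    ALTER TABLE chango.test_db.test SET TBLPROPERTIES (
        'write.metadata.delete-after-commit.enabled' = 'true',
        'write.metadata.previous-versions-max' = '10'
    )
""")

With Chango REST Catalog, none of these calls are needed; they are shown only to illustrate the work being automated.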

Load Files to Iceberg Tables using Chango SQL Procedure

Chango SQL Procedure is an easy way to load external files such as CSV, JSON, Parquet, and ORC located in S3-compatible object storage into Iceberg tables in Chango, without having to develop additional Spark jobs.

PROC iceberg.system.import (
    source => 's3a://any-bucket/any-path',   -- location of the external files
    s3_access_key => 'any access key',
    s3_secret_key => 'any secret key',
    s3_endpoint => 'any endpoint',
    s3_region => 'any region',
    file_format => 'json',                   -- csv, json, parquet, or orc
    id_columns => 'id_1, id_2',              -- key columns used by the MERGE action
    action => 'MERGE',
    target_table => 'iceberg.test_db.test'   -- destination Iceberg table in Chango
)

Streaming Ingestion

If you want to insert streaming events such as user behavior events, logs, and IoT events into Iceberg tables, in most cases you need to build an event streaming platform like Kafka and write streaming jobs such as Spark streaming jobs. In Chango, you don't have to. Streaming applications can ingest streaming events into Iceberg tables directly through Chango's REST API, without an additional streaming platform or streaming jobs (see the sketch after the list below).

  • Simply send streaming events through REST; the rest of the work of ingesting them into Iceberg tables is done automatically.
  • No need to build an event streaming platform or develop streaming jobs.
  • Small data files created every time streaming events are ingested into Iceberg tables are compacted automatically.
  • Iceberg table maintenance, such as snapshot expiration and removal of old metadata files, is done automatically.
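
A minimal sketch of what the client side can look like. The endpoint path, authentication header, and payload shape here are assumptions for illustration only; see the Chango REST API reference for the actual contract.

import requests

# A batch of user behavior events destined for an Iceberg table.
events = [
    {"event_type": "page_view", "user_id": "u-123", "ts": "2024-05-01T00:00:00Z"},
    {"event_type": "click", "user_id": "u-456", "ts": "2024-05-01T00:00:01Z"},
]

resp = requests.post(
    "https://chango.example.com/v1/ingest/iceberg.test_db.events",  # hypothetical endpoint
    headers={"Authorization": "Bearer <api-token>"},                # hypothetical auth scheme
    json=events,
    timeout=10,
)
resp.raise_for_status()  # compaction and table maintenance happen server-side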

Aggregate Logs

Chango Log is a log agent that reads local log files and sends them to Iceberg tables in Chango for analysis. Using Chango Log, you can analyze logs from all your distributed systems in real time and join them with other databases for richer analysis, as in the sketch below.
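
For example, once logs land in an Iceberg table, they can be queried and joined against other catalogs with Trino. A sketch using the Python trino client; hostname, catalogs, and table names are illustrative:

import trino

conn = trino.dbapi.connect(
    host="chango-trino.example.com",  # illustrative hostname
    port=443,
    user="analyst",
    http_scheme="https",
)
cur = conn.cursor()
# Join freshly ingested logs against a user table in another database.
cur.execute("""
    SELECT l.level, l.message, u.name
    FROM iceberg.logs_db.app_logs AS l
    JOIN postgresql.public.users AS u ON l.user_id = u.id
    WHERE l.ts > current_timestamp - INTERVAL '1' HOUR
""")
for row in cur.fetchall():
    print(row)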

Change Data Capture

Chango CDC is a Change Data Capture (CDC) application that captures change data from databases and sends it to Iceberg tables in Chango. You don't need a Kafka or Kafka Connect cluster to accomplish CDC.

Trino Gateway

Chango Trino Gateway is an implementation of the Trino gateway concept: a gateway that dynamically routes Trino queries to upstream backend Trino clusters. If one of the backend Trino clusters is exhausted, the gateway routes queries to the cluster that is executing fewer queries. Trino alone does not support high availability because the Trino coordinator is a single point of failure; to make Trino highly available, you need a Trino gateway. Clients simply connect to the gateway's single endpoint, as sketched below.
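
From the client's point of view, the gateway is just one stable Trino endpoint; which backend cluster serves the query is decided by the gateway. A sketch with the Python trino client, using an illustrative hostname:

import trino

conn = trino.dbapi.connect(
    host="chango-trino-gateway.example.com",  # the gateway, not an individual coordinator
    port=443,
    user="analyst",
    http_scheme="https",
)
cur = conn.cursor()
# The gateway forwards this to a healthy, lightly loaded backend cluster.
cur.execute("SELECT count(*) FROM iceberg.test_db.test")
print(cur.fetchone())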

Data Transformation

Chango Query Exec is a REST application that executes Trino ETL queries to transform data in Chango. You just send Trino ETL queries to Chango Query Exec over REST, using a tool such as curl. Chango Query Exec has several advantages, for example:

  • Send Trino ETL queries via REST simply, without installing additional tools or libraries.
  • Reuse the same Trino ETL queries you already use to explore data with your BI tools.
  • Integrate easily with workflow engines (a request sketch follows below).
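
A minimal sketch of submitting a Trino ETL query to Chango Query Exec over REST, here with Python's requests instead of curl. The endpoint path, authentication header, and request body are assumptions for illustration; see the Chango documentation for the actual API.

import requests

# The same Trino SQL you would run interactively, submitted as an ETL step.
etl_query = """
    INSERT INTO iceberg.test_db.daily_summary
    SELECT date_trunc('day', ts) AS day, count(*) AS events
    FROM iceberg.test_db.events
    GROUP BY 1
"""

resp = requests.post(
    "https://chango.example.com/v1/query-exec",       # hypothetical endpoint
    headers={"Authorization": "Bearer <api-token>"},  # hypothetical auth
    json={"query": etl_query},
    timeout=60,
)
resp.raise_for_status()

Because the submission is a single HTTP call, the same request is easy to wrap in a workflow engine task.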
© 2024 Cloud Chef Labs, Inc. All rights reserved.