Chango provides popular open source engines such as Spark, Trino, and Kafka, with Iceberg as the lakehouse table format, alongside Chango-specific components.
Chango is a true unified data lakehouse platform that supports most of the features necessary to build your data lakehouse.
Iceberg is the default lakehouse table format in Chango and is fully supported, with strong storage security provided by Chango Authorizer and Chango REST Catalog.
SQL is essential in modern data lakehouses, even for ETL jobs. Chango provides powerful SQL engines, Trino through Chango Trino Gateway and Spark through Chango Spark Thrift Server, with strong storage security, to execute interactive and ETL SQL queries.
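For example, an interactive query can be sent to Chango Trino Gateway with the standard Trino Python client. This is a minimal sketch: the host, port, user, and credentials are placeholders, and Chango's actual connection details may differ.

import trino

# Connect to Chango Trino Gateway as if it were a single Trino coordinator;
# the gateway routes the query to a backend Trino cluster.
conn = trino.dbapi.connect(
    host="chango-trino-gateway.example.com",  # placeholder host
    port=8443,
    http_scheme="https",
    auth=trino.auth.BasicAuthentication("analyst", "<password>"),  # placeholder credentials
    user="analyst",
    catalog="iceberg",
    schema="test_db",
)
cur = conn.cursor()
cur.execute("SELECT count(*) FROM test")
print(cur.fetchone())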
In addition, streaming events can be ingested into Iceberg easily, just using REST.
Users can run Trino and Spark SQL queries, both ETL and interactive, through Superset, which connects to Chango Trino Gateway and Chango Spark Thrift Server.
All ETL query jobs are integrated and scheduled with Azkaban: Trino ETL queries and Spark SQL ETL query jobs are processed periodically by Azkaban. ETL queries are sent to Chango Query Exec through REST and then executed by Trino through Chango Trino Gateway and by Spark through Chango Spark Thrift Server.
Streaming events can reach Iceberg through several paths:
- CDC events are captured by Chango CDC, which sends them to Chango Streaming Ingestion (Chango Data API + Kafka + Chango Spark Streaming) through REST.
- Log events are collected by Chango Log, which sends them to Chango Streaming Ingestion through REST.
- Applications can also send streaming events to Chango Streaming Ingestion directly through REST.
Incoming streaming events will be inserted into Iceberg tables.
Iceberg is the most popular lakehouse table format and is changing the paradigm of data lakes and data lakehouses. Iceberg is the default lakehouse table format in Chango and is fully supported. Chango provides Iceberg-enabled engines such as Trino and Spark connected to Chango REST Catalog, which is an Iceberg REST catalog and maintains Iceberg tables automatically for you. So you can easily build your Iceberg-centric data lakehouse with Chango.
Storage security is a first-class requirement in modern data lakehouses. Chango provides fine-grained data access control to Chango storage using RBAC. All data access is controlled in a fine-grained manner, at the catalog, schema, and table level.
Chango REST Catalog is an Iceberg REST catalog used as the data catalog in Chango.
Chango REST Catalog works tightly with Chango Authorizer, which controls all data access with strong storage security at the catalog, schema, and table level in Chango. That is, multiple Iceberg-enabled engines such as Spark and Trino can work with Chango REST Catalog seamlessly, with strong storage security applied to Iceberg in Chango.
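As an illustration, a Spark session can be pointed at an Iceberg REST catalog using Iceberg's standard REST catalog properties. This is a sketch only: the catalog name, URI, and token below are placeholders, and the exact properties and credentials Chango expects are defined by Chango.

from pyspark.sql import SparkSession

# Requires the Iceberg Spark runtime on the classpath; URI and token are placeholders.
spark = (
    SparkSession.builder
    .appName("chango-iceberg-example")
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "rest")
    .config("spark.sql.catalog.iceberg.uri", "https://chango-rest-catalog.example.com")
    .config("spark.sql.catalog.iceberg.token", "<credential issued by Chango>")
    .getOrCreate()
)

spark.sql("SELECT * FROM iceberg.test_db.test LIMIT 10").show()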
Every time data is committed to Iceberg tables, many files are created, such as data files, snapshots, and metadata files, which would otherwise have to be maintained manually later. Chango REST Catalog maintains Iceberg tables automatically for you, taking care of this housekeeping.
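For comparison, without automatic maintenance you would typically schedule Iceberg's standard Spark maintenance procedures yourself. A sketch, reusing a Spark session configured as above; the table name and retention timestamp are placeholders.

# Housekeeping that Chango REST Catalog automates, expressed as the
# standard Iceberg Spark procedures you would otherwise run on a schedule.
spark.sql("""
    CALL iceberg.system.expire_snapshots(
        table => 'test_db.test',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
spark.sql("CALL iceberg.system.remove_orphan_files(table => 'test_db.test')")
spark.sql("CALL iceberg.system.rewrite_data_files(table => 'test_db.test')")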
Chango SQL Procedure is an easy way to load external files such as CSV, JSON, Parquet, and ORC located in S3-compatible object storage into Iceberg tables in Chango, without the need to develop additional Spark jobs.
PROC iceberg.system.import (
source => 's3a://any-bucket/any-path',
s3_access_key => 'any access key',
s3_secret_key => 'any secret key',
s3_endpoint => 'any endpoint',
s3_region => 'any region',
file_format => 'json',
id_columns => 'id_1, id_2',
action => 'MERGE',
target_table => 'iceberg.test_db.test'
)
If you want to insert streaming events such as user behavior events, logs, or IoT events into Iceberg tables, you normally need to build an event streaming platform like Kafka and write streaming jobs such as Spark streaming jobs. In Chango, you do not have to do so: streaming applications can ingest events into Iceberg tables directly through Chango's REST API, without an additional streaming platform or streaming jobs.
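A minimal sketch of what such direct ingestion could look like from Python. The endpoint URL, authentication header, and payload format here are hypothetical placeholders; the actual REST interface of Chango Data API is defined by Chango.

import json
import requests

# Hypothetical example events destined for an Iceberg table.
events = [
    {"user_id": 1, "action": "click", "ts": "2024-01-01T00:00:00Z"},
    {"user_id": 2, "action": "view", "ts": "2024-01-01T00:00:01Z"},
]

# Placeholder URL and auth; consult the Chango Data API documentation
# for the real endpoint, authentication scheme, and payload schema.
resp = requests.post(
    "https://chango-data-api.example.com/ingest",
    headers={"Authorization": "Bearer <chango-credential>"},
    data=json.dumps(events),
)
resp.raise_for_status()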
Chango Log is a log agent that reads local log files and sends logs to Iceberg tables in Chango for analysis. Using Chango Log, you can analyze logs from all your distributed systems in real time in Chango, even joining them with data from different databases for richer analysis.
Chango CDC is a Change Data Capture application that captures CDC data from databases and sends it to Iceberg tables in Chango. You do not need a Kafka and Kafka Connect cluster to accomplish CDC.
Chango Trino Gateway is an implementation of the Trino gateway concept: it routes Trino queries dynamically to upstream backend Trino clusters. If one of the backend Trino clusters is exhausted, the gateway routes queries to the cluster that is executing fewer queries. Trino itself does not support HA, because the Trino coordinator is a single point of failure; a Trino gateway is needed to provide HA for Trino. Chango Trino Gateway also supports Resource Groups to control the resources of the backend Trino clusters in which queries will run.
Chango Query Exec is a REST application that executes Trino and Spark SQL ETL queries to transform data in Chango. You just send ETL queries to Chango Query Exec through REST, for example with curl. Chango Query Exec has several advantages.
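A minimal sketch of sending an ETL query through REST from Python instead of curl. The endpoint URL and request fields are hypothetical placeholders; the real request schema is defined by Chango Query Exec.

import requests

# Hypothetical ETL query to be executed by Trino via Chango Trino Gateway.
etl_query = """
INSERT INTO iceberg.test_db.daily_summary
SELECT action, count(*) AS cnt
FROM iceberg.test_db.events
GROUP BY action
"""

# Placeholder URL, auth, and body; consult the Chango Query Exec
# documentation for the real endpoint and request schema.
resp = requests.post(
    "https://chango-query-exec.example.com/v1/queries",
    headers={"Authorization": "Bearer <chango-credential>"},
    json={"engine": "trino", "query": etl_query},
)
resp.raise_for_status()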
All queries executed by the query engines in Chango, such as Chango Trino Gateway, Chango Spark Thrift Server, and Chango Spark SQL Runner, are logged so that the history of executed queries can be explored later. You can see query counts by query engine and by role, and explore the details of executed queries.