carbondata-user mailing list archives

From Liang Chen <>
Subject [ANNOUNCE] Apache CarbonData 1.3.0 release
Date Sat, 10 Feb 2018 07:24:06 GMT

The Apache CarbonData PMC team is happy to announce the release of Apache
CarbonData version 1.3.0.

What’s New in Version 1.3.0?

This version of CarbonData adds the following new features to improve
performance, compatibility, and usability.
Support Spark 2.2.1

Spark 2.2.1 is the latest stable version of Spark; it adds new features and
improves performance. CarbonData 1.3.0 integrates with it so that users can
take advantage of these improvements after upgrading.
Support Streaming

Supports streaming ingestion of real-time data. Once the real-time data
is ingested into the carbon store, it can be queried from a compute engine like
Pre Aggregate Support

Supports pre-aggregation of data so that "group by" queries can fetch data
much faster (around 10X faster). You can create as many aggregate tables
as required, as datamaps, to improve their query
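
To illustrate why a pre-aggregate table speeds up "group by" queries, here
is a minimal Python sketch of the idea. It is not CarbonData code or its
API; the sample rows and the function names build_pre_aggregate and
sum_by_country are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, product, amount).
rows = [
    ("US", "phone", 100),
    ("US", "laptop", 900),
    ("IN", "phone", 80),
    ("IN", "phone", 120),
]

def build_pre_aggregate(rows):
    """Build the aggregate once, at load time: (SUM(amount), COUNT(*)) per country."""
    agg = defaultdict(lambda: [0, 0])
    for country, _product, amount in rows:
        agg[country][0] += amount
        agg[country][1] += 1
    return {country: tuple(values) for country, values in agg.items()}

def sum_by_country(pre_agg):
    """Answer a 'group by country' query from the small aggregate, not the raw rows."""
    return {country: total for country, (total, _count) in pre_agg.items()}

pre_agg = build_pre_aggregate(rows)
result = sum_by_country(pre_agg)
```

The query touches one entry per group instead of every raw row, which is
where the speed-up comes from when the raw table is large.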
Support Time Series (Alpha feature)

Supports creating multiple pre-aggregate tables for the time hierarchy;
CarbonData can automatically roll up queries on these hierarchies.
Note: this is an alpha feature.

Supports creating a CarbonData table from any Parquet, Hive, or Carbon
table. This is useful when you want to create a CarbonData table from
another Parquet or Hive table and use the Carbon query engine to get
better query results. It can also be used to back up data.
Standard Partitioning

Supports standard partitioning, similar to Spark and Hive partitioning.
This allows you to partition a table on any column, which can improve
query performance significantly.
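
The benefit of partitioning can be sketched in a few lines of Python. This
is only a conceptual model under simple assumptions, not how CarbonData
stores data; the sample rows and the helper names partition_by and
query_equals are hypothetical.

```python
from collections import defaultdict

def partition_by(rows, column):
    """Group rows into partitions keyed by the value of the partition column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)

def query_equals(partitions, value):
    """Answer an equality filter on the partition column by reading at most one partition."""
    rows = partitions.get(value, [])
    partitions_read = 1 if value in partitions else 0
    return rows, partitions_read

# Hypothetical sample data, partitioned by country.
sales = [
    {"country": "US", "amount": 100},
    {"country": "IN", "amount": 80},
    {"country": "US", "amount": 900},
    {"country": "CN", "amount": 50},
]
partitions = partition_by(sales, "country")
us_rows, partitions_read = query_equals(partitions, "US")
```

A filter on the partition column reads one of the three partitions here
and skips the rest, instead of scanning every row.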
Support External DB & Table Path

Supports an external DB and table path. When creating a DB or table, you
can now specify the location where it should be stored.
Support Query Data with Specified Dataload

Supports querying data from specified segments (one data load generates
one segment). Users can restrict a query to only the data they actually
need, which helps improve query performance.
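
As a rough illustration of segment-restricted queries, the Python sketch
below models a store where each data load is one segment. It is a
conceptual model only, not CarbonData's API; the query function, its
segment_ids parameter, and the sample data are hypothetical.

```python
def query(segments, predicate, segment_ids=None):
    """Scan only the listed segments; scan all of them when none are listed."""
    ids = sorted(segments) if segment_ids is None else segment_ids
    return [row for sid in ids for row in segments.get(sid, []) if predicate(row)]

# Hypothetical store: each data load produced one segment, keyed by segment ID.
segments = {
    0: [{"day": "2018-02-08", "amount": 10}],
    1: [{"day": "2018-02-09", "amount": 20}],
    2: [{"day": "2018-02-10", "amount": 30}, {"day": "2018-02-10", "amount": 5}],
}

# Same filter, first over every segment, then over the latest load only.
all_rows = query(segments, lambda r: r["amount"] >= 10)
latest_only = query(segments, lambda r: r["amount"] >= 10, segment_ids=[2])
```

Restricting the query to segment 2 means the rows from the two earlier
loads are never read.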
Support Boolean Data Type

You can follow this document to use these artifacts:

You can find the latest CarbonData documentation and learn more at: <>

Please find the detailed JIRA list:

