hawq-user mailing list archives

From Lili Ma <lil...@apache.org>
Subject Re: AO and Parquet Format
Date Mon, 13 Feb 2017 07:11:04 GMT
AO format is row-oriented. Data is organized into blocks; each block has a
block header describing its metadata and block content storing the actual
data for that block. Most of the data is represented as MemTuples. You can
specify blocksize when defining an AO table.
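
To make the header-plus-content idea concrete, here is a minimal Python sketch of packing rows into size-limited blocks. It is only an illustration: the header layout (row count and content length), the length-prefixed row encoding, and the names `pack_blocks` and `read_block` are assumptions made for this example, not HAWQ's actual on-disk format or MemTuple encoding.

```python
import struct

# Hypothetical block header: (row_count, content_length), both unsigned 32-bit.
HEADER = struct.Struct("<II")


def _finish(encoded_rows):
    """Prefix the joined rows with a header describing them."""
    content = b"".join(encoded_rows)
    return HEADER.pack(len(encoded_rows), len(content)) + content


def pack_blocks(rows, blocksize):
    """Pack byte-string rows into blocks no larger than blocksize bytes."""
    blocks, current, size = [], [], HEADER.size
    for row in rows:
        encoded = struct.pack("<I", len(row)) + row  # length-prefixed row
        if HEADER.size + len(encoded) > blocksize:
            raise ValueError("row does not fit in a single block")
        if size + len(encoded) > blocksize:
            # Current block is full: close it and start a new one.
            blocks.append(_finish(current))
            current, size = [], HEADER.size
        current.append(encoded)
        size += len(encoded)
    if current:
        blocks.append(_finish(current))
    return blocks


def read_block(block):
    """Read the header, then decode the rows stored in the block content."""
    row_count, content_len = HEADER.unpack_from(block, 0)
    content = block[HEADER.size:HEADER.size + content_len]
    rows, offset = [], 0
    for _ in range(row_count):
        (n,) = struct.unpack_from("<I", content, offset)
        offset += 4
        rows.append(content[offset:offset + n])
        offset += n
    return rows
```

A reader only needs the header to know how many rows a block holds and how many bytes of content follow, which is the role the block header plays in the description above.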

Parquet format is organized at the row-column level, and the Parquet table
format in HAWQ is compatible with open source Parquet. The hierarchy is
rowgroup->columnChunk->columnPage. Each rowgroup stores multiple rows.
Inside each rowgroup, the data is organized by column, and each column maps
to a columnChunk. A columnChunk consists of one or more columnPages.
You can specify RowGroupSize and PageSize, which set the maximum size of a
RowGroup and a ColumnPage, when defining a Parquet table.
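
The rowgroup->columnChunk->columnPage hierarchy can be sketched in a few lines of Python. This is a simplified illustration, not HAWQ's or Parquet's implementation: the function name `build_row_groups` is hypothetical, and the limits here are counts of rows and values, whereas RowGroupSize and PageSize are maximum sizes.

```python
def build_row_groups(rows, columns, max_rows_per_group, max_values_per_page):
    """Split rows into row groups, then store each group column by column.

    Returns a list of row groups. Each row group is a dict mapping a column
    name to its column chunk, and each column chunk is a list of pages.
    """
    row_groups = []
    for start in range(0, len(rows), max_rows_per_group):
        group_rows = rows[start:start + max_rows_per_group]
        column_chunks = {}
        for idx, name in enumerate(columns):
            # Gather this column's values for every row in the group.
            values = [row[idx] for row in group_rows]
            # Cut the column chunk into pages of bounded length.
            pages = [
                values[i:i + max_values_per_page]
                for i in range(0, len(values), max_values_per_page)
            ]
            column_chunks[name] = pages
        row_groups.append(column_chunks)
    return row_groups
```

With five two-column rows, a limit of four rows per group, and three values per page, the first row group holds two pages per column and the second holds one, so a scan that needs a single column only touches that column's chunks.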

If you are interested, you can refer to these two files for details:
1. AO table: src/backend/access/appendonly/appendonlyam.c
2. Parquet table: src/backend/access/parquet/parquetam.c

Best Regards,

2017-02-13 14:17 GMT+08:00 Ma Hongxu <interma@outlook.com>:

> Briefly, the Parquet format is organized by row groups, and each row group
> is column-oriented. AO is row-oriented; I guess it's very similar to the PG
> heap format.
> Do you want a detailed introduction to the formats?
> It seems HAWQ doesn't have a detailed format wiki/doc; maybe Lili Ma
> has some resources.
> I am also very interested in this; let's discuss it on this mailing list in
> the future.
> Thank you!
> On 10/02/2017 22:37, Guo Kai wrote:
> Hi, guys!
> I want to ask for more detail about the AO and Parquet formats in HAWQ.
> As we know, in PostgreSQL, tuples are organized one by one in a fixed-size
> block when the table format is heap. What about AO and Parquet?
> Thanks for any advice!:)
> --
> Regards,
> Hongxu.
