incubator-drill-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Storage file format
Date Sat, 15 Sep 2012 12:30:41 GMT
Hi All,

I am interested in working on the storage format. (How do I sign up?)

I wrote an HDFS file format that is similar to SequenceFile (row
storage, block management, compression), and I provide an InputFormat
and OutputFormat for it.

Sometimes it performs very well and sometimes not, depending on the data.

For Drill, we should implement column storage, which can skip some columns
during a query and skip some rows within one column file. This column
storage should be built on a distributed file system such as HDFS or
MapR DFS; I like MapR DFS because of its HA (high availability).
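
To make the column and row skipping concrete, here is a minimal sketch in
Python. It is only an illustration under assumptions, not a proposed Drill
design: the names (ToyColumnStore, ColumnChunk, scan, group_size) are all
hypothetical, the data lives in memory instead of on HDFS or MapR DFS, and
the per-chunk min/max statistics are one possible way to skip rows.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """One row group of a single column, with min/max stats for skipping."""
    values: list
    min_value: object
    max_value: object


class ToyColumnStore:
    """Hypothetical in-memory stand-in for 'one file per column' storage."""

    def __init__(self, rows, columns, group_size=2):
        self.columns = {name: [] for name in columns}
        self.chunks_read = 0  # counts chunks a scan actually touched
        for start in range(0, len(rows), group_size):
            group = rows[start:start + group_size]
            for name in columns:
                vals = [r[name] for r in group]
                self.columns[name].append(
                    ColumnChunk(vals, min(vals), max(vals)))

    def scan(self, select, where_col, lo, hi):
        """Return the selected columns for rows with lo <= where_col <= hi."""
        out = []
        for g, chunk in enumerate(self.columns[where_col]):
            # Row skipping: the stats prove no row in this group can match.
            if chunk.max_value < lo or chunk.min_value > hi:
                continue
            self.chunks_read += 1
            keep = [i for i, v in enumerate(chunk.values) if lo <= v <= hi]
            if not keep:
                continue
            # Column skipping: only the selected columns are ever read.
            cols = {}
            for name in select:
                if name == where_col:
                    cols[name] = chunk.values
                else:
                    cols[name] = self.columns[name][g].values
                    self.chunks_read += 1
            out.extend({n: cols[n][i] for n in select} for i in keep)
        return out


rows = [{"id": i, "name": chr(96 + i), "payload": "x" * i}
        for i in range(1, 7)]
store = ToyColumnStore(rows, ["id", "name", "payload"], group_size=2)
result = store.scan(select=["name"], where_col="id", lo=3, hi=4)
print(result, store.chunks_read)
```

In this toy run the store holds 9 chunks (3 columns times 3 row groups),
but the scan touches only 2 of them: the "payload" column is never read,
and the row groups whose id range cannot match are skipped.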

We can implement the following column storage file format; I think it is
enough for us.

http://arxiv.org/pdf/1105.4252.pdf
