incubator-drill-dev mailing list archives

From David Gruzman <da...@bigdatacraft.com>
Subject Drill native format
Date Fri, 14 Sep 2012 20:05:11 GMT
Hi All,
I would like to discuss what the native format for Drill will be. The
original Google Dremel paper defined its own hierarchical columnar data
format. Since then, Google has shifted away from the hierarchical data
format... So the question is whether it makes sense to stick with it.
If we are also moving to a simple flat format, we need our own format that
we support as "native". In the case of Drill, I would define native support
as "high performance".
I think we can go with some kind of PAX format with comprehensive metadata
in the header, so that each file is completely self-contained and can be
understood and processed without any external data.
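
To make the idea concrete, here is a rough sketch of a self-describing,
PAX-style block. Everything in it is an assumption for illustration only:
the JSON header, the 4-byte length prefix, the int64-only columns, and the
names write_pax and read_column are hypothetical and not a proposed Drill
format. The header records where each column's minipage starts, so a reader
needs nothing but the file itself.

```python
import io
import json
import struct


def write_pax(buf, columns):
    """Write columns (dict: name -> list of ints) as one self-describing block.

    Hypothetical layout: [4-byte header length][JSON header][column minipages].
    """
    # Encode each column contiguously as little-endian int64 values.
    pages = {
        name: struct.pack(f"<{len(values)}q", *values)
        for name, values in columns.items()
    }

    # The header describes every column: type, offset and byte length.
    meta, offset = {}, 0
    for name, page in pages.items():
        meta[name] = {"type": "int64", "offset": offset, "length": len(page)}
        offset += len(page)

    header = json.dumps(meta).encode("utf-8")
    buf.write(struct.pack("<I", len(header)))
    buf.write(header)
    for page in pages.values():
        buf.write(page)


def read_column(buf, name):
    """Read one column using only the header stored in the file itself."""
    buf.seek(0)
    (header_len,) = struct.unpack("<I", buf.read(4))
    meta = json.loads(buf.read(header_len))

    # Seek straight to the requested minipage; other columns are not read.
    col = meta[name]
    buf.seek(4 + header_len + col["offset"])
    data = buf.read(col["length"])
    return list(struct.unpack(f"<{col['length'] // 8}q", data))


if __name__ == "__main__":
    f = io.BytesIO()
    write_pax(f, {"user_id": [1, 2, 3], "clicks": [10, 20, 30]})
    print(read_column(f, "clicks"))
```

A real format would also need row groups, nullability, nested types and
compression; the sketch only shows that the header alone is enough to locate
a column.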
The alternative is to have a single file per column. As far as I remember
from our OpenDremel work, the main decision point is whether we can read
one column from the file without loading unnecessary data from other
columns into node memory.
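
For comparison, the file-per-column alternative could look roughly like the
sketch below. The ".col" suffix, the int64 encoding and the function names
are assumptions made up for the example, not anything taken from OpenDremel.
Reading one column touches exactly one file, at the cost of keeping the
schema and the set of files consistent somewhere outside the data itself.

```python
import os
import struct
import tempfile


def write_column_files(directory, columns):
    """Store each column (name -> list of ints) in its own file."""
    for name, values in columns.items():
        path = os.path.join(directory, name + ".col")  # hypothetical naming
        with open(path, "wb") as f:
            f.write(struct.pack(f"<{len(values)}q", *values))


def read_column_file(directory, name):
    """Load a single column; files of other columns are never opened."""
    path = os.path.join(directory, name + ".col")
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"<{len(data) // 8}q", data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_column_files(d, {"user_id": [1, 2, 3], "clicks": [10, 20, 30]})
        print(read_column_file(d, "user_id"))
```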
With best regards,
David
