incubator-drill-dev mailing list archives

From David Gruzman <da...@bigdatacraft.com>
Subject Re: Drill native format
Date Fri, 14 Sep 2012 20:28:18 GMT
I assume that the evolution of BigQuery reflects the evolution of Dremel... If
somebody has information on it, that would be great.
The storage system should understand that all files comprising a horizontal
partition of the table are one logical entity, and should store them
together or in close proximity. I agree that PAX would be much more
convenient. The question is: is there a performance penalty for PAX vs. file
per column?
David
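
To make the PAX vs. file-per-column question concrete, here is a minimal sketch of a self-contained PAX-style file. It is not a proposed Drill format. The magic bytes, the JSON header, the int64-only columns, and the function names `write_pax` and `read_column` are all assumptions made for illustration. The header records where each column's minipage sits inside every row group, so a reader can seek directly to one column and skip the bytes of the others.

```python
import io
import json
import struct

MAGIC = b"PAX0"  # hypothetical magic bytes, used only in this sketch


def write_pax(stream, columns, rows_per_group):
    """Write equal-length int64 columns as row groups, one minipage per column."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    body = io.BytesIO()
    groups = []
    for start in range(0, n_rows, rows_per_group):
        pages = {}
        for name in names:
            values = columns[name][start:start + rows_per_group]
            data = struct.pack(f"<{len(values)}q", *values)
            # Offsets are relative to the start of the body section.
            pages[name] = {"offset": body.tell(), "length": len(data)}
            body.write(data)
        groups.append(pages)
    # The header carries all metadata, so the file needs no external catalog.
    header = json.dumps({"columns": names, "groups": groups}).encode("utf-8")
    stream.write(MAGIC)
    stream.write(struct.pack("<I", len(header)))
    stream.write(header)
    stream.write(body.getvalue())


def read_column(stream, name):
    """Return (values, data_bytes_read) for one column, skipping other columns."""
    stream.seek(0)
    if stream.read(4) != MAGIC:
        raise ValueError("not a PAX sketch file")
    (header_len,) = struct.unpack("<I", stream.read(4))
    meta = json.loads(stream.read(header_len))
    base = 8 + header_len  # magic (4) + header length field (4) + header
    values = []
    bytes_read = 0
    for pages in meta["groups"]:
        page = pages[name]
        stream.seek(base + page["offset"])
        data = stream.read(page["length"])
        bytes_read += len(data)
        values.extend(struct.unpack(f"<{len(data) // 8}q", data))
    return values, bytes_read
```

With two columns of five values and two rows per group, reading one column touches only that column's 40 data bytes plus the header. The cost relative to one file per column is one seek per row group, which is where any performance penalty would show up on real storage.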

On Fri, Sep 14, 2012 at 11:21 PM, Tomer Shiran <tshiran@maprtech.com> wrote:

> Is there any public information suggesting that Google moved away from
> supporting nested data? Clearly BigQuery doesn't yet allow nested data, but
> not sure that applies to Dremel.
>
> There are challenges with one file per column. How do you ensure that a
> single record is located on a single machine to avoid costly record
> reconstruction?
>
> On Fri, Sep 14, 2012 at 1:05 PM, David Gruzman <david@bigdatacraft.com> wrote:
>
> > Hi All,
> > I would like to discuss what the native format for Drill will be. The
> > original Google Dremel paper defined a hierarchical columnar data format.
> > Since then Google has shifted away from the hierarchical data format... So
> > the question is whether it makes sense to stick with it.
> > If we are also moving to a simple flat format, we need our own format that
> > we support "natively". In the case of Drill I would define native support
> > as "high performance".
> > I think we can go with some kind of PAX format with comprehensive metadata
> > in the header, so each file is completely self-contained and can be
> > understood and processed without any external data.
> > The alternative is to have a single file per column. As far as I remember
> > from our OpenDremel work, the main decision point is whether we can read
> > one column from the file without loading unnecessary data from other
> > columns into node memory.
> > With best regards,
> > David
> >
>
>
>
> --
> Tomer Shiran
> Director of Product Management | MapR Technologies | 650-804-8657
>
