incubator-drill-dev mailing list archives

From Dharm Raj <dharmrajbali...@gmail.com>
Subject Re: Storage file format
Date Sat, 15 Sep 2012 15:09:46 GMT
For columnar storage, IMO each column can be managed in a separate file.
Dremel also seems to keep each column in a separate file. This should be
easy to manage, and updates are possible. Please see
https://issues.apache.org/jira/browse/AVRO-806
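
To make the one-file-per-column idea concrete, here is a minimal sketch in
Python. It is not the AVRO-806/Trevni layout or anything Drill uses; the
".col" file naming, the JSON-per-line encoding, and the function names are
all assumptions made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_columns(rows, directory):
    """Write each field of `rows` (a list of dicts) to its own file.

    Hypothetical layout: <directory>/<field>.col, one JSON value per line,
    with null written where a row has no value for that field.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    names = sorted({name for row in rows for name in row})
    for name in names:
        with open(directory / f"{name}.col", "w") as f:
            for row in rows:
                f.write(json.dumps(row.get(name)) + "\n")
    return names


def read_columns(directory, wanted):
    """Read only the requested columns; other column files are never opened."""
    directory = Path(directory)
    result = {}
    for name in wanted:
        with open(directory / f"{name}.col") as f:
            result[name] = [json.loads(line) for line in f]
    return result


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_rows = [{"id": 1, "name": "a"}, {"id": 2, "score": 3.5}]
    write_columns(demo_rows, demo_dir)
    # Only id.col and score.col are read; name.col is skipped entirely.
    print(read_columns(demo_dir, ["id", "score"]))
```

A query that touches two columns opens two files, and adding a new column
means adding a new file without rewriting the existing ones.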

The Drill architecture slides show AVRO-806 and Trevni in the column storage box.
Are we looking at them as candidates for Drill's storage format?

If we have a lot of highly sparse data and the major use case is to only
read the data once it is written, another way could be to store it in a
column-major sparse matrix format. It looks easy to implement, but updates
may be problematic. Just a thought.
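
A rough sketch of what I mean by a column-major sparse layout, assuming the
usual compressed sparse column (CSC) arrangement of three arrays. The
function names and the use of None for a missing cell are assumptions for
illustration only.

```python
def to_csc(rows, num_cols):
    """Convert row-oriented data (None marks a missing cell) to CSC arrays.

    Returns (values, row_idx, col_ptr): the non-missing entries of column c
    are values[col_ptr[c]:col_ptr[c + 1]], and row_idx holds their row numbers.
    """
    values, row_idx, col_ptr = [], [], [0]
    for c in range(num_cols):
        for r, row in enumerate(rows):
            if row[c] is not None:
                values.append(row[c])
                row_idx.append(r)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr


def read_column(csc, c):
    """Return the (row, value) pairs of column c without scanning other columns."""
    values, row_idx, col_ptr = csc
    start, end = col_ptr[c], col_ptr[c + 1]
    return list(zip(row_idx[start:end], values[start:end]))


if __name__ == "__main__":
    table = [
        [1, None, None],
        [None, None, 2],
        [3, None, 4],
    ]
    csc = to_csc(table, 3)
    print(csc)
    print(read_column(csc, 2))
```

Reading one column is a single slice, and missing cells cost nothing. The
update problem shows up here too: inserting a value into an early column
shifts every later entry in values and row_idx and changes every later
offset in col_ptr.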

Regards,
Dharm

On Sat, Sep 15, 2012 at 7:24 PM, NAVEEN MAANJU <
naveen.maanju.apache@gmail.com> wrote:

> Makes sense.
>
> On Sat, Sep 15, 2012 at 6:44 AM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > The key goal here is to get something simple working quickly in a way that
> > allows additional, more advanced implementations.
> >
> > On Sat, Sep 15, 2012 at 5:47 AM, moon soo Lee <leemoonsoo@gmail.com>
> > wrote:
> >
> > > For column storage, how about leveraging HBase or Accumulo?
> > >
> > > They would also give us a chance to support data updates (future work?)
> > >
> > >
> > > On Sat, Sep 15, 2012 at 9:30 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am interested in working on storage format. (sign up?)
> > > >
> > > > I wrote an HDFS file format that is similar to SequenceFile (row
> > > > storage, block management, compression), and I provide an InputFormat
> > > > and OutputFormat for it.
> > > >
> > > > Sometimes it gets great performance, sometimes not, depending on the data.
> > > >
> > > > For Drill, we should implement column storage, which can skip some columns
> > > > during a query and skip some rows within one column file. But this
> > > > column storage should be based on a distributed file system, such as HDFS
> > > > or MapR DFS. I like MapR DFS because of its HA.
> > > >
> > > > We can implement the following column storage file format; I think it's
> > > > enough for us.
> > > >
> > > > http://arxiv.org/pdf/1105.4252.pdf
> > > >
> > >
> >
>
