incubator-drill-dev mailing list archives

From Dharm Raj <dharmrajbali...@gmail.com>
Subject Re: Storage file format
Date Sat, 15 Sep 2012 17:02:59 GMT
You are right, Camuel. While thinking about the storage format I had append
in mind and mistakenly wrote update.
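
As a rough illustration of the append-only, one-file-per-column layout
discussed in the quoted thread below, here is a minimal Python sketch. It is
not Drill code: the function names `append_chunk` and `read_column`, the
`.col` file suffix, and the JSON-lines encoding are all assumptions made for
the example. It shows that an append touches only the end of each column file
and that a reader can fetch one column without opening the others.

```python
import json
import os
import tempfile


def append_chunk(table_dir, rows, columns):
    """Append a chunk of rows, writing each column to its own file.

    Hypothetical helper: one JSON value per line, one file per column.
    """
    os.makedirs(table_dir, exist_ok=True)
    for col in columns:
        path = os.path.join(table_dir, col + ".col")
        # Mode "a" only adds to the end of the file; nothing is rewritten.
        with open(path, "a", encoding="utf-8") as f:
            for row in rows:
                # A missing field is stored as JSON null.
                f.write(json.dumps(row.get(col)) + "\n")


def read_column(table_dir, col):
    """Read a single column without touching the other column files."""
    path = os.path.join(table_dir, col + ".col")
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


if __name__ == "__main__":
    table = os.path.join(tempfile.mkdtemp(), "events")
    cols = ["user", "bytes"]
    # Each call appends a whole chunk of rows rather than a single row.
    append_chunk(table, [{"user": "a", "bytes": 10}, {"user": "b"}], cols)
    append_chunk(table, [{"user": "c", "bytes": 7}], cols)
    print(read_column(table, "bytes"))
```

A real format would add compression, block indexes and a distributed file
system underneath; the sketch only shows the append and column-skipping idea.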

On Sat, Sep 15, 2012 at 9:49 PM, Camuel Gilyadov <camuel@gmail.com> wrote:

> Drill doesn't support updates. It is an append-only data store, and an append
> is usually expected to be a good-sized chunk of data, not a single row.
>
> On Sat, Sep 15, 2012 at 8:09 AM, Dharm Raj <dharmrajbaliyan@gmail.com> wrote:
>
> > For columnar storage, IMO each column can be managed in a separate file.
> > Dremel also seems to keep each column in a separate file. This should be
> > easy to manage, and updates are possible. Please see
> > https://issues.apache.org/jira/browse/AVRO-806
> >
> > The Drill architecture slides show AVRO-806 and Trevni in the column
> > storage box. Are we looking at them as candidates for Drill's storage
> > format?
> >
> > If we have a lot of highly sparse data and the major use case is read-only
> > access once the data is written, another option could be to store it in a
> > column-major sparse matrix format. It looks easy to implement, but updates
> > may be problematic. Just a thought.
> >
> > Regards,
> > Dharm
> >
> > On Sat, Sep 15, 2012 at 7:24 PM, NAVEEN MAANJU <naveen.maanju.apache@gmail.com> wrote:
> >
> > > Makes sense.
> > >
> > > On Sat, Sep 15, 2012 at 6:44 AM, Ted Dunning <ted.dunning@gmail.com>
> > > wrote:
> > >
> > > > The key goal here is to get something simple working quickly in a way
> > > > that allows additional, more advanced implementations.
> > > >
> > > > On Sat, Sep 15, 2012 at 5:47 AM, moon soo Lee <leemoonsoo@gmail.com>
> > > > wrote:
> > > >
> > > > > For column storage, how about leveraging HBase or Accumulo?
> > > > >
> > > > > They would also open the door to data updates (future work?).
> > > > >
> > > > >
> > > > > On Sat, Sep 15, 2012 at 9:30 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I am interested in working on storage format. (sign up?)
> > > > > >
> > > > > > I wrote an HDFS file format that is similar to SequenceFile (row
> > > > > > storage, block management, compression), and I provide an
> > > > > > InputFormat and an OutputFormat for it.
> > > > > >
> > > > > > Sometimes it performs very well and sometimes it does not,
> > > > > > depending on the data.
> > > > > >
> > > > > > For Drill, we should implement column storage, which lets a query
> > > > > > skip some columns and skip some rows within a single column file.
> > > > > > However, this column storage should be built on a distributed file
> > > > > > system such as HDFS or MapR DFS. I like MapR DFS because of its HA.
> > > > > >
> > > > > > We could implement the following column storage file format; I
> > > > > > think it is enough for us.
> > > > > >
> > > > > > http://arxiv.org/pdf/1105.4252.pdf
> > > > > >
> > > > >
> > > >
> > >
> >
>
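
The column-major sparse layout suggested earlier in the thread can be
sketched as follows. This is only an illustration under assumptions, not a
proposed Drill format: the function names `to_column_major_sparse` and
`column_cells` are hypothetical, and the three-array layout is borrowed from
the general compressed sparse column idea. Only non-null cells are stored,
grouped by column, with an offsets array marking where each column starts.

```python
def to_column_major_sparse(rows, columns):
    """Keep only the non-null cells of `rows`, grouped column by column.

    Returns (values, row_ids, col_start), where the cells of column k
    occupy positions col_start[k] up to (not including) col_start[k + 1].
    """
    values, row_ids, col_start = [], [], [0]
    for col in columns:
        for i, row in enumerate(rows):
            v = row.get(col)
            if v is not None:
                values.append(v)
                row_ids.append(i)
        # Record where the next column begins.
        col_start.append(len(values))
    return values, row_ids, col_start


def column_cells(sparse, col_index):
    """Return (row_id, value) pairs for one column, skipping all others."""
    values, row_ids, col_start = sparse
    lo, hi = col_start[col_index], col_start[col_index + 1]
    return list(zip(row_ids[lo:hi], values[lo:hi]))


if __name__ == "__main__":
    rows = [{"a": 1}, {"b": 2}, {"a": 3, "b": 4}]
    sparse = to_column_major_sparse(rows, ["a", "b", "c"])
    print(column_cells(sparse, 1))
```

The layout also shows why updates are awkward: adding a cell to an early
column shifts every later entry in `values` and `row_ids` and changes every
later offset in `col_start`, whereas appending whole new chunks avoids that.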
