incubator-drill-dev mailing list archives

From Camuel Gilyadov <cam...@gmail.com>
Subject Re: Storage file format
Date Sat, 15 Sep 2012 16:19:55 GMT
Drill doesn't support updates. It is an append-only data store, and an append
is usually expected to be a sizable chunk of data, not a single row.

On Sat, Sep 15, 2012 at 8:09 AM, Dharm Raj <dharmrajbaliyan@gmail.com> wrote:

> For columnar storage, IMO each column can be managed in a separate file.
> Dremel also seems to have each column in a separate file. This should be
> easy to manage, and updates are possible. Please see
> https://issues.apache.org/jira/browse/AVRO-806
>
> The Drill architecture slides show AVRO-806 and Trevni in the column storage
> box. Are we looking at them as candidates for Drill's storage format?
>
> If we have a lot of highly sparse data and the major use case is read-only
> access once the data is written, another option could be to store it in a
> column-major sparse matrix format. It looks easy to implement, but updates
> may be problematic. Just a thought.
>
> Regards,
> Dharm
>
> On Sat, Sep 15, 2012 at 7:24 PM, NAVEEN MAANJU <
> naveen.maanju.apache@gmail.com> wrote:
>
> > Makes sense.
> >
> > On Sat, Sep 15, 2012 at 6:44 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > The key goal here is to get something simple working quickly in a way
> > > that allows additional, more advanced implementations.
> > >
> > > On Sat, Sep 15, 2012 at 5:47 AM, moon soo Lee <leemoonsoo@gmail.com>
> > > wrote:
> > >
> > > > For column storage, how about leveraging HBase or Accumulo?
> > > >
> > > > They would also give us a chance to support data updates (future work?).
> > > >
> > > >
> > > > On Sat, Sep 15, 2012 at 9:30 PM, Azuryy Yu <azuryyyu@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I am interested in working on the storage format. (Sign up?)
> > > > >
> > > > > I wrote an HDFS file format similar to SequenceFile (row storage,
> > > > > block management, compression), and I provide an InputFormat and
> > > > > an OutputFormat.
> > > > >
> > > > > Sometimes it gets great performance and sometimes not, depending on
> > > > > the data.
> > > > >
> > > > > For Drill, we should implement column storage, which can skip some
> > > > > columns during a query and skip some rows within one column file. But
> > > > > this column storage should be based on a distributed file system such
> > > > > as HDFS or MapR DFS. I like MapR DFS because of its HA.
> > > > >
> > > > > We could implement the following column storage file format; I think
> > > > > it's enough for us.
> > > > >
> > > > > http://arxiv.org/pdf/1105.4252.pdf
> > > > >
> > > >
> > >
> >
>
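
The layout discussed in this thread (one file per column, written in append-only chunks, with queries reading only the columns they need) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not Drill's design: the names `append_chunk` and `read_columns`, the `.col` file suffix, the JSON-lines encoding, a local directory standing in for a distributed file system, and a fixed flat schema.

```python
import json
import os
import tempfile


def append_chunk(directory, schema, rows):
    """Append a batch of rows, storing each column in its own file.

    Hypothetical helper: one JSON line is written per chunk per column.
    Files are opened in append mode only; nothing is updated in place.
    """
    os.makedirs(directory, exist_ok=True)
    for name in schema:
        # Missing fields become None so every column stays row-aligned.
        values = [row.get(name) for row in rows]
        with open(os.path.join(directory, name + ".col"), "a") as f:
            f.write(json.dumps(values) + "\n")


def read_columns(directory, names):
    """Read only the requested columns; other column files are never opened."""
    result = {}
    for name in names:
        values = []
        with open(os.path.join(directory, name + ".col")) as f:
            for line in f:  # one line per appended chunk
                values.extend(json.loads(line))
        result[name] = values
    return result


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    schema = ["user", "clicks"]
    append_chunk(store, schema, [{"user": "a", "clicks": 1},
                                 {"user": "b", "clicks": 2}])
    append_chunk(store, schema, [{"user": "c", "clicks": 3}])
    # Only clicks.col is read here; user.col is skipped entirely.
    print(read_columns(store, ["clicks"]))
```

Appending a whole chunk at a time keeps the per-append cost low compared with touching every column file for each single row, which is the concern raised at the top of the thread.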

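The column-major sparse matrix idea mentioned in the thread can be sketched along the lines of the standard compressed sparse column (CSC) layout. This is only an illustration under stated assumptions: the function names `to_csc` and `read_column` are hypothetical, missing cells are represented as `None`, and all data is held in memory rather than on disk.

```python
def to_csc(rows, n_cols):
    """Convert dense rows (None = missing cell) into CSC-style arrays.

    Returns (values, row_index, col_ptr), where the entries of column c
    occupy positions col_ptr[c] up to col_ptr[c + 1] in the other arrays.
    """
    values, row_index, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r, row in enumerate(rows):
            if row[c] is not None:
                values.append(row[c])
                row_index.append(r)
        col_ptr.append(len(values))
    return values, row_index, col_ptr


def read_column(csc, c):
    """Return (row, value) pairs for one column without scanning the others."""
    values, row_index, col_ptr = csc
    start, end = col_ptr[c], col_ptr[c + 1]
    return list(zip(row_index[start:end], values[start:end]))


if __name__ == "__main__":
    table = [
        [1, None, None],
        [None, None, 2],
        [3, None, None],
    ]
    csc = to_csc(table, n_cols=3)
    print(read_column(csc, 0))
    print(read_column(csc, 2))
```

Only the non-missing cells are stored, so highly sparse data stays compact. Inserting a value into an existing column shifts every later entry and every later column pointer, which is one reason updates are awkward in this layout.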