hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Loading hbase from parquet files
Date Wed, 08 Oct 2014 16:52:58 GMT
Since storage is your primary concern, take a look at Doug Meil's blog 'The
Effect of ColumnFamily, RowKey and KeyValue Design on HFile Size':
http://blogs.apache.org/hbase/
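
If it helps, here is a minimal sketch of how those design choices show up at
table-creation time, assuming a 0.98-era admin API; the table name "t", the
family "d", and the Snappy/TTL settings are placeholders only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateCompactTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t"));
    // One-letter family name: the family is repeated in every KeyValue on disk,
    // so short names and compression both shrink the resulting HFiles.
    HColumnDescriptor family = new HColumnDescriptor("d");
    family.setCompressionType(Compression.Algorithm.SNAPPY);
    family.setTimeToLive(30 * 24 * 3600);  // 30-day TTL, as in the thread
    desc.addFamily(family);
    try (HBaseAdmin admin = new HBaseAdmin(conf)) {
      admin.createTable(desc);
    }
  }
}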

Cheers

On Wed, Oct 8, 2014 at 9:45 AM, Nishanth S <nishanth.2884@gmail.com> wrote:

> Thanks, Andrey. In the current system the HBase CFs have a TTL of 30 days
> and data gets deleted after that (they use Snappy compression). Below is
> what I am trying to achieve.
>
> 1. Export the data from the HBase table before it gets deleted.
> 2. Store it in some format that supports maximum compression (storage cost
> is my primary concern here), so I am looking at Parquet.
> 3. Load a subset of this data back into HBase based on certain rules (say I
> want to load all rows that have a particular string in one of the fields).
>
>
> I was thinking of bulk loading this data back into HBase, but I am not sure
> how I can load a subset of the data using the
> org.apache.hadoop.hbase.mapreduce.Driver import utility.
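
One rough way to do step 3 without deploying anything to the cluster is a
small client-side job that reads the Parquet files, applies the rule, and
writes plain Puts. This is only a sketch, assuming the parquet-avro reader and
the 0.98-era client API; the table name "t", family "d", and the field names
"rowkey" and "msg" are placeholders for whatever the export actually contains:

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import parquet.avro.AvroParquetReader;

public class ParquetSubsetLoader {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");              // placeholder table name
    // Reads one exported Parquet file; args[0] is its path.
    AvroParquetReader<GenericRecord> reader =
        new AvroParquetReader<GenericRecord>(new Path(args[0]));
    GenericRecord record;
    while ((record = reader.read()) != null) {
      String msg = String.valueOf(record.get("msg"));  // placeholder field name
      // The "rule": only re-load rows whose field contains the given string.
      if (msg.contains(args[1])) {
        Put put = new Put(Bytes.toBytes(String.valueOf(record.get("rowkey"))));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("msg"), Bytes.toBytes(msg));
        table.put(put);
      }
    }
    reader.close();
    table.close();
  }
}

Because the rule lives in the client job rather than in a filter class on the
cluster, changing it only means rerunning the job with different arguments.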
>
>
>
>
>
>
> On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev <octo47@gmail.com>
> wrote:
>
> > Hi Nishanth.
> >
> > It is not clear what exactly you are building.
> > Can you share a more detailed description of what you are building and
> > how the Parquet files are supposed to be ingested?
> > Some questions arise:
> > 1. Is this an online import or a bulk load?
> > 2. Why do the rules need to be deployed to the cluster? Do you intend to
> > do the reading inside the HBase region server?
> >
> > As for deploying filters, you can try to use coprocessors instead. They
> > can be configurable and loadable (but not unloadable, so you need to
> > think about some class-loading magic like ClassWorlds).
> > For bulk imports you can create HFiles directly and add them
> > incrementally:
> > http://hbase.apache.org/book/arch.bulk.load.html
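
A bare-bones driver for that bulk-load route might look like the following.
It is a sketch only, assuming HFileOutputFormat2 and LoadIncrementalHFiles
from a 0.98-era HBase; the table name "t" and the input/output paths are
placeholders, and the mapper that actually reads the records and applies the
rules is not shown:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "parquet-to-hfiles");
    job.setJarByClass(BulkLoadDriver.class);
    // A mapper (not shown) would read the source records, apply the rules,
    // and emit <row key, Put> pairs for the surviving rows; it would be set
    // here along with the matching input format.
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    HTable table = new HTable(conf, "t");   // placeholder table name
    // Configures the reducer, partitioner and output format so the job
    // writes HFiles aligned with the table's current region boundaries.
    HFileOutputFormat2.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Hand the generated HFiles over to the region servers.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
    table.close();
  }
}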
> >
> > On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S <nishanth.2884@gmail.com>
> > wrote:
> >
> > > I was thinking of using the org.apache.hadoop.hbase.mapreduce.Driver
> > > import utility. I could see that we can pass filters to this utility,
> > > but that looks less flexible since you need to deploy a new filter
> > > every time the rules for processing records change. Is there some way
> > > that we could define a rules engine?
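
For reference, the kind of filter that would have to be built and deployed
for that route is roughly the following. This is a sketch assuming the
0.98-era filter API; the class name and the substring rule are placeholders,
and the stock Import tool would need additional plumbing (an argument-based
constructor and serialization hooks) before it could instantiate it:

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Keeps only cells whose value contains a fixed substring. Changing the
// rule ("needle") means rebuilding and redeploying the class, which is the
// inflexibility described above.
public class ContainsStringFilter extends FilterBase {
  private final byte[] needle;

  public ContainsStringFilter(byte[] needle) {
    this.needle = needle;
  }

  @Override
  public ReturnCode filterKeyValue(Cell cell) {
    String value = Bytes.toString(CellUtil.cloneValue(cell));
    return value.contains(Bytes.toString(needle))
        ? ReturnCode.INCLUDE
        : ReturnCode.SKIP;
  }
}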
> > >
> > >
> > > Thanks,
> > > -Nishan
> > >
> > > On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S <nishanth.2884@gmail.com>
> > > wrote:
> > >
> > > > Hey folks,
> > > >
> > > > I am evaluating loading an HBase table from Parquet files based on
> > > > some rules that would be applied to the Parquet file records. Could
> > > > someone help me with what would be the best way to do this?
> > > >
> > > >
> > > > Thanks,
> > > > Nishan
> > > >
> > >
> >
> >
> >
> > --
> > Andrey.
> >
>
