hbase-user mailing list archives

From Nishanth S <nishanth.2...@gmail.com>
Subject Re: Loading hbase from parquet files
Date Wed, 08 Oct 2014 16:45:46 GMT
Thanks Andrey. In the current system the hbase CFs have a TTL of 30 days,
and data gets deleted after this (the table has snappy compression). Below is
what I am trying to achieve.

1. Export the data from the hbase table before it gets deleted.
2. Store it in some format which supports maximum compression (storage cost
is my primary concern here), so I am looking at parquet.
3. Load a subset of this data back into hbase based on certain rules (say I
want to load all rows which have a particular string in one of the fields).


I was thinking of bulk loading this data back into hbase, but I am not sure
how I can load a subset of the data using
org.apache.hadoop.hbase.mapreduce.Driver import.
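The subset rule in step 3 could be sketched as a plain predicate applied to each
record before it is written back out. This is only an illustrative sketch, not
HBase or Parquet API: it assumes each row has already been materialized as a
field-name-to-value map after reading a parquet record, and the class and method
names (`RowRuleSketch`, `matches`, `selectRows`) are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the row-selection rule: keep only rows where
// some field's value contains a target string, mirroring the
// "load all rows which have a particular string" rule above.
public class RowRuleSketch {

    // A "row" here is a field-name -> value map, as it might look after
    // decoding one Parquet record. Null values are treated as non-matching.
    static boolean matches(Map<String, String> row, String needle) {
        return row.values().stream()
                .anyMatch(v -> v != null && v.contains(needle));
    }

    // Apply the rule to a batch of rows, returning only the matching subset
    // that would then be handed to the bulk-load step.
    static List<Map<String, String>> selectRows(List<Map<String, String>> rows,
                                                String needle) {
        return rows.stream()
                .filter(r -> matches(r, needle))
                .collect(Collectors.toList());
    }
}
```

Keeping the rule as data (the `needle` string, or a small set of such
parameters read from configuration) rather than as compiled filter code is one
way to avoid redeploying a new filter class every time the rules change.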

On Wed, Oct 8, 2014 at 10:20 AM, Andrey Stepachev <octo47@gmail.com> wrote:

> Hi Nishanth.
>
> It's not clear what exactly you are building.
> Can you share a more detailed description of what you are building and how
> the parquet files are supposed to be ingested?
> Some questions arise:
> 1. Is that an online import or a bulk load?
> 2. Why do the rules need to be deployed to the cluster? Do you intend to do
> the reading inside the hbase region server?
>
> As for deploying filters, you can try to use coprocessors instead. They can
> be configurable and loadable (but not
> unloadable, so you need to think about some class loading magic like
> ClassWorlds).
> For bulk imports you can create HFiles directly and add them incrementally:
> http://hbase.apache.org/book/arch.bulk.load.html
>
> On Wed, Oct 8, 2014 at 8:13 PM, Nishanth S <nishanth.2884@gmail.com>
> wrote:
>
> > I was thinking of using org.apache.hadoop.hbase.mapreduce.Driver import.
> I
> > could see that we can pass filters to this utility, but it looks less
> > flexible, since you need to deploy a new filter every time the rules for
> > processing records change. Is there some way that we could define a rules
> > engine?
> >
> >
> > Thanks,
> > -Nishan
> >
> > On Wed, Oct 8, 2014 at 9:50 AM, Nishanth S <nishanth.2884@gmail.com>
> > wrote:
> >
> > > Hey folks,
> > >
> > > I am evaluating loading an hbase table from parquet files based on
> > > some rules that would be applied to the parquet file records. Could some
> one
> > > help me with what would be the best way to do this?
> > >
> > >
> > > Thanks,
> > > Nishan
> > >
> >
>
>
>
> --
> Andrey.
>
