hbase-user mailing list archives

From Krishna Kalyan <krishnakaly...@gmail.com>
Subject Re: Pig HBase integration
Date Sun, 28 Sep 2014 12:04:30 GMT
Thanks Serega,

Our use case details:
We have a location table that will be stored in HBase, with locationID as
the row key and join key.
We intend to join this table with a transactional weblog file in HDFS
(expected size is around 2 TB).
The join query will be issued from Pig.
Can we expect a performance improvement compared with a MapReduce
approach?

Regards,
Krishna

On Sat, Sep 27, 2014 at 9:13 PM, Serega Sheypak <serega.sheypak@gmail.com>
wrote:

> It depends on the size of the datasets and the HBase workload. The best way
> is to do the join in Pig, store the result, and then use the HBase bulk load
> tool. That's a general recommendation; I don't know the details of your task.
>
> 2014-09-27 7:32 GMT+04:00 Krishna Kalyan <krishnakalyan3@gmail.com>:
>
> > Hi,
> > We have a use case that involves ETL on data coming from several different
> > sources using Pig.
> > We plan to store the final output table in HBase.
> > What will be the performance impact if we do a join with an external CSV
> > table using Pig?
> >
> > Regards,
> > Krishna
> >
>
