hbase-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: Pig HBase integration
Date Sun, 28 Sep 2014 12:21:27 GMT
1. Store location to HDFS.
2. Store weblogs to HDFS.
3. Join them.
4. Use the HBase bulk load tool to load the join result into HBase.
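The steps above can be sketched in Pig Latin. The paths, schemas, and the `PigStorage` delimiter here are assumptions for illustration; a replicated join is used on the assumption that the location table fits in memory, which matches a location-dimension / weblog-fact layout:

```pig
-- Hypothetical input paths and schemas; adjust to your data.
locations = LOAD '/data/locations' USING PigStorage('\t')
            AS (locationId:chararray, city:chararray, country:chararray);
weblogs   = LOAD '/data/weblogs' USING PigStorage('\t')
            AS (locationId:chararray, url:chararray, ts:long);

-- Fragment-replicate join: the small relation (locations) must come last.
joined = JOIN weblogs BY locationId, locations BY locationId USING 'replicated';

STORE joined INTO '/data/joined' USING PigStorage('\t');
```

From the stored result you would then generate HFiles (e.g. with `ImportTsv` and `-Dimporttsv.bulk.output=...`) and load them with `completebulkload`, bypassing the HBase write path entirely.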

What's the reason to keep location dataset in hbase and weblogs in hdfs?

You can expect a data-load performance improvement. For me it takes a few
minutes to bulk load 500,000,000 records into a 10-node HBase cluster with a
pre-split table.
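Pre-splitting means creating the table with its region boundaries defined up front, so the bulk load writes to all region servers in parallel instead of hammering one region and waiting for splits. A minimal HBase shell sketch, assuming a hypothetical table name, a single column family `d`, and rowkeys with a hex prefix:

```
hbase> create 'weblog_join', {NAME => 'd'},
         SPLITS => ['1','2','3','4','5','6','7',
                    '8','9','a','b','c','d','e','f']
```

The split points must match your actual rowkey distribution; a hex-prefix scheme like this only helps if the keys are salted or hashed accordingly.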

2014-09-28 16:04 GMT+04:00 Krishna Kalyan <krishnakalyan3@gmail.com>:

> Thanks Serega,
>
> Our usecase details:
> We have a location table which will be stored in HBase with locationID as
> the rowkey / Joinkey.
> We intend to join this table with a transactional WebLog file in HDFS
> (Expected size can be around 2TB).
> Joining query will be passed from Pig.
> Can we expect a performance improvement compared with the MapReduce
> approach?
>
> Regards,
> Krishna
>
> On Sat, Sep 27, 2014 at 9:13 PM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
>> Depends on the dataset sizes and HBase workload. The best way is to do
>> the join
>> in Pig, store it, and then use the HBase bulk load tool.
>> It's a general recommendation; I have no idea about your task details.
>>
>> 2014-09-27 7:32 GMT+04:00 Krishna Kalyan <krishnakalyan3@gmail.com>:
>>
>> > Hi,
>> > We have a use case that involves ETL on data coming from several
>> different
>> > sources using Pig.
>> > We plan to store the final output table in HBase.
>> > What will be the performance impact if we do a join with an external CSV
>> > table using Pig?
>> >
>> > Regards,
>> > Krishna
>> >
>>
>
>
