hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Question about the time to execute joins in HBase!
Date Thu, 22 Aug 2013 15:46:20 GMT
You kind of have two threads along the same lines. 

See my response in your other thread...

On Aug 22, 2013, at 10:41 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:

> scan.setCaching(500);
> 
> I don't really understand what this setting is for, though.
> 
> 
> On Thu, Aug 22, 2013 at 9:09 PM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
> 
>> Quick question: what is your scanner caching set to?
>> On Aug 22, 2013 11:25 AM, "Pavan Sudheendra" <pavan0591@gmail.com> wrote:
>> 
>>> Hi all,
>>> 
>>> A serious question. I know this isn't one of the best HBase practices, but
>>> I really want to know.
>>> 
>>> I am doing a join across 3 tables in HBase. One table contains 19m records,
>>> one contains 2m and another contains 1m records.
>>> 
>>> I'm doing this inside the mapper function. I know this can be done with
>>> Pig, Hive, etc. Leaving the specifics out, how long would experts expect
>>> the mapper to take to finish aggregating them across a 6-node cluster?
>>> One node is the job tracker and 5 are task trackers. It takes an hour
>>> for the input records count in the MapReduce job status to reach
>>> 600,000. That can't be right.
>>> 
>>> Any tips? Please help.
>>> 
>>> Thanks.
>>> 
>>> --
>>> Regards-
>>> Pavan
>>> 
>> 
> 
> 
> 
> -- 
> Regards-
> Pavan

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
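
On the scan.setCaching(500) line quoted above: in the HBase Java client,
Scan.setCaching(n) sets how many rows the scanner asks a region server to
return per scanner call, so a larger value means fewer round trips for the
same scan. The sketch below is only back-of-the-envelope arithmetic in plain
Java and does not call HBase. The class and method names are hypothetical,
the 19m row count is taken from the thread, and the estimate ignores
per-region boundaries and scanner open/close calls. It also says nothing
about any lookups a mapper issues against other tables for each input row,
which scanner caching does not affect.

```java
public class ScanCachingSketch {

    // Hypothetical helper: approximate number of scanner round trips needed
    // to stream `rows` rows when each call returns at most `caching` rows.
    static long roundTrips(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 19_000_000L; // size of the largest table mentioned in the thread
        for (int caching : new int[] {1, 100, 500}) {
            System.out.printf("caching=%d -> about %d scanner round trips%n",
                    caching, roundTrips(rows, caching));
        }
    }
}
```

Under these assumptions, the value 500 cuts the round trips for the 19m-row
scan from 19,000,000 (one row per call) to 38,000.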





