hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase map reduce job timing
Date Wed, 06 Oct 2010 03:30:00 GMT
Ah ok, then using the write buffer should get you the speed you need
(provided that you have the hardware capacity and that you use HTable
in an efficient way).

In setup() set this to false on all 3 htables:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#setAutoFlush(boolean)

In cleanup() call this on all htables:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#flushCommits()

Also to make your maps faster you could set this to 10 or more when
you create your input format:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/Scan.html#setCaching(int)
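
Putting those three calls together, a reducer could look roughly like the
sketch below. This is against the 0.20-era client API linked above; the
table names and the job-setup snippet are made up for illustration, not
taken from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MultiTablePutReducer
    extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, Put> {

  // Hypothetical output tables -- substitute your own.
  private HTable tableA, tableB, tableC;

  @Override
  protected void setup(Context context) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    tableA = new HTable(conf, "tableA");
    tableB = new HTable(conf, "tableB");
    tableC = new HTable(conf, "tableC");
    // Turn off auto-flush so each Put goes into the client-side
    // write buffer instead of costing one RPC per Put.
    tableA.setAutoFlush(false);
    tableB.setAutoFlush(false);
    tableC.setAutoFlush(false);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // Push out whatever is still sitting in the write buffers,
    // otherwise the tail of the data never reaches the region servers.
    tableA.flushCommits();
    tableB.flushCommits();
    tableC.flushCommits();
  }
}
```

And on the scan side, set the caching when you build the Scan you hand
to the input format, e.g.:

```java
Scan scan = new Scan();
scan.setCaching(10);  // rows fetched per RPC; raise further if rows are small
```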

J-D

On Tue, Oct 5, 2010 at 8:23 PM, Venkatesh <vramanathan00@aol.com> wrote:
>
>  Sure..Both input & output are HBase tables
> Input (mapper phase) - scanning a HBase table for all records within a time range (using HBase timestamps)
> Output (reduce phase) - doing a Put to 3 different HBase tables
>
>
>
> -----Original Message-----
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: user@hbase.apache.org
> Sent: Tue, Oct 5, 2010 11:14 pm
> Subject: Re: HBase map reduce job timing
>
>
> It'd be more useful if we knew where that data is coming from, and
> where it's going. Are you scanning HBase and/or writing to it?
>
> J-D
>
> On Tue, Oct 5, 2010 at 8:05 PM, Venkatesh <vramanathan00@aol.com> wrote:
>>
>>
>>
>>  Sorry..yeah..I've to do some digging to provide some data..
>> What sort of data would be helpful? Would the stats reported by jobtracker.jsp suffice? I've pasted them in this email..
>> I can gather more JVM stats..thanks
>>
>> Status: Succeeded
>> Started at: Tue Oct 05 21:39:58 EDT 2010
>> Finished at: Tue Oct 05 22:36:43 EDT 2010
>> Finished in: 56mins, 45sec
>> Job Cleanup: Successful
>>
>>
>>
>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>> map     100.00%     565        0        0        565       0       0 / 11
>> reduce  100.00%     20         0        0        20        0       0 / 2
>>
>> Counter                       Map            Reduce         Total
>> Job Counters
>>   Launched reduce tasks       0              0              22
>>   Rack-local map tasks        0              0              66
>>   Launched map tasks          0              0              576
>>   Data-local map tasks        0              0              510
>> com.JobRecords
>>   REDUCE_PHASE_RECORDS        0              597,712        597,712
>>   MAP_PHASE_RECORDS           2,534,807      0              2,534,807
>> FileSystemCounters
>>   FILE_BYTES_READ             335,845,726    861,146,518    1,196,992,244
>>   FILE_BYTES_WRITTEN          1,197,031,156  861,146,518    2,058,177,674
>> Map-Reduce Framework
>>   Reduce input groups         0              597,712        597,712
>>   Combine output records      0              0              0
>>   Map input records           2,534,807      0              2,534,807
>>   Reduce shuffle bytes        0              789,145,342    789,145,342
>>   Reduce output records       0              0              0
>>   Spilled Records             3,522,428      2,534,807      6,057,235
>>   Map output bytes            851,007,170    0              851,007,170
>>   Map output records          2,534,807      0              2,534,807
>>   Combine input records       0              0              0
>>   Reduce input records        0              2,534,807      2,534,807
>>
>> -----Original Message-----
>> From: Jean-Daniel Cryans <jdcryans@apache.org>
>> To: user@hbase.apache.org
>> Sent: Tue, Oct 5, 2010 10:53 pm
>> Subject: Re: HBase map reduce job timing
>>
>>
>> I'd love to give you tips, but you didn't provide any data about the
>> input and output of your job, the kind of hardware you're using, etc.
>> At this point any suggestion would be a stab in the dark; the best I
>> can do is point you to the existing documentation:
>> http://wiki.apache.org/hadoop/PerformanceTuning
>>
>> J-D
>>
>> On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh <vramanathan00@aol.com> wrote:
>>>
>>>
>>>
>>>  I've a mapreduce job that is taking too long..over an hour..Trying to see what I can tune
>>> to bring it down..One thing I noticed, the job is kicking off
>>> - 500+ map tasks : 490 of them do not process any records..whereas 10 of them process all the records
>>>  (200 K each..)..Any idea why that would be?...
>>>
>>> ..map phase takes about couple of minutes..
>>> ..reduce phase takes the rest..
>>>
>>> ..I'll try increasing the # of reduce tasks..Open to other suggestions for tunables..
>>>
>>> thanks for your input
>>> venkatesh
>>>
>>>
>>>
>>
>>
>>
>
>
>
