Sure..Both input & output are HBase tables:
Input (map phase): scanning an HBase table for all records within a time range (using HBase timestamps)
Output (reduce phase): doing a Put to 3 different HBase tables
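For context, a job with that shape (time-range scan in, Puts fanned out to three tables) could be wired up roughly as below. This is only a sketch, not the actual job: the table names, mapper/reducer classes, and the three-way fan-out are made up, and `MultiTableOutputFormat` is assumed to be available in your HBase version (if not, the reducer can open three `HTable` instances itself).

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class TimeRangeFanOutJob {

  // Hypothetical mapper: forwards each scanned row as a Put keyed by row key.
  static class RangeMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv);
      }
      ctx.write(row, put);
    }
  }

  // Hypothetical reducer: with MultiTableOutputFormat the output key names
  // the destination table, so one job can feed all three output tables.
  static class FanOutReducer
      extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, Put> {
    private static final byte[][] TABLES = {
        Bytes.toBytes("out1"), Bytes.toBytes("out2"), Bytes.toBytes("out3")};
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> puts, Context ctx)
        throws IOException, InterruptedException {
      for (Put p : puts) {
        for (byte[] table : TABLES) {
          ctx.write(new ImmutableBytesWritable(table), p);
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "time-range fan-out");
    job.setJarByClass(TimeRangeFanOutJob.class);

    // Restrict the scan to the HBase-timestamp window; note that
    // TableInputFormat still creates one map task per region, so regions
    // with no data in the window become no-op mappers.
    Scan scan = new Scan();
    scan.setTimeRange(Long.parseLong(args[0]), Long.parseLong(args[1]));
    scan.setCaching(500);        // fetch more rows per RPC
    scan.setCacheBlocks(false);  // don't churn the block cache from MR

    TableMapReduceUtil.initTableMapperJob("input_table", scan,
        RangeMapper.class, ImmutableBytesWritable.class, Put.class, job);
    job.setReducerClass(FanOutReducer.class);
    job.setOutputFormatClass(MultiTableOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The `setCaching`/`setCacheBlocks` tweaks are the usual first knobs for scan-heavy MR jobs; the one-map-per-region behavior is also why a time-range scan can launch hundreds of mappers that read nothing.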
-----Original Message-----
From: JeanDaniel Cryans <jdcryans@apache.org>
To: user@hbase.apache.org
Sent: Tue, Oct 5, 2010 11:14 pm
Subject: Re: HBase map reduce job timing
It'd be more useful if we knew where that data is coming from, and
where it's going. Are you scanning HBase and/or writing to it?
JD
On Tue, Oct 5, 2010 at 8:05 PM, Venkatesh <vramanathan00@aol.com> wrote:
>
>
>
> Sorry..yeah..I've to do some digging to provide some data..
> What sort of data would be helpful? Would the stats reported by jobtracker.jsp suffice? I've pasted that in this email..
> I can gather more JVM stats..thanks
>
> Status: Succeeded
> Started at: Tue Oct 05 21:39:58 EDT 2010
> Finished at: Tue Oct 05 22:36:43 EDT 2010
> Finished in: 56mins, 45sec
> Job Cleanup: Successful
>
>
> Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
> map      100.00%      565         0         0         565        0        0 / 11
> reduce   100.00%      20          0         0         20         0        0 / 2
>
> Counter                                 Map             Reduce          Total
> Job Counters
>   Launched reduce tasks                 0               0               22
>   Rack-local map tasks                  0               0               66
>   Launched map tasks                    0               0               576
>   Data-local map tasks                  0               0               510
> com.JobRecords
>   REDUCE_PHASE_RECORDS                  0               597,712         597,712
>   MAP_PHASE_RECORDS                     2,534,807       0               2,534,807
> FileSystemCounters
>   FILE_BYTES_READ                       335,845,726     861,146,518     1,196,992,244
>   FILE_BYTES_WRITTEN                    1,197,031,156   861,146,518     2,058,177,674
> MapReduce Framework
>   Reduce input groups                   0               597,712         597,712
>   Combine output records                0               0               0
>   Map input records                     2,534,807       0               2,534,807
>   Reduce shuffle bytes                  0               789,145,342     789,145,342
>   Reduce output records                 0               0               0
>   Spilled Records                       3,522,428       2,534,807       6,057,235
>   Map output bytes                      851,007,170     0               851,007,170
>   Map output records                    2,534,807       0               2,534,807
>   Combine input records                 0               0               0
>   Reduce input records                  0               2,534,807       2,534,807
>
> -----Original Message-----
> From: JeanDaniel Cryans <jdcryans@apache.org>
> To: user@hbase.apache.org
> Sent: Tue, Oct 5, 2010 10:53 pm
> Subject: Re: HBase map reduce job timing
>
>
> I'd love to give you tips, but you didn't provide any data about the
> input and output of your job, the kind of hardware you're using, etc.
> At this point any suggestion would be a stab in the dark, the best I
> can do is pointing to the existing documentation
> http://wiki.apache.org/hadoop/PerformanceTuning
>
> JD
>
> On Tue, Oct 5, 2010 at 7:12 PM, Venkatesh <vramanathan00@aol.com> wrote:
>>
>>
>>
>> I've a map reduce job that is taking too long..over an hour..Trying to see
>> what I can tune to bring it down..One thing I noticed: the job is kicking off
>> 500+ map tasks; 490 of them do not process any records, whereas 10 of them
>> process all the records (200K each)..Any idea why that would be?...
>>
>> ..map phase takes about couple of minutes..
>> ..reduce phase takes the rest..
>>
>> ..I'll try increasing the # of reduce tasks..Open to other suggestions for
>> tunables..
>>
>> thanks for your input
>> venkatesh
