hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Batch Get performance degrades from within Mapreduce
Date Tue, 21 Feb 2012 22:27:48 GMT
What strikes me in your answers is: why have you chosen HBase
for this? You're using it in a way that doesn't make sense to me:
you use a SAN, don't put HBase on all nodes, and have a tiny cluster... or
is it just a testbed?

To do a fairer comparison, I think you should run your
unit test on one of the machines and make sure it stores the data in
the SAN. The SAN is the one big wildcard here. Also get stats from that SAN.
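A minimal sketch of such a side-by-side measurement, assuming a hypothetical `RowLookup` interface that wraps whatever single-row read the unit test performs (for HBase, a `Get`). It times the same key list with a configurable number of threads, so the laptop run, a run on one of the cluster machines, and a run with 8 threads (mimicking 8 concurrent mappers) are all measured the same way. `LookupBenchmark` and `RowLookup` are illustrative names, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical harness: time the same lookups with N worker threads. */
public class LookupBenchmark {

    /** Hypothetical stand-in for a single-row read (e.g. an HBase Get). */
    public interface RowLookup {
        byte[] get(String rowKey) throws Exception;
    }

    /** Result of one run: lookups completed and wall-clock time taken. */
    public static final class Result {
        public final long completed;
        public final long elapsedNanos;

        Result(long completed, long elapsedNanos) {
            this.completed = completed;
            this.elapsedNanos = elapsedNanos;
        }

        public double opsPerSecond() {
            return elapsedNanos == 0 ? 0.0 : completed * 1e9 / elapsedNanos;
        }
    }

    /** Spreads keys round-robin over `threads` workers and times the whole run. */
    public static Result run(RowLookup lookup, List<String> keys, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                futures.add(pool.submit(() -> {
                    long done = 0;
                    // Worker `offset` handles keys offset, offset+threads, ...
                    for (int i = offset; i < keys.size(); i += threads) {
                        lookup.get(keys.get(i));
                        done++;
                    }
                    return done;
                }));
            }
            long completed = 0;
            for (Future<Long> f : futures) {
                completed += f.get(); // rethrows any lookup failure
            }
            return new Result(completed, System.nanoTime() - start);
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory fake so the harness runs without a cluster.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            keys.add("row-" + i);
        }
        RowLookup fake = key -> key.getBytes();
        for (int threads : new int[] {1, 8}) {
            Result r = run(fake, keys, threads);
            System.out.printf("threads=%d completed=%d ops/s=%.0f%n",
                    threads, r.completed, r.opsPerSecond());
        }
    }
}
```

Swapping the fake for a real client call and running with 1 and then 8 threads on the same machine would show whether throughput drops under concurrency alone, independently of where the data lives.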

J-D

On Tue, Feb 21, 2012 at 12:31 PM, Himanish Kushary <himanish@gmail.com> wrote:
> Extremely sorry for the repeated posts; I was also trying to provide a little
> more information on our environment.
>
> - You say you have 1 region server and 3 datanodes. Is there an
> intersection? If not, you miss out on enabling local reads and take a
> big performance hit although if you didn't enable it for your unit
> test then it's just something you might want to look at later. - The region
> server is colocated with one of the 3 datanodes.
>
> - What's the machine that runs the unit test like? - The unit test runs
> on my laptop (8 cores/8 GB) through Eclipse.
>
> - How many disks per datanode? JBOD SATA or fancier? - The datanode directories
> are configured to point to SAN drives.
>
> - Where are the mappers running? One task tracker per datanode? Or is it
> per regionserver (eg 1)? - Yes, 1 TT per datanode. The server hosting the
> regionserver also has a TT.
>
> - You say you have 8 concurrent mappers running... so I don't know if
> they are all on the same machine or not (see my previous question),
> but since you have 7 regions it means by default you can only have 7
> mappers running. Where's the 8th one coming from? - My MapReduce job works
> off a table which has 8 regions. But from inside the mapper I fire thousands
> of GETs at a different table which has 7 regions.
>
> - When the MR job is running, how are the disks performing (via
> iostat)? Again knowing whether or not the RS is colocated with a DN
> would help a lot. - iostat on the regionserver during the MR job shows:
>
> Time: 03:27:15 PM
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          21.65    0.01    5.08    4.14    0.00   69.11
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda               3.13         0.02         0.02       2176       1983
> sda1              0.00         0.00         0.00          8          0
> sda2              3.12         0.02         0.02       2167       1983
> sdb              52.03         2.21         0.44     289621      58030
> dm-0              5.41         0.02         0.02       2167       1983
> dm-1              0.00         0.00         0.00          0          0
>
>
> - Is the data set the same in the unit test and in the MR test? - The data
> set for the actual MR job is the same. The data set for the GETs within
> the mapper is much, much larger than for the MR job (120000 vs 2000 GETs).
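A small sketch for reading the iostat device table above: it parses rows with the columns Device, tps, MB_read/s, MB_wrtn/s, MB_read and MB_wrtn, and reports the device with the most transfers per second, which in this output is sdb. Partition and device-mapper rows (sda1, sda2, dm-0, dm-1) can overlap with whole-disk rows, so the sketch compares devices rather than summing them. `IostatDevices` is a hypothetical helper, not part of any tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: parse iostat -m device rows and find the busiest device. */
public class IostatDevices {

    /** One device row: name plus the per-second rates iostat reports. */
    public static final class Device {
        public final String name;
        public final double tps;
        public final double mbReadPerSec;
        public final double mbWrittenPerSec;

        Device(String name, double tps, double mbReadPerSec, double mbWrittenPerSec) {
            this.name = name;
            this.tps = tps;
            this.mbReadPerSec = mbReadPerSec;
            this.mbWrittenPerSec = mbWrittenPerSec;
        }
    }

    /** Keeps rows of a name followed by five numeric columns; skips the header. */
    public static List<Device> parse(String table) {
        List<Device> devices = new ArrayList<>();
        for (String line : table.split("\n")) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 6 || f[0].startsWith("Device")) {
                continue;
            }
            devices.add(new Device(f[0], Double.parseDouble(f[1]),
                    Double.parseDouble(f[2]), Double.parseDouble(f[3])));
        }
        return devices;
    }

    /** Device with the highest transfers per second, or null if there are none. */
    public static Device busiest(List<Device> devices) {
        Device best = null;
        for (Device d : devices) {
            if (best == null || d.tps > best.tps) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The device rows from the thread, verbatim.
        String table = String.join("\n",
                "Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn",
                "sda               3.13         0.02         0.02       2176       1983",
                "sda1              0.00         0.00         0.00          8          0",
                "sda2              3.12         0.02         0.02       2167       1983",
                "sdb              52.03         2.21         0.44     289621      58030",
                "dm-0              5.41         0.02         0.02       2167       1983",
                "dm-1              0.00         0.00         0.00          0          0");
        Device d = busiest(parse(table));
        System.out.printf(Locale.ROOT, "%s: %.2f tps, %.2f MB/s read%n",
                d.name, d.tps, d.mbReadPerSec);
        // prints "sdb: 52.03 tps, 2.21 MB/s read"
    }
}
```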
>
> -- Thanks
> Himanish
>
>
>
> On Tue, Feb 21, 2012 at 2:49 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> First a side comment: if you send an email to a mailing list like this
>> one and don't get any answer within a few hours, sending another one
>> right away usually won't help. It's just bad etiquette.
>>
>> Now I'm reading over the whole thread and things are really not that
>> clear to me.
>>
>> - You say you have 1 region server and 3 datanodes. Is there an
>> intersection? If not, you miss out on enabling local reads and take a
>> big performance hit although if you didn't enable it for your unit
>> test then it's just something you might want to look at later.
>>
>> - What's the machine that runs the unit test like?
>>
>> - How many disks per datanode? JBOD SATA or fancier?
>>
>> - Where are the mappers running? One task tracker per datanode? Or is
>> it per regionserver (eg 1)?
>>
>> - You say you have 8 concurrent mappers running... so I don't know if
>> they are all on the same machine or not (see my previous question),
>> but since you have 7 regions it means by default you can only have 7
>> mappers running. Where's the 8th one coming from?
>>
>> - When the MR job is running, how are the disks performing (via
>> iostat)? Again knowing whether or not the RS is colocated with a DN
>> would help a lot.
>>
>> - Is the data set the same in the unit test and in the MR test?
>>
>> Thx,
>>
>> J-D
>>
>> On Mon, Feb 20, 2012 at 5:42 PM, Himanish Kushary <himanish@gmail.com> wrote:
>> > Could somebody help me figure out what's different when running
>> > through MapReduce? Is it just the concurrency that's causing the issue?
>> > Will increasing the number of region servers help?
>> >
>> > BTW, the master is also on the same server as the regionserver. Is it
>> > just an environment issue, or is there some other configuration that
>> > might improve the read performance from within the mapper?
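The thread subject refers to batch Gets issued from inside the mapper. A minimal sketch of one common client-side pattern: split a large key list into bounded batches so each multi-get round trip stays a predictable size. `BatchReader` is a hypothetical stand-in for the real multi-get call (in the HBase client, something like `HTable.get(List<Get>)`). The batch size of 1000 and the even spread of the 120000 GETs over 8 mappers are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: issue a large set of row reads as fixed-size batches. */
public class BatchedGets {

    /** Hypothetical stand-in for one multi-row read round trip. */
    public interface BatchReader<R> {
        List<R> readBatch(List<String> rowKeys) throws Exception;
    }

    /** Partition keys into consecutive chunks of at most batchSize. */
    public static List<List<String>> partition(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            // Copy so each batch is independent of the source list.
            batches.add(Collections.unmodifiableList(new ArrayList<>(keys.subList(from, to))));
        }
        return batches;
    }

    /** One round trip per batch; results are concatenated in key order. */
    public static <R> List<R> readAll(BatchReader<R> reader, List<String> keys, int batchSize)
            throws Exception {
        List<R> results = new ArrayList<>(keys.size());
        for (List<String> batch : partition(keys, batchSize)) {
            results.addAll(reader.readBatch(batch));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: 120000 GETs spread evenly over 8 mappers.
        int totalGets = 120_000;
        int mappers = 8;
        int perMapper = totalGets / mappers; // 15000 per mapper
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < perMapper; i++) {
            keys.add("row-" + i);
        }
        // Fake reader that echoes the keys back and counts round trips.
        int[] roundTrips = {0};
        List<String> out = readAll(batch -> { roundTrips[0]++; return batch; }, keys, 1000);
        System.out.println(out.size() + " rows in " + roundTrips[0] + " round trips");
        // prints "15000 rows in 15 round trips"
    }
}
```

Bounding the batch size trades fewer round trips against how much work a single request asks of the region server at once; the right value would have to be measured on the actual cluster.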
>> >
>> > Thanks
>> > Himanish
>>
>
>
>
> --
> Thanks & Regards
> Himanish
