hbase-user mailing list archives

From Himanish Kushary <himan...@gmail.com>
Subject Re: Batch Get performance degrades from within Mapreduce
Date Tue, 21 Feb 2012 20:31:29 GMT
Extremely sorry for the posts; I was also trying to provide a little bit
more information on our environment.

- You say you have 1 region server and 3 datanodes. Is there an
intersection? If not, you miss out on enabling local reads and take a
big performance hit although if you didn't enable it for your unit
test then it's just something you might want to look at later. - The region
server is colocated with one of the 3 datanodes.

- What's the machine that runs the unit test like? - The unit test runs
on my laptop (8 cores / 8 GB) through Eclipse.

- How many disks per datanodes? JBOD SATA or fancier? - The datanode
directories are configured to point to SAN drives.

- Where are the mappers running? One task tracker per datanode? Or is it
per regionserver (eg 1)? - Yes, 1 TT per datanode. The server hosting the
regionserver also has a TT.

- You say you have 8 concurrent mappers running... so I don't know if
they are all on the same machine or not (see my previous question),
but since you have 7 regions it means by default you can only have 7
mappers running. Where's the 8th one coming from? - My mapreduce job works
off a table which has 8 regions. But from inside the mapper I fire thousands
of GETs to a different table, which has 7 regions.

- When the MR job is running, how are the disks performing (via
iostat)? Again knowing whether or not the RS is colocated with a DN
would help a lot. - iostat on the regionserver during the MR job shows:

Time: 03:27:15 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          21.65    0.01    5.08    4.14    0.00   69.11

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               3.13         0.02         0.02       2176       1983
sda1              0.00         0.00         0.00          8          0
sda2              3.12         0.02         0.02       2167       1983
sdb              52.03         2.21         0.44     289621      58030
dm-0              5.41         0.02         0.02       2167       1983
dm-1              0.00         0.00         0.00          0          0
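
One way to read the device table above is to work out how much data each I/O request moves on average. The following is only a rough sketch: the rows are copied from the iostat output, while the function names and the choice of "highest tps" as the definition of the busiest device are assumptions made for illustration. The figures are averages over whatever interval iostat was reporting on, so they describe the overall load rather than any single GET.

```python
# Device rows copied verbatim from the iostat output in this message.
IOSTAT_DEVICES = """\
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               3.13         0.02         0.02       2176       1983
sda1              0.00         0.00         0.00          8          0
sda2              3.12         0.02         0.02       2167       1983
sdb              52.03         2.21         0.44     289621      58030
dm-0              5.41         0.02         0.02       2167       1983
dm-1              0.00         0.00         0.00          0          0
"""


def parse_devices(text):
    """Return {device: {"tps", "read_mb_s", "write_mb_s"}} from the table."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 6-column device row.
        if len(fields) != 6 or fields[0] == "Device:":
            continue
        stats[fields[0]] = {
            "tps": float(fields[1]),
            "read_mb_s": float(fields[2]),
            "write_mb_s": float(fields[3]),
        }
    return stats


def kb_per_transfer(dev):
    """Average KB moved per I/O request; 0.0 for an idle device."""
    if dev["tps"] == 0:
        return 0.0
    return (dev["read_mb_s"] + dev["write_mb_s"]) * 1024 / dev["tps"]


devices = parse_devices(IOSTAT_DEVICES)
# "Busiest" here simply means the device with the most transfers per second.
busiest = max(devices, key=lambda name: devices[name]["tps"])
print(busiest, round(kb_per_transfer(devices[busiest]), 1))
```

With these numbers sdb is the only device doing noticeable work, at about 52 transfers per second and roughly 52 KB per transfer, while %iowait is 4.14.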


- Is the data set the same in the unit test and in the MR test? - The data
set for the actual MR job is the same. The number of GETs issued within the
mapper is much larger, though (120000 vs 2000 GETs).
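
For reference, the batching pattern in question can be sketched roughly as below: split the row keys into fixed-size chunks and issue one multi-get call per chunk. This is not our actual mapper code. `fetch_batch` is a hypothetical stand-in for the real client's batch get, the batch size of 1000 is an arbitrary assumption, and `fake_fetch_batch` only exists so the sketch runs without a cluster.

```python
from itertools import islice


def batched(keys, batch_size):
    """Yield lists of at most batch_size keys, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(keys)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def fetch_all(keys, fetch_batch, batch_size=1000):
    """Fetch every key using one fetch_batch call per chunk.

    fetch_batch is a hypothetical callable standing in for the real
    client's multi-get: it takes a list of keys and returns a dict.
    Returns the merged results and the number of calls made.
    """
    results = {}
    calls = 0
    for chunk in batched(keys, batch_size):
        results.update(fetch_batch(chunk))
        calls += 1
    return results, calls


def fake_fetch_batch(chunk):
    """Stand-in for a real table lookup (assumption, for illustration only)."""
    return {key: "value-for-" + key for key in chunk}


# 120000 row keys, matching the number of GETs mentioned in this thread.
row_keys = ["row-%06d" % i for i in range(120000)]
results, calls = fetch_all(row_keys, fake_fetch_batch, batch_size=1000)
print(len(results), calls)
```

At a batch size of 1000, the 120000 lookups become 120 calls per mapper run instead of 120000 single GETs. The right batch size for a given cluster is something to measure rather than assume.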

-- Thanks
Himanish



On Tue, Feb 21, 2012 at 2:49 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> First a side comment: if you send an email to a mailing list like this
> one and didn't get any answer within a few hours, sending another one
> right away usually won't help. It's just bad etiquette.
>
> Now I'm reading over the whole thread and things are really not that
> clear to me.
>
> - You say you have 1 region server and 3 datanodes. Is there an
> intersection? If not, you miss out on enabling local reads and take a
> big performance hit although if you didn't enable it for your unit
> test then it's just something you might want to look at later.
>
> - What's the machine that runs the unit test like?
>
> - How many disks per datanodes? JBOD SATA or fancier?
>
> - Where are the mappers running? One task tracker per datanode? Or is
> it per regionserver (eg 1)?
>
> - You say you have 8 concurrent mappers running... so I don't know if
> they are all on the same machine or not (see my previous question),
> but since you have 7 regions it means by default you can only have 7
> mappers running. Where's the 8th one coming from?
>
> - When the MR job is running, how are the disks performing (via
> iostat)? Again knowing whether or not the RS is colocated with a DN
> would help a lot.
>
> - Is the data set the same in the unit test and in the MR test?
>
> Thx,
>
> J-D
>
> On Mon, Feb 20, 2012 at 5:42 PM, Himanish Kushary <himanish@gmail.com>
> wrote:
> > Could somebody help me figure out what's the difference while running
> > through map-reduce? Is it just the concurrency that is causing the issue?
> > Will increasing the number of region servers help?
> >
> > BTW, the master is also on the same server as the regionserver. Is it just
> > an environment issue, or is there some other configuration that may improve
> > the read performance from within the mapper?
> >
> > Thanks
> > Himanish
>



-- 
Thanks & Regards
Himanish
