hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Batch Get performance degrades from within Mapreduce
Date Tue, 21 Feb 2012 19:49:10 GMT
First a side comment: if you send an email to a mailing list like this
one and don't get an answer within a few hours, sending another one
right away usually won't help. It's just bad etiquette.

Now I'm reading over the whole thread and things are really not that
clear to me.

- You say you have 1 region server and 3 datanodes. Is the region
server colocated with one of the datanodes? If not, you miss out on
local reads and take a big performance hit. That said, if local reads
weren't enabled for your unit test either, then it's just something
you might want to look at later.

- What's the machine that runs the unit test like?

- How many disks per datanode? JBOD SATA or fancier?

- Where are the mappers running? One task tracker per datanode? Or is
it one per regionserver (i.e. just 1)?

- You say you have 8 concurrent mappers running... so I don't know if
they are all on the same machine or not (see my previous question),
but since you have 7 regions it means by default you can only have 7
mappers running. Where's the 8th one coming from?

- When the MR job is running, how are the disks performing (via
iostat)? Again knowing whether or not the RS is colocated with a DN
would help a lot.

- Is the data set the same in the unit test and in the MR test?
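
To make the mapper-count point concrete, here is a minimal sketch. It
assumes the job reads the table through the stock TableInputFormat,
which by default creates one input split (and so one map task) per
region. The class and method names are made up for illustration and
are not part of HBase or Hadoop, and "mapSlots" just stands for however
many map slots your task trackers offer in total.

```java
// Hypothetical helper for illustration only; not an HBase or Hadoop API.
final class MapperEstimate {

    // Upper bound on map tasks that can run at the same time, assuming
    // one input split per region (the default TableInputFormat behaviour).
    static int maxConcurrentMappers(int regions, int mapSlots) {
        if (regions < 0 || mapSlots < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // There cannot be more running mappers than splits, nor more
        // than the cluster has slots for.
        return Math.min(regions, mapSlots);
    }

    public static void main(String[] args) {
        // 7 regions, 8 available map slots (the numbers from this thread).
        System.out.println(maxConcurrentMappers(7, 8)); // prints 7
    }
}
```

So with 7 regions you'd expect at most 7 mappers at once unless the
job uses a custom input format or splits regions further.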



On Mon, Feb 20, 2012 at 5:42 PM, Himanish Kushary <himanish@gmail.com> wrote:
> Could somebody help me figure out what's different when running
> through map-reduce? Is it just the concurrency that's causing the issue? Will
> increasing the number of region servers help?
> BTW, the master is also on the same server as the regionserver. Is it just an
> environment issue, or is there some other configuration that may improve the
> read performance from within the mapper?
> Thanks
> Himanish
