hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Optimizations for random read performance
Date Tue, 16 Feb 2010 22:45:26 GMT
On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari <james@dataxu.com> wrote:
> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <james@dataxu.com> wrote:
>
> Whether the keys themselves are evenly distributed is another matter.
> Our keys are user IDs, and they should be fairly random.  If we do a
> status 'detailed' in the hbase shell we see the following distribution
> for the value of "requests" (not entirely sure what this value means):
> hdfs01: 7078
> hdfs02: 5898
> hdfs03: 5870
> hdfs04: 3807
>
That looks like they are evenly distributed.  'Requests' is the number of
hits per second.  See the UI on the master at port 60010; the numbers
should match.
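
If it is useful, the quickest way to keep an eye on those counters on a
stock setup (the hostname below is just a placeholder):

  hbase> status 'detailed'         # per-regionserver request and region counts
  http://<master-host>:60010/      # same numbers in the master UI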


> There are no order of magnitude differences here, and the request count
> doesn't seem to map to the load on the server.  Right now hdfs02 has a
> load of 16 while the 3 others have loads between 1 and 2.


This is interesting.  I went back over your dumps of cache stats above
and the 'loaded' server didn't have any attribute there that
differentiated it from the others.  For example, the number of storefiles
seemed about the same.

I wonder what is causing the high load?  Can you figure it out?  Is it
high CPU use (unlikely)?  Is it high i/o?  Can you try to figure out
what's different about the layout under the loaded server versus that of
an unloaded server?  Maybe do a ./bin/hadoop fs -lsr /hbase and see if
anything jumps out at you.
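
For concreteness, here is roughly what I'd run on the loaded box versus
an unloaded one (nothing HBase-specific, just OS tools and the hadoop
CLI; adjust paths to your install):

  # cpu vs. i/o wait on the loaded regionserver
  top                                       # watch %wa (iowait) against %us/%sy
  iostat -x 5                               # per-disk utilization and await

  # compare on-disk layout between servers/tables
  ./bin/hadoop fs -lsr /hbase > /tmp/hbase-layout.txt
  ./bin/hadoop fs -dus /hbase/*             # rough per-table totals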

If you want to post the above or a loaded server's log to pastebin, we'll
take a look.


> Applying
> HBASE-2180 did not make any measurable difference.  There are no errors
> in the region server logs.  However, looking at the Hadoop datanode
> logs, I'm seeing lots of these:
>
> 2010-02-16 17:07:54,064 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.183.165:50010,
> storageID=DS-1519453437-10.24.183.165-50010-1265907617548, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException
>        at java.io.DataInputStream.readShort(DataInputStream.java:298)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>        at java.lang.Thread.run(Thread.java:619)

Have you upped xceivers on your hdfs cluster?  If you look at the other
end of the above EOFException, can you see why it died?
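
In case it is not already in place, the usual xceiver bump goes in
hdfs-site.xml on every datanode (4096 is just a commonly used value, not
a magic number), followed by a datanode restart:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

You would normally raise the open-file ulimit (nofile) for the hdfs and
hbase users at the same time, since the two limits tend to get hit
together.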


>
> However, I do think it's strange that
> the load is so unbalanced on the region servers.
>

I agree.


> We're also going to try throwing some more hardware at the problem.
> We'll set up a new cluster with 16-core, 16G nodes to see if they are
> better able to handle the large number of client requests.  We might
> also decrease the block size to 32k or lower.
>
Ok.
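
On the block size front, in case it saves you a trip to the docs: it's a
per-column-family setting, so something along these lines in the shell
(table and family names below are placeholders, and the table has to be
disabled around the alter):

  hbase> disable 'mytable'
  hbase> alter 'mytable', {NAME => 'myfamily', BLOCKSIZE => '32768'}
  hbase> enable 'mytable'

Note the smaller blocks only apply to storefiles written after the
change, so you won't see the effect until a major compaction rewrites the
existing files.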

>> Should only be a matter if you intend distributing the above.
>
> This is probably a topic for a separate thread, but I've never seen a
> legal definition for the word "distribution."  How does this apply to
> the SaaS model?
>
Fair enough.

Something is up, especially if HBASE-2180 made no difference.

St.Ack
