hbase-user mailing list archives

From Adam Silberstein <silbe...@yahoo-inc.com>
Subject Re: Problems with read ops when table size is large
Date Sun, 06 Dec 2009 07:07:35 GMT
Thanks for the suggestions.  Let me run down what I tried:
1. My ulimit was already much higher than 1024, so no change there.
2. I was not using hdfs-127.  I switched to that.  I didn't use M/R to do my
initial load, by the way.
3. I was a little unclear on which handler counts to increase and to what.
I changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
dfs.datanode.handler.count all from 10 to 100.
4. I did see the error that I was exceeding the dfs.datanode.max.xcievers
value of 256.  What's odd is that I have that set to ~3000, but it's apparently
not getting picked up by hdfs when it starts.  Any ideas there (for example,
is the property name really spelled xceivers)?
5. I'm not sure how many regions there are per regionserver.  What's a good
way to check that?
6. I didn't get to checking for the missing block.
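
One way to narrow down item 4 is to confirm that the property, under the spelling the error message uses (dfs.datanode.max.xcievers), actually appears in the configuration file the datanodes load. The following is a minimal Python sketch, assuming the setting lives in a Hadoop-style XML file such as hdfs-site.xml. The helper name read_hadoop_property and the sample file contents are hypothetical, and the helper simply returns the first matching entry.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the value of property `name` from Hadoop-style config XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each <property> holds a <name> and a <value> child element.
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical file contents; 3000 stands in for the "~3000" mentioned above.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>3000</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    # Spelling used in the error message ("ie").
    print(read_hadoop_property(SAMPLE_HDFS_SITE, "dfs.datanode.max.xcievers"))
    # Alternative spelling ("ei"), absent from the sample.
    print(read_hadoop_property(SAMPLE_HDFS_SITE, "dfs.datanode.max.xceivers"))
```

Running the same lookup against each configuration file on a datanode host would show whether the value is present under the expected name. It does not show whether the running daemon was restarted after the change.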

Ultimately, either #2 or #3 or both helped.  I was able to push throughput
way up without seeing the error recur.  So thanks a lot for the help!  I'm
still interested in getting the best performance possible.  So if you think
fixing the xciever problem will help, I'd like to spend some more time


On 12/5/09 9:38 PM, "stack" <stack@duboce.net> wrote:

> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6.  Different hdfs
> complaint but make sure your ulimit is > 1024 (check first or second line in
> master log -- it prints out what hbase is seeing for ulimit), check that
> hdfs-127 is applied to the first hadoop that hbase sees on CLASSPATH (this
> is particularly important if your loading script is a mapreduce task,
> clients might not be seeing the patched hadoop that hbase ships with).  Also
> up the handler count for hdfs (the referred to timeout is no longer
> pertinent I believe) and while you are at it, those for hbase if you haven't
> changed them from defaults.  While you are at it, make sure you don't suffer
> from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
> How many regions per regionserver?
> Can you put a regionserver log somewhere I can pull it to take a look?
> For a "Could not obtain block" message, what happens if you take the
> filename -- 2540865741541403627 in the below -- and grep the NameNode log
> for it?  Does it tell you anything?
> St.Ack
> On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein
> <silberst@yahoo-inc.com> wrote:
>> Hi,
>> I'm having problems doing client operations when my table is large.  I did
>> an initial test like this:
>> 6 servers
>> 6 GB heap size per server
>> 20 million 1K recs (so ~3 GB per server)
>> I was able to do at least 5,000 random read/write operations per second.
>> I then increased my table size to
>> 120 million 1K recs (so ~20 GB per server)
>> I then put a very light load of random reads on the table: 20 reads per
>> second.  I'm able to do a few, but within 10-20 seconds, they all fail.  I
>> found many errors of the following type in the hbase master log:
>> java.io.IOException: java.io.IOException: Could not obtain block:
>> blk_-7409743019137510182_39869
>> file=/hbase/.META./1028785192/info/2540865741541403627
>> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
>> then get errors).
>> If anyone has any suggestions or needs me to list particular settings, let
>> me know.  The odd thing is that I observe no problems and great performance
>> with a smaller table.
>> Thanks,
>> Adam
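
The "Could not obtain block" error quoted in the original message can be pulled out of a log mechanically, which yields the file name to search for in the NameNode log as suggested earlier in the thread. The following is a minimal Python sketch, assuming log entries follow the format shown in the quoted error. The regular expression and the helper name find_block_errors are illustrative, not part of HBase or Hadoop.

```python
import re

# Matches the block id and file path in a "Could not obtain block" error.
# \s also matches newlines, so an entry wrapped across lines is still found.
BLOCK_ERROR = re.compile(r"Could not obtain block:\s*(blk_-?\d+_\d+)\s+file=(\S+)")


def find_block_errors(log_text):
    """Return (block_id, file_name, full_path) tuples for each error found."""
    results = []
    for match in BLOCK_ERROR.finditer(log_text):
        block_id, path = match.groups()
        # The last path component is the name to grep for in the NameNode log.
        results.append((block_id, path.rsplit("/", 1)[-1], path))
    return results


SAMPLE_LOG = (
    "java.io.IOException: java.io.IOException: Could not obtain block: "
    "blk_-7409743019137510182_39869 "
    "file=/hbase/.META./1028785192/info/2540865741541403627\n"
)

if __name__ == "__main__":
    for block_id, file_name, full_path in find_block_errors(SAMPLE_LOG):
        print(block_id, file_name, full_path)
```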
