hbase-user mailing list archives

From Bryan Keller <brya...@gmail.com>
Subject Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?
Date Fri, 17 Feb 2012 21:48:41 GMT
I was thinking (wrongly it seems) that having the region server read directly from the local
file system would be faster than going through the data node, even with sequential access.

On Feb 17, 2012, at 1:28 PM, Jean-Daniel Cryans wrote:

> On Fri, Feb 17, 2012 at 1:21 PM, Bryan Keller <bryanck@gmail.com> wrote:
>> I have been experimenting with local reads. For me, enabling them did not improve
>> read performance at all; I get the same performance either way. I can see in the data node
>> logs that it is passing back the local path, so it is enabled properly.
> I was surprised when I read this until I saw this:
>> Perhaps the benefits of local reads are dependent on the type of data and the workload?
>> In my test I'm scanning through the entire table via a MapReduce job. It's a wide table with
>> maybe 20k columns per row on average. I have scanner caching set to 10.
> It's definitely not going to help make sequential reads faster.
>> My read performance is about 10% of the disk max read throughput, i.e. my disks can
>> get 100 MB/sec tested with hdparm and scan performance is about 10 MB/sec. Not too bad I suppose.
> Maybe you're not pushing it enough?
> J-D
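
For anyone wanting to put rough numbers on the figures quoted above, here is a minimal back-of-the-envelope sketch in Python. It assumes a simplified model in which each scanner next() call returns up to "scanner caching" rows, so a full scan needs about rows / caching round trips. The function names and the 1,000,000-row table size are hypothetical, chosen only for illustration; the 10 and 100 MB/sec figures and the caching value of 10 come from the thread.

```python
import math


def scan_round_trips(total_rows: int, scanner_caching: int) -> int:
    """Estimate the scanner round trips needed to read total_rows rows.

    Simplified model (an assumption): every round trip returns exactly
    scanner_caching rows, except possibly the last one.
    """
    if scanner_caching < 1:
        raise ValueError("scanner_caching must be at least 1")
    return math.ceil(total_rows / scanner_caching)


def disk_utilization(scan_mb_per_sec: float, disk_mb_per_sec: float) -> float:
    """Fraction of raw disk throughput that the scan actually achieves."""
    if disk_mb_per_sec <= 0:
        raise ValueError("disk_mb_per_sec must be positive")
    return scan_mb_per_sec / disk_mb_per_sec


if __name__ == "__main__":
    rows = 1_000_000  # hypothetical table size, not from the thread
    for caching in (10, 100, 1000):
        print(f"caching={caching}: {scan_round_trips(rows, caching)} round trips")
    # Figures from the thread: ~10 MB/sec scan vs ~100 MB/sec measured with hdparm.
    print(f"disk utilization: {disk_utilization(10, 100):.0%}")
```

This says nothing about where the time actually goes; it only shows how the round-trip count scales with the caching value and restates the 10% utilization figure.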
