accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: ISAM file location vs. read performance
Date Sun, 12 Jan 2014 23:42:01 GMT


On 1/12/14, 6:17 PM, Sean Busbey wrote:
> On Sun, Jan 12, 2014 at 4:42 PM, William Slacum
> <wilhelm.von.cloud@accumulo.net>
> wrote:
>
>     Some data on short circuit reads would be great to have.
>
>
> What kind of data are you looking for? Just HDFS read rates? or
> specifically Accumulo when set up to make use of it?

I believe what Bill means, and what I'm also curious about, is 
specifically the impact on performance for Accumulo's workload: a merged 
read over multiple files. An easy test might be to create multiple
RFiles (1 to 10 files?) that contain interspersed data, then test
random-read and random-seek+sequential-read workloads across 1 to 10
RFiles, with short-circuit reads both on and off.
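
To make the shape of that workload concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: plain
in-memory sorted lists stand in for RFiles, and make_files, merged_scan
and time_workload are hypothetical helper names, not part of any
Accumulo or HDFS API. It does not exercise HDFS or short-circuit reads
at all; it only shows the random-seek+sequential-read pattern over 1 to
10 interleaved sorted files.

```python
import bisect
import heapq
import itertools
import random
import time


def make_files(num_files, total_keys, seed=0):
    """Scatter an ordered key space over num_files lists (stand-ins for RFiles)."""
    rng = random.Random(seed)
    files = [[] for _ in range(num_files)]
    for i in range(total_keys):
        # Keys are appended in increasing order, so every list stays sorted.
        files[rng.randrange(num_files)].append("row_%08d" % i)
    return files


def _cursor(keys, pos):
    """Yield keys from position pos onward without copying the list."""
    for i in range(pos, len(keys)):
        yield keys[i]


def merged_scan(files, start_key, count):
    """Seek each file to start_key, then merge-read up to count keys in order."""
    cursors = [_cursor(f, bisect.bisect_left(f, start_key)) for f in files]
    return list(itertools.islice(heapq.merge(*cursors), count))


def time_workload(files, num_seeks, scan_len, seed=1):
    """Run num_seeks random seeks, each followed by a short sequential read."""
    rng = random.Random(seed)
    total = sum(len(f) for f in files)
    keys_read = 0
    started = time.perf_counter()
    for _ in range(num_seeks):
        start_key = "row_%08d" % rng.randrange(total)
        keys_read += len(merged_scan(files, start_key, scan_len))
    return time.perf_counter() - started, keys_read


if __name__ == "__main__":
    for n in range(1, 11):
        elapsed, keys_read = time_workload(make_files(n, 100000), 1000, 100)
        print("%2d files: %.3fs for %d keys" % (n, elapsed, keys_read))
```

Setting scan_len to 1 approximates the pure random-read case. A real
measurement would swap the lists for actual RFiles on HDFS and repeat
each run with short-circuit reads enabled and disabled.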

Perhaps a slightly more accurate test would be to raise the compaction
ratio on a table (so the files are less likely to be compacted
together), bulk import the RFiles into that single table, and then just
read them back through the regular client API.

>     I'm unsure of how correct the "compaction leading to eventual
>     locality" postulation is. It seems, to me at least, that in the case
>     of a multi-block file, the file system would eventually try to
>     distribute those blocks rather than leave them all on a single host.
>
>
>
>
> I know in HBase set ups, it's common to either disable the HDFS Balancer
> or just disable for a namespace containing the part of the filesystem
> that handles HBase. Otherwise, when the blocks are moved off to other
> hosts you get performance degradation until compaction can happen again.
> I would expect the same thing ought to be done for Accumulo.

AFAIK, HBase also does a lot more when it comes to assigning Tablets
(regions, in HBase terms) based on the location of the blocks that serve
them, no? To my knowledge, Accumulo doesn't do anything like this. I
don't want users to think that disabling the HDFS balancer is a good
idea for Accumulo unless we have actual evidence.
