accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject ISAM file location vs. read performance
Date Sun, 12 Jan 2014 17:28:21 GMT
One aspect of Accumulo architecture is still unclear to me.  Would you
achieve better scan performance if you could guarantee that the tablet and
its ISAM file lived on the same node?  Guessing ISAM files are not
splittable so they pretty much stay on one HDFS data node (plus the replica
copy). Or is the theory that SATA and a 10 Gbps network provide more or less
the same throughput?
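
To put rough numbers on that comparison, here is a back-of-the-envelope sketch that converts the nominal link rates to MB/s. It assumes SATA III (6 Gbit/s on the wire, 8b/10b encoded) and a 10 Gbit/s Ethernet link, and it ignores protocol overhead. The helper name is made up for illustration, and a single spinning disk usually sustains well below the interface limit.

```python
def line_rate_to_mb_per_s(gbit_per_s, data_bits=1, line_bits=1):
    """Convert a nominal rate in Gbit/s to payload MB/s (decimal megabytes).

    data_bits/line_bits describes the line encoding, e.g. 8/10 for 8b/10b.
    """
    payload_mbit_per_s = gbit_per_s * 1000 * data_bits / line_bits
    return payload_mbit_per_s / 8  # 8 bits per byte


# Assumption: SATA III, 6 Gbit/s line rate with 8b/10b encoding.
sata3_mb_s = line_rate_to_mb_per_s(6, data_bits=8, line_bits=10)

# Assumption: 10 Gbit/s Ethernet, where 10 Gbit/s is already the data rate.
net10g_mb_s = line_rate_to_mb_per_s(10)

print(f"SATA III interface limit: {sata3_mb_s:.0f} MB/s")
print(f"10 Gbit/s network limit:  {net10g_mb_s:.0f} MB/s")
```

On these nominal figures the network link is not the narrower pipe, so any gain from locality would have to come from somewhere other than raw link bandwidth.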

I generally understand that as the table grows and Accumulo creates more
splits (tablets) you get better distribution over the cluster, but it seems
like data location would still be important.  HBase folks seem to think
that you can approx. double your throughput if you let the region server
directly read the file (dfs.client.read.shortcircuit=true) as opposed to
going through the data node. (
http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf).  Perhaps this is
due more to HDFS overhead?
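
For reference, the setting mentioned above is an HDFS client property. Below is a minimal sketch of how it is typically enabled in hdfs-site.xml, assuming a Hadoop 2.x release that implements short-circuit reads over a UNIX domain socket. The socket path is only an illustrative value, and older releases configure this feature differently.

```xml
<!-- hdfs-site.xml fragment (sketch; the socket path is an assumed example) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- Domain socket shared by the DataNode and local clients -->
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

The short-circuit path only applies when the client process and the block replica are on the same host, which is why it ties back to the locality question.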

I do get that one really nice thing about Accumulo's architecture is that
it costs almost nothing to reassign a tablet to a different tserver, whereas
this is a huge problem for other systems.
