accumulo-user mailing list archives

From "Dave Marion"
Subject RE: Straggler problem in Accumulo BatchScans
Date Wed, 21 Aug 2013 23:28:45 GMT
How is the table organized?

What percentage of the table are you scanning in these large operations?

Have you considered writing a custom load balancer?


I don't think that a tablet can be hosted on multiple servers. But you might
be able to play around with the index/data caches, readahead threads
(concurrent queries), and max open files to achieve better performance.
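
As a rough starting point, the knobs mentioned above correspond to the
following tablet server properties in accumulo-site.xml. This is only a
sketch: the values are illustrative assumptions, not recommendations, and
should be sized against the memory and file handles actually available on
each node. Some of these settings may need a tablet server restart to take
effect, and the data block cache only helps tables that have
table.cache.block.enable set to true.

```xml
<!-- Illustrative values only (assumptions); tune for your own hardware and workload. -->

<!-- Size of the cache for file data blocks. -->
<property>
  <name>tserver.cache.data.size</name>
  <value>512M</value>
</property>

<!-- Size of the cache for file index blocks. -->
<property>
  <name>tserver.cache.index.size</name>
  <value>1G</value>
</property>

<!-- Maximum number of concurrent read-ahead (long-running scan) threads per tablet server. -->
<property>
  <name>tserver.readahead.concurrent.max</name>
  <value>32</value>
</property>

<!-- Maximum number of files a tablet server may hold open for scans. -->
<property>
  <name>tserver.scan.files.open.max</name>
  <value>200</value>
</property>
```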


From: Slater, David M.
Sent: Wednesday, August 21, 2013 7:09 PM
Subject: Straggler problem in Accumulo BatchScans


Hey, I have a 7-node network running Accumulo 1.4.1 and Hadoop 1.0.4.


When I run large BatchScanner operations, the number of tablets scanned per
node is not uniform, leading to the overloaded nodes taking much longer to
finish than the others. For queries that require all of the scans to finish
before returning, this is a major latency issue. What are some practical
means of load-balancing this to reduce delay?


Is it possible for tablets to be hosted on multiple tablet servers, up to
the replication factor of the underlying HDFS? Are there reasons this might
be an undesirable design?


Thanks in advance,
