accumulo-user mailing list archives

From Steven Troxell <>
Subject Accumulo Caching for benchmarking
Date Fri, 03 Aug 2012 21:50:35 GMT
Hi all,

I am running a benchmarking project on Accumulo, looking at RDF queries on
clusters with different numbers of nodes. While I intend to look at caching
for optimizing each individual run, I do NOT want caching to interfere
between runs, for example between runs that use 10 and 8 tablet servers.

Up to now I'd just been killing nodes via the bin/ script, but I
realize that may have allowed caching from previous runs with different
node counts to influence my results. It seemed weird to me, for example,
when I realized that dropping nodes actually increased performance (as
measured by query return times) in some cases (though I acknowledge the code
I'm working with has some serious issues with how ineffectively it uses
Accumulo, but that's an issue I intend to address later).

I suppose one way would be, whenever the node count changes, to stop and
restart ALL nodes (as opposed to what I'd been doing, which was just killing
2 nodes, for example, when transitioning from a 10-node to an 8-node test).
Will this be sure to clear the influence of caching across runs, and is
there a cleaner way to do this?

