accumulo-user mailing list archives

From Steven Troxell <steven.trox...@gmail.com>
Subject Re: Accumulo Caching for benchmarking
Date Sat, 04 Aug 2012 17:21:47 GMT
Thanks everyone, that should definitely help me out. While I feel silly
for ignoring this issue at first, it should be interesting to see how much
this influences the results.


On Sat, Aug 4, 2012 at 7:19 AM, Eric Newton <eric.newton@gmail.com> wrote:

> You can drop the OS caches between runs:
>
> # echo 1 > /proc/sys/vm/drop_caches
>
>
> On Fri, Aug 3, 2012 at 9:41 PM, Christopher Tubbs <ctubbsii@gmail.com> wrote:
>
>> Steve-
>>
>> I would probably design the experiment to test different cluster sizes
>> as completely independent. That means, taking the entire thing down
>> and back up again (possibly even rebooting the boxes, and/or
>> re-initializing the cluster at the new size). I'd also do several runs
>> while it is up at a particular cluster size, to capture any
>> performance difference between the first and a later run due to OS or
>> TServer caching, for analysis later.
>>
>> Essentially, when in doubt, take more data...
>>
>> --L
>>
>>
>> On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell <steven.troxell@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I am running a benchmarking project on Accumulo, looking at RDF queries
>> > for clusters with different numbers of nodes. While I intend to look at
>> > caching for optimizing each individual run, I do NOT want caching to
>> > interfere, for example, between runs involving the use of 10 and 8
>> > tablet servers.
>> >
>> > Up to now I'd just been killing nodes via the bin/stop-here.sh script,
>> > but I realize that may have allowed caching from previous runs with
>> > different node counts to influence my results. It seemed weird to me,
>> > for example, when I realized dropping nodes actually increased
>> > performance (as measured by query return times) in some cases (though I
>> > acknowledge the code I'm working with has some serious issues with how
>> > ineffectively it is actually utilizing Accumulo, but that's an issue I
>> > intend to address later).
>> >
>> > I suppose one way would be, between changes of node count, to stop and
>> > restart ALL nodes (as opposed to what I'd been doing, just killing 2
>> > nodes, for example, in transitioning from a 10 to 8 node test). Will
>> > this be sure to clear the influence of caching across runs, and is there
>> > any cleaner way to do this?
>> >
>> > thanks,
>> > Steve
>>
>
>
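
The cache-dropping step Eric suggests above could be wrapped roughly as
follows. This is only a sketch, assuming Linux and root privileges on each
node. The function name drop_os_caches and the DROP_CACHES_FILE override are
hypothetical helpers for illustration (the override exists so the function
can be dry-run against a scratch file); neither is part of Accumulo.

```shell
# Sketch: flush dirty pages, then ask the kernel to drop its clean caches.
# Assumes Linux; must run as root to write the real drop_caches file.
drop_os_caches() {
    # Mode: 1 = page cache, 2 = dentries and inodes, 3 = both (default).
    doc_mode="${1:-3}"
    # Hypothetical override, read at call time, so the function can be tested.
    doc_target="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

    case "$doc_mode" in
        1|2|3) ;;
        *)
            echo "drop_os_caches: mode must be 1, 2, or 3" >&2
            return 2
            ;;
    esac

    # Dirty pages cannot be dropped, so write them back to disk first.
    sync
    echo "$doc_mode" > "$doc_target"
}
```

Run between benchmark runs on every node, for example "drop_os_caches 1"
to match Eric's command or "drop_os_caches 3" to also drop dentries and
inodes. This only affects the OS caches; the caches held inside the tablet
server processes are cleared by restarting the tablet servers, as
Christopher suggests.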
