Message-ID: <501C7406.70803@gmail.com>
Date: Fri, 03 Aug 2012 20:59:50 -0400
From: Josh Elser
To: user@accumulo.apache.org
Subject: Re: Accumulo Caching for benchmarking

I remember listening to a presentation by Keith about testing the
multi-level RFile index, which was introduced in 1.4.0. You also want
to think about caching at the operating system level. I'm not entirely
positive what Keith did to try to mitigate this, but I imagine writing
a bunch of garbage from /dev/urandom out to disk should work. That, or
you could actually reboot the nodes.

On 8/3/2012 8:55 PM, William Slacum wrote:
> Steve, I'm a little confused. The RFile block cache is tied to a
> TServer, so if you kill a node, its cache should go away. Are you
> querying for the same data after you kill the node that hosted the
> tablet which contained the data? Also, between runs, you could stop
> and restart everything, thereby eliminating the cache.
>
> On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell wrote:
>
> > Hi all,
> >
> > I am running a benchmarking project on Accumulo looking at RDF
> > queries for clusters with different node sizes. While I intend to
> > look at caching for optimizing each individual run, I do NOT want
> > caching to interfere, for example, between runs involving the use
> > of 10 and 8 tablet servers.
> >
> > Up to now I'd just been killing nodes via the bin/stop-here.sh
> > script, but I realize that may have allowed caching from previous
> > runs with different node sizes to influence my results. It seemed
> > weird to me, for example, when I realized dropping nodes actually
> > increased performance (as measured by query return times) in some
> > cases (though I acknowledge the code I'm working with has some
> > serious issues with how ineffectively it is actually utilizing
> > Accumulo, but that's an issue I intend to address later).
> >
> > I suppose one way would be, between a change of node sizes, to
> > stop and restart ALL nodes (as opposed to what I'd been doing in
> > just killing 2 nodes, for example, in transitioning from a 10 to
> > 8 node test). Will this be sure to clear the influence of caching
> > across runs, and is there any cleaner way to do this?
> >
> > thanks,
> > Steve
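
A minimal sketch of the garbage-writing idea from the reply, in Python.
This is not what Keith did (that is unknown here) and not an Accumulo
tool; the function name write_garbage, the scratch path, and the chunk
size are hypothetical, chosen only for illustration. The assumption is
that writing more random data than the node has RAM pushes previously
cached RFile blocks out of the OS page cache, which is a heuristic
rather than a guarantee.

```python
import os


def write_garbage(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of random data to path and force it to disk.

    Hypothetical helper: the intent is to fill the OS page cache with
    throwaway pages so that data cached by an earlier benchmark run
    is likely to be evicted. Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            # Never write past the requested total on the last chunk.
            n = min(chunk_bytes, total_bytes - written)
            f.write(os.urandom(n))
            written += n
        # Flush Python's buffer, then ask the OS to write to disk.
        f.flush()
        os.fsync(f.fileno())
    return written


# Hypothetical usage on one node (path and size are assumptions):
#   write_garbage("/tmp/cache_buster.bin", 16 * 1024**3)
#   os.remove("/tmp/cache_buster.bin")
```

On Linux, running sync and then writing 3 to /proc/sys/vm/drop_caches
as root is a more direct way to drop the page cache. Either way, the
tablet servers still need a restart to clear the RFile block cache,
since that cache lives in the TServer process.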