Date: Mon, 23 Nov 2015 09:43:36 -0700 (MST)
From: z11373
To: dev@accumulo.apache.org
Subject: Re: delete rows test result

Hi William,

I re-ran the same test calling deleteRows without scanning the table (so it's only timing the deleteRows operation here), and you're right: it's faster, as shown in the results below.
Table 1: 3,301
Table 2: 3,184
Table 3: 2,635

That is clearly faster than the best result I got by scanning the table and calling putDelete for each entry:

Table 1: 5,702
Table 2: 6,912
Table 3: 4,694

However, there is one case I didn't mention last time: a table with a summing combiner installed. Even though such a table may show 1M rows, it can actually hold 10M or more entries underneath, which may explain why deleteRows can take longer. Still, looking at my test results, something seems wrong:

Test 1 (iterate over the table and call putDelete for each entry):
Table 4 (with summing combiner): 11,081

Test 2 (call deleteRows):
Table 4 (with summing combiner): 197,050

Last time someone mentioned compaction, so out of curiosity I compacted the table first and then called deleteRows, to see whether that would be faster. Here is the result:

Compact on Table 4 (with summing combiner): 376,619
Call deleteRows on Table 4 (with summing combiner): 188,862

Given these results, I'd say compacting the table doesn't help, though perhaps I did something wrong here. So it seems to me that in certain cases (like this one), scanning the table and calling putDelete for each entry performs better than calling deleteRows. Does this make sense?

Thanks,
Z

-- 
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/delete-rows-test-result-tp15569p15609.html
Sent from the Developers mailing list archive at Nabble.com.
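To make the summing-combiner point concrete, here is a minimal, hypothetical in-memory model in Java. It is not Accumulo code and calls no Accumulo API: `CombinerDeleteModel`, `put`, `putDelete`, `scan`, and `deleteByScan` are illustrative names, and the key is collapsed to just the row. It assumes, as in Accumulo, that a delete marker hides every entry for the same key with an equal or older timestamp, and that the combiner only sums entries when they are read (at scan or compaction time), so raw entries pile up until then. Under those assumptions a table showing 1,000 rows can hold 10,000 raw entries, and the scan-and-putDelete approach writes one marker per visible row rather than one per raw entry. The scan itself still has to read every raw entry to compute the sums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model, NOT Accumulo code: each write is kept as a
 * separate raw entry (as uncompacted data would be), and scan() applies delete
 * markers and then sums the surviving values per row, like a summing combiner.
 * Keys are collapsed to the row only.
 */
public class CombinerDeleteModel {

    /** One raw entry: a row, a timestamp, and either a value or a delete marker. */
    private static final class Entry {
        final String row;
        final long timestamp;
        final long value;
        final boolean isDelete;

        Entry(String row, long timestamp, long value, boolean isDelete) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
            this.isDelete = isDelete;
        }
    }

    private final List<Entry> raw = new ArrayList<>();
    private long clock = 0; // monotonically increasing timestamp source

    /** Append a value; nothing is combined at write time. */
    public void put(String row, long value) {
        raw.add(new Entry(row, ++clock, value, false));
    }

    /** Append a marker hiding entries for the row with an equal or older timestamp. */
    public void putDelete(String row) {
        raw.add(new Entry(row, ++clock, 0, true));
    }

    /** Scan: drop entries hidden by a delete marker, then sum the rest per row. */
    public Map<String, Long> scan() {
        Map<String, Long> newestDelete = new TreeMap<>();
        for (Entry e : raw) {
            if (e.isDelete) {
                newestDelete.merge(e.row, e.timestamp, Math::max);
            }
        }
        Map<String, Long> sums = new TreeMap<>();
        for (Entry e : raw) {
            Long deletedAt = newestDelete.get(e.row);
            if (e.isDelete || (deletedAt != null && e.timestamp <= deletedAt)) {
                continue; // markers and shadowed values are not returned
            }
            sums.merge(e.row, e.value, Long::sum);
        }
        return sums;
    }

    /** Number of raw entries stored, including delete markers. */
    public int rawEntryCount() {
        return raw.size();
    }

    /** Scan-and-putDelete: one marker per visible row, however many raw entries back it. */
    public int deleteByScan() {
        int markers = 0;
        for (String row : scan().keySet()) {
            putDelete(row);
            markers++;
        }
        return markers;
    }

    public static void main(String[] args) {
        CombinerDeleteModel table = new CombinerDeleteModel();
        for (int row = 0; row < 1_000; row++) {
            for (int i = 0; i < 10; i++) {
                table.put(String.format("row%04d", row), 1); // 10 raw entries per row
            }
        }
        System.out.println("visible rows:   " + table.scan().size());   // 1000
        System.out.println("raw entries:    " + table.rawEntryCount()); // 10000
        System.out.println("delete markers: " + table.deleteByScan());  // 1000
        System.out.println("rows afterward: " + table.scan().size());   // 0
    }
}
```

The model deliberately does not simulate deleteRows, so it says nothing by itself about why that call took 197,050 on Table 4; it only shows why the number of delete markers tracks visible rows rather than raw entries.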