From: z11373
To: dev@accumulo.apache.org
Date: Mon, 16 Nov 2015 08:35:11 -0700 (MST)
Message-ID: <1447688111824-15569.post@n5.nabble.com>
Subject: delete rows test result

Last week, in a separate thread, it was suggested that I use tableOperations.deleteRows to delete rows that fall within specific ranges. I was curious to see whether it performs better than my current implementation, which iterates over all the rows and calls putDelete for each one. While researching, I also found that Accumulo already provides BatchDeleter, which does the same thing.
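
For anyone who wants to repeat this kind of comparison, each approach can be timed end to end with a small wrapper. The following is only a sketch: the `DeleteTimer` class and its `timeMillis` helper are hypothetical names (not part of Accumulo and not the code behind my numbers), and the sleeping placeholder stands in for the real scan-and-delete work.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Accumulo: times one delete strategy end to end.
final class DeleteTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would scan and delete rows here.
        Runnable placeholder = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Repeat the measurement a few times to check that results are consistent.
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run + ": " + timeMillis(placeholder) + " ms");
        }
    }
}
```

Using System.nanoTime rather than System.currentTimeMillis avoids surprises if the wall clock is adjusted during a long run.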

I tried all three, and below are my test results against three different tables (numbers are in milliseconds):

Test 1 (using an iterator and calling putDelete for each row):
Table 1: 5,702
Table 2: 6,912
Table 3: 4,694

Test 2 (using the BatchDeleter class):
Table 1: 8,089
Table 2: 10,405
Table 3: 7,818

Test 3 (using tableOperations.deleteRows; note that I first iterate over all rows just to get the last row ID, which is then passed as an argument to the function):
Table 1: 196,597
Table 2: 226,496
Table 3: 8,442

I ran the tests a few times and got results consistent with those above. I haven't looked at the code to see what deleteRows is really doing, but judging by my test results, I can say it sucks! Note that for that test I did a scan and iterated just to get the last row ID, but even if I subtract the time for doing that, it's still way too slow. Therefore, I'd recommend that anyone avoid using deleteRows for this scenario. YMMV, but I'd stick with my original approach, which is the same as Test 1 above.

Thanks,
Z

--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/delete-rows-test-result-tp15569.html
Sent from the Developers mailing list archive at Nabble.com.
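
To put the timings above in perspective, the slowdown of each method relative to Test 1 can be worked out with a few lines of arithmetic. This is just a sketch: the class, array, and method names are made up for illustration, and the only inputs are the millisecond figures reported above.

```java
import java.util.Locale;

// Sketch: relative cost of each method against the Test 1 baseline,
// using the millisecond timings reported in this message.
final class DeleteTimingRatios {
    static final long[] PUT_DELETE    = {5_702, 6_912, 4_694};     // Test 1
    static final long[] BATCH_DELETER = {8_089, 10_405, 7_818};    // Test 2
    static final long[] DELETE_ROWS   = {196_597, 226_496, 8_442}; // Test 3

    /** Returns how many times slower a method is than the baseline. */
    static double ratio(long millis, long baselineMillis) {
        return (double) millis / baselineMillis;
    }

    public static void main(String[] args) {
        for (int i = 0; i < PUT_DELETE.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "Table %d: BatchDeleter %.1fx, deleteRows %.1fx",
                i + 1,
                ratio(BATCH_DELETER[i], PUT_DELETE[i]),
                ratio(DELETE_ROWS[i], PUT_DELETE[i])));
        }
    }
}
```

By this measure deleteRows took roughly 33 to 34 times as long as Test 1 on Tables 1 and 2, but less than twice as long on Table 3, while BatchDeleter stayed within about 1.4 to 1.7 times Test 1 on all three tables.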