Date: Wed, 2 Nov 2011 18:31:32 +0000 (UTC)
From: "Keith Turner (Commented) (JIRA)"
To: accumulo-dev@incubator.apache.org
Message-ID: <1179064868.51903.1320258692793.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1072776917.50743.1320248492253.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (ACCUMULO-112) Investigate partitioning in memory map by locality group

[ 
https://issues.apache.org/jira/browse/ACCUMULO-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142395#comment-13142395 ]

Keith Turner commented on ACCUMULO-112:
---------------------------------------

I ran some tests with random data. The data had the following format:

{noformat}
<16 digit rand hex> <4 digit hex> <4 digit rand hex> <50 byte random value>
{noformat}

There were 32 column families, 0000 to 001f. For the experiment, 32,768 rows with 32 columns each were inserted, creating 1,048,576 entries. The number of locality groups was varied and minor compaction times were recorded. Column families were evenly divided among locality groups. Below are the minor compaction times.

||Num Locality Groups||Minor Compaction Time||Relative Time||
|1 (default LG)|3.5 secs|1.0|
|4|6.4 secs|1.8|
|8|9.4 secs|2.7|
|16|16.4 secs|4.7|
|32|30.2 secs|8.6|

Since the data was written to an unpartitioned in memory map, the insert times should have been the same regardless of the number of locality groups. Once the in memory map is partitioned, it would be useful to track ingest time and minor compaction time.

> Investigate partitioning in memory map by locality group
> --------------------------------------------------------
>
>                 Key: ACCUMULO-112
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-112
>             Project: Accumulo
>          Issue Type: Task
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0
>
>
> Currently the in memory map is not partitioned by locality group. This could negatively impact scan and minor compaction performance. Would like to run some experiments to understand the performance implications. Partitioning by locality group could negatively impact insert performance; it could go from O(log(R)+log(C)) to O(L * (log(R)+log(C))) in the worst case. L is the number of locality groups, R is the number of rows and C is the number of columns. The worst case is where each mutation has a change for each locality group.
> Currently the in memory map is a map of maps. Like the following.
> {noformat}
> map<row, map<column, value>>
> {noformat}
> Could conceptually change this to one of the following. The first is best for scans that access some locality groups, and for minor compactions. The second is good for inserts where the mutation covers all locality groups, because the row is only looked up once.
> {noformat}
> map<locality group, map<row, map<column, value>>>
> {noformat}
> {noformat}
> map<row, map<locality group, map<column, value>>>
> {noformat}
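
To make the trade-off between the two proposed layouts concrete, here is a rough Java sketch. It is not Accumulo code: the class names (MemoryMapLayouts, Update, GroupFirst, RowFirst), the use of String keys, and the rowLookups counter are all assumptions made up for illustration. It only counts how many times the row has to be located when a single mutation touches several locality groups.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from Accumulo.
class MemoryMapLayouts {

    // One column update inside a mutation, tagged with its locality group.
    static final class Update {
        final String group, column, value;
        Update(String group, String column, String value) {
            this.group = group;
            this.column = column;
            this.value = value;
        }
    }

    // Layout 1: locality group -> row -> column -> value.
    static final class GroupFirst {
        final TreeMap<String, TreeMap<String, TreeMap<String, String>>> map = new TreeMap<>();
        int rowLookups = 0;

        void apply(String row, List<Update> updates) {
            for (Update u : updates) {
                rowLookups++; // the row is located again for every update
                map.computeIfAbsent(u.group, g -> new TreeMap<>())
                   .computeIfAbsent(row, r -> new TreeMap<>())
                   .put(u.column, u.value);
            }
        }
    }

    // Layout 2: row -> locality group -> column -> value.
    static final class RowFirst {
        final TreeMap<String, TreeMap<String, TreeMap<String, String>>> map = new TreeMap<>();
        int rowLookups = 0;

        void apply(String row, List<Update> updates) {
            rowLookups++; // the row is located once per mutation
            TreeMap<String, TreeMap<String, String>> groups =
                map.computeIfAbsent(row, r -> new TreeMap<>());
            for (Update u : updates) {
                groups.computeIfAbsent(u.group, g -> new TreeMap<>())
                      .put(u.column, u.value);
            }
        }
    }

    public static void main(String[] args) {
        // Worst case from the issue: one change for each locality group.
        List<Update> mutation = Arrays.asList(
            new Update("lg0", "0000", "a"),
            new Update("lg1", "0001", "b"),
            new Update("lg2", "0002", "c"),
            new Update("lg3", "0003", "d"));

        GroupFirst gf = new GroupFirst();
        RowFirst rf = new RowFirst();
        gf.apply("row1", mutation);
        rf.apply("row1", mutation);

        System.out.println("group-first row lookups: " + gf.rowLookups);
        System.out.println("row-first row lookups:   " + rf.rowLookups);
    }
}
```

With L locality groups touched by one mutation, the group-first layout in this sketch locates the row L times and the row-first layout locates it once, which is the insert-side cost described in the issue. The sketch does not model scans or minor compactions, where the group-first layout is expected to do better because each locality group sits in its own subtree.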