accumulo-dev mailing list archives

From "Keith Turner (Commented) (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-112) Investigate partitioning in memory map by locality group
Date Wed, 02 Nov 2011 18:31:32 GMT


Keith Turner commented on ACCUMULO-112:

I ran some tests with random data.  The data had the following format:

 <16 digit rand hex> <4 digit hex> <4 digit rand hex> <50 byte random

There were 32 column families, 0000 to 001f. 
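
A minimal sketch of a generator for data of this shape, in case it helps reproduce the test. It is not the code used for the experiment. It assumes the four fields are row, column family, column qualifier and value, that the truncated last field is a 50 byte random value, and that each row gets one entry per column family. The class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator for entries shaped like the test data described above. */
final class RandomEntryGenerator {

    /** One generated entry. The field names are assumptions, not taken from the message. */
    static final class Entry {
        final String row;        // 16 digit random hex
        final String family;     // 4 digit hex, 0000 to 001f
        final String qualifier;  // 4 digit random hex
        final byte[] value;      // assumed: 50 random bytes

        Entry(String row, String family, String qualifier, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    static final int NUM_FAMILIES = 32; // column families 0000 to 001f

    /** Generates numRows rows, each with one entry per column family (an assumption). */
    static List<Entry> generate(int numRows, long seed) {
        Random rand = new Random(seed);
        List<Entry> entries = new ArrayList<>(numRows * NUM_FAMILIES);
        for (int r = 0; r < numRows; r++) {
            // %016x prints a long as 16 zero-padded hex digits, including negative values.
            String row = String.format("%016x", rand.nextLong());
            for (int cf = 0; cf < NUM_FAMILIES; cf++) {
                String family = String.format("%04x", cf);
                String qualifier = String.format("%04x", rand.nextInt(0x10000));
                byte[] value = new byte[50];
                rand.nextBytes(value);
                entries.add(new Entry(row, family, qualifier, value));
            }
        }
        return entries;
    }
}
```

Under these assumptions, calling `RandomEntryGenerator.generate(32768, seed)` yields 32,768 * 32 = 1,048,576 entries.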

For the experiment, 32,768 rows with 32 columns each were inserted, creating 1,048,576 entries.
 The number of locality groups was varied and minor compaction times were recorded.  Column
families were evenly divided among locality groups.  Below are the minor compaction times.

||Num Locality Groups||Minor Compaction Time||Relative Time||
|1 (default LG)|3.5 secs|1.0|
|4|6.4 secs|1.8|
|8|9.4 secs|2.7|
|16|16.4 secs|4.7|
|32|30.2 secs|8.6|
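
The Relative Time column is each measured time divided by the time for the default locality group, rounded to one decimal place. A small sketch of that arithmetic, using only the numbers from the table (the class and method names are hypothetical):

```java
/** Recomputes the Relative Time column from the measured minor compaction times. */
final class RelativeTimes {

    /** Ratio of a measured time to the baseline, rounded to one decimal place. */
    static double relative(double seconds, double baselineSeconds) {
        return Math.round(seconds / baselineSeconds * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        int[] groups = {1, 4, 8, 16, 32};
        double[] seconds = {3.5, 6.4, 9.4, 16.4, 30.2};
        double baseline = seconds[0]; // the single default locality group
        for (int i = 0; i < groups.length; i++) {
            System.out.printf("%d groups: %.1f secs, relative %.1f%n",
                    groups[i], seconds[i], relative(seconds[i], baseline));
        }
    }
}
```

The rounded ratios agree with the table: 6.4 / 3.5 is about 1.83, and 30.2 / 3.5 is about 8.63.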

Since the data was written to an unpartitioned in-memory map, the insert times should have
been the same for every run.  Once the in-memory map is partitioned, it would be useful to track
both ingest time and minor compaction time.

> Investigate partitioning in memory map by locality group
> --------------------------------------------------------
>                 Key: ACCUMULO-112
>                 URL:
>             Project: Accumulo
>          Issue Type: Task
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0
> Currently the in-memory map is not partitioned by locality group.  This could negatively
> impact scan and minor compaction performance.  I would like to run some experiments to understand
> the performance implications.  Partitioning by locality group could negatively impact insert
> performance: it could go from O(log(R)+log(C)) to O(L * (log(R)+log(C))) in the worst case,
> where L is the number of locality groups, R is the number of rows and C is the number of columns.
> The worst case is where each mutation has a change for each locality group.
> Currently the in-memory map is a map of maps, like the following.
> {noformat}
>   map<row, map<col, val>>
> {noformat}
> This could conceptually be changed to one of the following.  The first is best for minor
> compactions and for scans that access only some locality groups.  The second is good for inserts
> where the mutation covers all locality groups, because the row is only looked up once.
> {noformat}
>   map<localityGroup, map<row, map<col, val>>>
> {noformat}
> {noformat}
>   map<row, map<localityGroup, map<col, val>>>
> {noformat}
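
The trade-off between the two layouts in the quoted description can be sketched with nested sorted maps. This is an illustration only, not Accumulo's in-memory map: `TreeMap` stands in for the real structure, columns are keyed by family alone for brevity, and `groupOf`, which splits the 32 families evenly and contiguously among the groups, is a hypothetical assignment. The counters record how many row lookups one mutation costs in each layout.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the two partitioned layouts; all names here are hypothetical. */
final class PartitionedMapSketch {
    // Layout 1: map<localityGroup, map<row, map<col, val>>>
    final Map<Integer, Map<String, Map<String, String>>> groupFirst = new TreeMap<>();
    // Layout 2: map<row, map<localityGroup, map<col, val>>>
    final Map<String, Map<Integer, Map<String, String>>> rowFirst = new TreeMap<>();

    int groupFirstRowLookups = 0;
    int rowFirstRowLookups = 0;

    /** Assumed assignment: 32 families (hex 0000 to 001f) split evenly among numGroups. */
    static int groupOf(String family, int numGroups) {
        return Integer.parseInt(family, 16) * numGroups / 32;
    }

    /** Applies one mutation (a row plus its column updates) to both layouts. */
    void apply(String row, Map<String, String> updates, int numGroups) {
        // Split the mutation's updates by locality group.
        Map<Integer, Map<String, String>> byGroup = new TreeMap<>();
        for (Map.Entry<String, String> u : updates.entrySet()) {
            byGroup.computeIfAbsent(groupOf(u.getKey(), numGroups), g -> new TreeMap<>())
                   .put(u.getKey(), u.getValue());
        }

        // Layout 1: the row is looked up once per locality group the mutation touches.
        for (Map.Entry<Integer, Map<String, String>> e : byGroup.entrySet()) {
            groupFirst.computeIfAbsent(e.getKey(), g -> new TreeMap<>())
                      .computeIfAbsent(row, r -> new TreeMap<>())
                      .putAll(e.getValue());
            groupFirstRowLookups++;
        }

        // Layout 2: the row is looked up once, then each touched group under it.
        Map<Integer, Map<String, String>> groupsForRow =
                rowFirst.computeIfAbsent(row, r -> new TreeMap<>());
        rowFirstRowLookups++;
        for (Map.Entry<Integer, Map<String, String>> e : byGroup.entrySet()) {
            groupsForRow.computeIfAbsent(e.getKey(), g -> new TreeMap<>())
                        .putAll(e.getValue());
        }
    }
}
```

With a mutation that updates all 32 families and 32 locality groups, the first layout performs 32 row lookups and the second performs one, which is the worst case described in the issue. In exchange, the first layout keeps each locality group's data together, so a minor compaction or a scan of one group can read a single sub-map.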
