accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-112) Investigate partitioning in memory map by locality group
Date Sat, 27 Jul 2013 00:43:48 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721444#comment-13721444 ]

Keith Turner commented on ACCUMULO-112:
---------------------------------------

I have a working version of this [on github|https://github.com/keith-turner/accumulo/tree/ACCUMULO-112].
It still needs some polishing and testing, but it's mostly done.  I reran the test I ran earlier,
with 32K rows each having 32 column families.  I used snappy compression for this test.  The
"Scan One CF Time" column is the time it took to read one of the 32 column families.  The
"Scan CF:CQ Time" column is the time it took to read one column family and one column qualifier.
That scan usually returned 0 to 2 entries.

||Num Locality Groups||Write Time||Scan One CF Time||Scan CF:CQ Time||Flush Time||
|1|2.21 secs|0.99 secs|0.79 secs|2.07 secs|
|2|2.27 secs|0.71 secs|0.51 secs|2.19 secs|
|4|2.35 secs|0.43 secs|0.20 secs|2.21 secs|
|8|2.48 secs|0.33 secs|0.12 secs|2.33 secs|
|16|2.86 secs|0.26 secs|0.07 secs|2.56 secs|
|32|3.85 secs|0.24 secs|0.05 secs|2.85 secs|

Below, the data is normalized per column: each cell is divided by the minimum in its column,
so the values are unitless ratios and 1.00 marks the fastest configuration in that column.


||Num Locality Groups||Write Time||Scan One CF Time||Scan CF:CQ Time||Flush Time||
|1|1.00|4.13|15.80|1.00|
|2|1.03|2.96|10.20|1.06|
|4|1.06|1.79|4.00|1.07|
|8|1.12|1.38|2.40|1.13|
|16|1.29|1.08|1.40|1.24|
|32|1.74|1.00|1.00|1.38|
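
The normalized table can be reproduced from the raw timings with a short script.  This is only
a sketch for checking the arithmetic, not part of the patch: the name {{normalize_columns}} is
made up for illustration, and the use of decimal arithmetic with half-up rounding is an assumption,
chosen so that exact ties such as 0.99/0.24 = 4.125 round the same way as in the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Timings in seconds, copied from the first table.
# Columns: write, scan one CF, scan CF:CQ, flush.
TIMINGS = {
    1: ("2.21", "0.99", "0.79", "2.07"),
    2: ("2.27", "0.71", "0.51", "2.19"),
    4: ("2.35", "0.43", "0.20", "2.21"),
    8: ("2.48", "0.33", "0.12", "2.33"),
    16: ("2.86", "0.26", "0.07", "2.56"),
    32: ("3.85", "0.24", "0.05", "2.85"),
}


def normalize_columns(timings):
    """Divide every cell by the minimum of its column, rounded to 2 places."""
    rows = {k: [Decimal(v) for v in vals] for k, vals in timings.items()}
    # zip(*rows) transposes the rows so each tuple is one column.
    minima = [min(col) for col in zip(*rows.values())]
    cent = Decimal("0.01")
    return {
        k: [(v / m).quantize(cent, rounding=ROUND_HALF_UP)
            for v, m in zip(vals, minima)]
        for k, vals in rows.items()
    }


if __name__ == "__main__":
    # Print the result in the same wiki table row format used above.
    for groups, ratios in normalize_columns(TIMINGS).items():
        print("|%d|%s|" % (groups, "|".join(str(r) for r in ratios)))
```

Read this way, going from 1 to 32 locality groups costs about 1.74x on writes and 1.38x on flushes,
while the CF:CQ scan gets about 15.8x faster.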


                
> Investigate partitioning in memory map by locality group
> --------------------------------------------------------
>
>                 Key: ACCUMULO-112
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-112
>             Project: Accumulo
>          Issue Type: Task
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>              Labels: gsoc2013, mentor
>
> Currently the in-memory map is not partitioned by locality group.  This could negatively
> impact scan and minor compaction performance.  I would like to run some experiments to understand
> the performance implications.  Partitioning by locality group could negatively impact insert
> performance: it could go from O(log(R)+log(C)) to O(L * (log(R)+log(C))) in the worst case,
> where L is the number of locality groups, R is the number of rows, and C is the number of columns.
> The worst case is where each mutation has a change for each locality group.
> Currently the in-memory map is a map of maps, like the following.
> {noformat}
>   map<row, map<col, val>>
> {noformat}
> This could conceptually be changed to one of the following.  The first is best for scans
> that access only some locality groups, and for minor compactions.  The second is good for inserts
> where the mutation covers all locality groups, because the row is only looked up once.
> {noformat}
>   map<localityGroup, map<row, map<col, val>>>
> {noformat}
> {noformat}
>   map<row, map<localityGroup, map<col, val>>>
> {noformat}
> The Accumulo native map is implemented using C++, the STL, and JNI, with thread locking done
> in Java.
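
The trade-off between the two partitioned layouts in the description can be sketched with nested
dictionaries.  This is a toy model under stated assumptions, not the native map: Python dicts
are hash maps rather than sorted maps, so the sketch only counts row lookups instead of modeling
the O(log(R)) cost of each one.  The function names and the {{group_of}} mapping (one locality
group per column family) are hypothetical.

```python
def group_of(column):
    """Hypothetical mapping: each column family is its own locality group."""
    return column.split(":", 1)[0]


def insert_group_first(store, row, updates):
    """Layout map<localityGroup, map<row, map<col, val>>>.

    Returns the number of row lookups performed for this mutation.
    """
    # Bucket the mutation's columns by locality group.
    by_group = {}
    for col, val in updates.items():
        by_group.setdefault(group_of(col), {})[col] = val
    row_lookups = 0
    for group, cols in by_group.items():
        rows = store.setdefault(group, {})
        rows.setdefault(row, {}).update(cols)  # one row lookup per group touched
        row_lookups += 1
    return row_lookups


def insert_row_first(store, row, updates):
    """Layout map<row, map<localityGroup, map<col, val>>>.

    Returns the number of row lookups performed for this mutation.
    """
    groups = store.setdefault(row, {})  # the row is looked up exactly once
    for col, val in updates.items():
        groups.setdefault(group_of(col), {})[col] = val
    return 1


def scan_group_first(store, group):
    """Scanning one locality group only touches that group's partition."""
    return store.get(group, {})


if __name__ == "__main__":
    # Worst case from the description: one mutation changes every locality group.
    updates = {"cf%02d:cq" % i: "v" for i in range(32)}
    print(insert_group_first({}, "row1", updates))  # row lookups, group-first
    print(insert_row_first({}, "row1", updates))    # row lookups, row-first
```

With 32 locality groups and a mutation that touches all of them, the group-first layout performs
one row lookup per group (the L factor in the worst-case bound), while the row-first layout
performs a single row lookup.  A scan of one group, on the other hand, can go straight to its
partition in the group-first layout.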

