hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1938) Make in-memory table scanning faster
Date Fri, 06 Nov 2009 21:29:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774415#action_12774415
] 

stack commented on HBASE-1938:
------------------------------

Some reputable lads have been looking at in-memory scanning and have turned up some interesting
observations:

+ Getting values out of the block cache is way faster than getting them out of MemStore --
even though the values are prefabricated as KVs in MemStore and have to be instantiated when
pulling from the block cache.
+ Getting values out of the block cache is way faster than getting values out of a preloaded
custom MemStore that sits in front of our current MemStore -- with its handling of snapshots.
+ Getting values out of the block cache is faster than getting values from a plain Set of
KVs (!!!)

"I've noticed that the MemStoreScanner implementation is very inefficient: it basically does
a search for each row since it calls tailSet. IMO this should be changed - memstore snapshots
should be handled as a special case and not slow down all scans."

Getting rid of the per-row tailSet call -- keeping one tailSet for the life of the scan -- improved
MemStore performance by 40%.
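
To make the tailSet point concrete, here is a minimal sketch of the two scan styles. It assumes the store is a sorted set such as java.util.concurrent.ConcurrentSkipListSet and uses Strings in place of KVs; the class and method names are hypothetical and this is not the actual MemStoreScanner code.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo class; it is not HBase's MemStoreScanner.
final class TailSetScanDemo {

  // Re-seeks on every step: each tailSet() + first() is a fresh search in the set.
  static long scanWithTailSetPerRow(NavigableSet<String> set, String start) {
    long count = 0;
    String current = set.ceiling(start);
    while (current != null) {
      count++;
      // New tail view for each element, exclusive of the current one.
      NavigableSet<String> tail = set.tailSet(current, false);
      current = tail.isEmpty() ? null : tail.first();
    }
    return count;
  }

  // Seeks once and keeps the same tail view (and its iterator) for the whole scan.
  static long scanWithSingleTailSet(NavigableSet<String> set, String start) {
    long count = 0;
    Iterator<String> it = set.tailSet(start, true).iterator();
    while (it.hasNext()) {
      it.next();
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    NavigableSet<String> set = new ConcurrentSkipListSet<>();
    for (int i = 0; i < 1000; i++) {
      set.add(String.format("row-%04d", i));
    }
    // Both scans visit the same elements; only the number of searches differs.
    System.out.println(scanWithTailSetPerRow(set, "row-0500"));
    System.out.println(scanWithSingleTailSet(set, "row-0500"));
  }
}
```

The single-tailSet version trades a search per element for one search up front; how snapshots would be handled in that scheme is left out of this sketch.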

Other observations:

+ We copy the row out of the KV to do compares more than once during processing of a Scan (add a
counter to see for sure). Maybe we can avoid this by using a Comparator that can do the compare in
place?
+ Looking for equality, it might be faster to compare from end to start rather than from start
to end since, more often, the difference is in the tail:

{code}
   public static boolean equals(final byte [] left, final byte [] right) {
     // Could use Arrays.equals?
-    return left == null && right == null? true:
-      (left == null || right == null || (left.length != right.length))? false:
-        compareTo(left, right) == 0;
+    if (left == null && right == null) {
+      return true;
+    }
+    if (left == null || right == null || left.length != right.length) {
+      return false;
+    }
+    // Compare from end to start since we usually compare 'close' bytes
+    for (int i = left.length - 1; i >= 0; i--) {
+      if (left[i] != right[i]) {
+        return false;
+      }
+    }
+    return true;
   }
{code}
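
For the in-place compare suggested in the first bullet above, a minimal sketch might look like the following. It assumes the row lives at a known offset and length inside a larger backing array; the class name, method name, and sample buffers are hypothetical and are not HBase's KeyValue API.

```java
// Hypothetical helper; names are illustrative and not HBase's KeyValue API.
final class InPlaceRowCompare {

  // Compares two byte ranges lexicographically, treating bytes as unsigned,
  // without copying either range into a new array.
  static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int x = a[aOff + i] & 0xff;
      int y = b[bOff + i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    // Common prefix is equal, so the shorter range sorts first.
    return aLen - bLen;
  }

  public static void main(String[] args) {
    // Two made-up buffers where the "row" sits at different offsets.
    byte[] kv1 = {0, 0, 'r', 'o', 'w', '1', 9, 9};
    byte[] kv2 = {7, 'r', 'o', 'w', '2'};
    System.out.println(compareRange(kv1, 2, 4, kv2, 1, 4) < 0);
  }
}
```

Since the ranges are read straight from the backing arrays, no per-compare allocation is needed; whether that helps in practice would need the counter and profiling mentioned above.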


> Make in-memory table scanning faster
> ------------------------------------
>
>                 Key: HBASE-1938
>                 URL: https://issues.apache.org/jira/browse/HBASE-1938
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> This issue is about profiling hbase to see if I can make hbase scans run faster when
> all is up in memory.  Talking to some users, they are seeing about 1/4 million rows a second.
> It should be able to go faster than this (Scanning an array of objects, they can do about
> 4-5x this).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

