hbase-issues mailing list archives

From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1938) Make in-memory table scanning faster
Date Tue, 26 Jul 2011 16:21:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071196#comment-13071196 ]

nkeywal commented on HBASE-1938:

I have an improvement that could make a real difference.

In HBase, there is an iterator called MapEntryIterator that, in practice, acts as a value iterator: it only ever returns the value of each map entry.
{noformat}
static class MapEntryIterator implements Iterator<KeyValue> {
  private final Iterator<Map.Entry<KeyValue, KeyValue>> iterator;

  // constructor and the other Iterator methods omitted
  public KeyValue next() {
    return this.iterator.next().getValue(); // unwraps the entry
  }
}
{noformat}

However, with the current JDK implementation, there is an important difference between an iterator over values and an iterator over entries. From java.util.concurrent.ConcurrentSkipListMap we can see:

The ValueIterator is straightforward:
{noformat}
final class ValueIterator extends Iter<V> {
    public V next() {
        V v = nextValue;
        advance(); // move to the next node
        return v;
    }
}
{noformat}

The EntryIterator, on the other hand, does some defensive programming: it creates a new immutable object on every call.
{noformat}
final class EntryIterator extends Iter<Map.Entry<K,V>> {
    public Map.Entry<K,V> next() {
        Node<K,V> n = next;
        V v = nextValue;
        advance();
        // a fresh snapshot entry is allocated for every element
        return new AbstractMap.SimpleImmutableEntry<K,V>(n.key, v);
    }
}
{noformat}
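To see the difference outside HBase, here is a minimal, self-contained sketch against a plain `ConcurrentSkipListMap<String, String>`. The class and method names are hypothetical, for illustration only. The `setValue` rejection is documented in the `ConcurrentSkipListMap` Javadoc (its entries are snapshots that do not support `Entry.setValue`); the object-identity checks reflect the current OpenJDK implementation rather than a documented guarantee.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class EntrySnapshotDemo {
    // The single value stored in the map; kept so identity can be checked.
    static final String STORED = new String("value-a");

    static ConcurrentSkipListMap<String, String> newMap() {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        map.put("a", STORED);
        return map;
    }

    /** values() hands back the stored object itself; no wrapper is created. */
    static boolean valuesReturnStoredInstance() {
        return newMap().values().iterator().next() == STORED;
    }

    /** entrySet() builds a new snapshot entry each time an element is returned. */
    static boolean entriesAreFreshObjects() {
        ConcurrentSkipListMap<String, String> map = newMap();
        Map.Entry<String, String> first = map.entrySet().iterator().next();
        Map.Entry<String, String> second = map.entrySet().iterator().next();
        // Two traversals give two distinct entry objects wrapping the same value.
        return first != second && first.getValue() == STORED && second.getValue() == STORED;
    }

    /** The snapshot entries are read-only: setValue is rejected. */
    static boolean entriesRejectSetValue() {
        Map.Entry<String, String> entry = newMap().entrySet().iterator().next();
        try {
            entry.setValue("changed");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("values() returns the stored instance: " + valuesReturnStoredInstance());
        System.out.println("entrySet() returns fresh entries: " + entriesAreFreshObjects());
        System.out.println("entries reject setValue(): " + entriesRejectSetValue());
    }
}
```

The entry objects carry no information the caller needs here, which is why an iterator that only wants the value pays for an allocation it immediately discards.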

As a consequence, there is at least one object allocation for every KeyValue returned by the HBase scanner. This allocation is useless, as we throw the object away immediately, so several GCs occur during the test. I modified the MapEntryIterator implementation to iterate over the values instead:

{noformat}
static class MapEntryIterator implements Iterator<KeyValue> {
  private final Iterator<KeyValue> iterator;

  // constructor and the other Iterator methods omitted
  public KeyValue next() {
    return this.iterator.next(); // no entry to unwrap
  }
}
{noformat}
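A minimal sketch of the before/after wrappers, assuming a `ConcurrentSkipListMap` whose keys equal its values (`String` stands in for `KeyValue` here). `EntryBackedIterator`, `ValueBackedIterator` and `drain` are hypothetical names, not HBase classes. Both wrappers yield the same elements in key order; only the per-element entry allocation differs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class ValueIteratorSketch {
    /** "Before": unwraps each entry; over ConcurrentSkipListMap.entrySet() every next() allocates one entry. */
    static final class EntryBackedIterator<K, V> implements Iterator<V> {
        private final Iterator<Map.Entry<K, V>> iterator;

        EntryBackedIterator(Iterator<Map.Entry<K, V>> iterator) {
            this.iterator = iterator;
        }

        @Override public boolean hasNext() { return iterator.hasNext(); }
        @Override public V next() { return iterator.next().getValue(); }
        @Override public void remove() { iterator.remove(); }
    }

    /** "After": delegates straight to values(), so no entry object is created. */
    static final class ValueBackedIterator<V> implements Iterator<V> {
        private final Iterator<V> iterator;

        ValueBackedIterator(Iterator<V> iterator) {
            this.iterator = iterator;
        }

        @Override public boolean hasNext() { return iterator.hasNext(); }
        @Override public V next() { return iterator.next(); }
        @Override public void remove() { iterator.remove(); }
    }

    /** Drains an iterator into a list so the two variants can be compared. */
    static <V> List<V> drain(Iterator<V> it) {
        List<V> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Keys equal values, as in a set backed by a map.
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        for (String row : new String[] {"row3", "row1", "row2"}) {
            map.put(row, row);
        }
        System.out.println(drain(new EntryBackedIterator<>(map.entrySet().iterator()))); // [row1, row2, row3]
        System.out.println(drain(new ValueBackedIterator<>(map.values().iterator())));   // [row1, row2, row3]
    }
}
```

The change is local to where the iterator is obtained: taking it from `values()` rather than `entrySet()` whenever only the value is used, while callers of the wrapper see the same `Iterator<V>` contract.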

On the test, the scan time is divided by 3. The ratio can obviously be made arbitrarily large, since it is driven by how much GC the test triggers, but the change should be valuable in production as well.
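To check the allocation claim on your own JVM, one option is to read the per-thread allocation counter exposed by HotSpot's `com.sun.management.ThreadMXBean`, a JDK-specific extension, so the sketch falls back when it is unavailable. This is an assumption-laden sketch, not the test referred to above: it does no warm-up, and the JIT's escape analysis may eliminate some entry allocations once the loop is compiled, so the numbers will vary between runs and JVMs. The class name `ScanAllocationSketch` and its methods are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class ScanAllocationSketch {
    static ConcurrentSkipListMap<Integer, Integer> build(int n) {
        ConcurrentSkipListMap<Integer, Integer> map = new ConcurrentSkipListMap<>();
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        return map;
    }

    /** Scan through entrySet(): one snapshot entry per element. */
    static long sumViaEntries(ConcurrentSkipListMap<Integer, Integer> map) {
        long sum = 0;
        for (Iterator<Map.Entry<Integer, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            sum += it.next().getValue();
        }
        return sum;
    }

    /** Scan through values(): returns the stored objects directly. */
    static long sumViaValues(ConcurrentSkipListMap<Integer, Integer> map) {
        long sum = 0;
        for (Iterator<Integer> it = map.values().iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    /** Bytes allocated so far by this thread, or -1 if the JVM cannot report it. */
    static long allocatedBytes() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
            if (hotspot.isThreadAllocatedMemorySupported() && hotspot.isThreadAllocatedMemoryEnabled()) {
                return hotspot.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, Integer> map = build(1_000_000);

        long start = allocatedBytes();
        long viaEntries = sumViaEntries(map);
        long afterEntries = allocatedBytes();
        long viaValues = sumViaValues(map);
        long afterValues = allocatedBytes();

        System.out.println("same result: " + (viaEntries == viaValues));
        if (start >= 0) {
            System.out.println("bytes allocated, entrySet() scan: " + (afterEntries - start));
            System.out.println("bytes allocated, values() scan: " + (afterValues - afterEntries));
        } else {
            System.out.println("per-thread allocation counters not available on this JVM");
        }
    }
}
```

Counting bytes rather than wall-clock time isolates the effect described above (garbage created per scanned element) from the GC timing that makes the speed-up ratio test-dependent.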

I am currently running the unit tests; I will attach the patch if they pass.

> Make in-memory table scanning faster
> ------------------------------------
>                 Key: HBASE-1938
>                 URL: https://issues.apache.org/jira/browse/HBASE-1938
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>         Attachments: MemStoreScanPerformance.java, MemStoreScanPerformance.java, caching-keylength-in-kv.patch,
>
> This issue is about profiling hbase to see if I can make hbase scans run faster when all is up in memory.  Talking to some users, they are seeing about 1/4 million rows a second.  It should be able to go faster than this (Scanning an array of objects, they can do about 4-5x this).


