hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-684) unnecessary iteration in HMemcache.internalGet? got much better reading performance after break it.
Date Mon, 16 Jun 2008 19:23:45 GMT

     [ https://issues.apache.org/jira/browse/HBASE-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-684.
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.1.3  (was: 0.1.4)

Committed to branch to include in 0.1.3 release.

Thanks for the patch LN.

> unnecessary iteration in HMemcache.internalGet? got much better reading performance after break it.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-684
>                 URL: https://issues.apache.org/jira/browse/HBASE-684
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.1.2
>            Reporter: LN
>             Fix For: 0.1.3
>
>         Attachments: 684.patch
>
>
> Hi stack,
> First, many thanks to the authors; it's a great system.
> I'm not sure, but I think the tail map iteration in HStore.HMemcache.internalGet should break as soon as 'itKey.matchesRowCol(key)' returns false. The tail map is a SortedMap too, so the keys that match the input 'key' should be at the beginning of the map.
> I created a patched version of the class for testing and found about a 5x read performance improvement in my test case.
> Comments:
> 1. I ended up reviewing HStore.java because I was bothered by terrible read performance with the 0.1.2 release: ONE record per second. Test environment: 4G memory, 2 x duo Xeon 2G, 100k records in the test table, 100k bytes per record column, 1 column only.
> 2. I have seen the PerformanceEvaluation pages in the wiki, and read performance for 1k-byte records is also acceptable in my test environment, but as the record size increases, read performance drops off quickly.
> 3. When profiling the hregionserver process, I found that the first bottleneck is data I/O in MapFile; this is the hbase.io.index.interval issue (HBASE-680) I posted yesterday.
> 4. After setting hbase.io.index.interval to 1, read performance improved a lot, but not enough (I think it should be Nx Hadoop read performance, where N<10). This time profiling showed that HMemcache.internalGet used a lot of CPU time, and that each row get called HStoreKey#matchesRowCol about 200 times in my test environment.
> 5. With my patched version applied, I got much better read performance. Test case: first insert 100k records into a table, then randomly read 10000 of them.
> 6. This change has no effect when nothing is cached, such as when a regionserver has just started, which is why my test case inserts rows first. Reading and writing at the same time is a normal situation, though.
> Here is my simple patch:
> Index: src/java/org/apache/hadoop/hbase/HStore.java
> ===================================================================
> --- src/java/org/apache/hadoop/hbase/HStore.java	Fri Jun 13 00:15:59 CST 2008
> +++ src/java/org/apache/hadoop/hbase/HStore.java	Fri Jun 13 00:15:59 CST 2008
> @@ -478,11 +478,14 @@
>            if (!HLogEdit.isDeleted(es.getValue())) { 
>              result.add(tailMap.get(itKey));
>            }
> -        }
> -        if (numVersions > 0 && result.size() >= numVersions) {
> -          break;
> -        }
> +            if (numVersions > 0 && result.size() >= numVersions) {
> +              break;
> +            }
> +        }else
> +          { //by L.N., map is sorted, so we can't find match any more.
> +            break;
> -      }
> +          }
> +      }
>        return result;
>      }
> Finally, I'd suggest a new HBase class to hold the memory cache instead of a synchronized sorted map. This could lead to much better performance, basically by avoiding the iteration (in case my reasoning above is wrong) and by removing a lot of unnecessary synchronization and locking.
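
For readers without the HBase 0.1 source at hand, the idea behind the patch can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual HStore code: `Key` is a hypothetical stand-in for HStoreKey (assumed to sort by row, then column, then newest timestamp first), the `earlyBreak` flag and the `lastChecks` counter exist only to compare the two behaviours, and the deleted-cell check from the real method is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class EarlyBreakSketch {
    /** Hypothetical stand-in for HStoreKey: sorts by row, then column, then newest timestamp first. */
    static final class Key implements Comparable<Key> {
        final String row;
        final String col;
        final long ts;

        Key(String row, String col, long ts) {
            this.row = row;
            this.col = col;
            this.ts = ts;
        }

        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = col.compareTo(o.col);
            if (c == 0) c = Long.compare(o.ts, ts); // reversed: newer versions sort first
            return c;
        }

        boolean matchesRowCol(Key o) {
            return row.equals(o.row) && col.equals(o.col);
        }
    }

    /** Number of matchesRowCol calls made by the most recent get(). */
    static int lastChecks;

    static List<String> get(SortedMap<Key, String> cache, Key key, int numVersions, boolean earlyBreak) {
        List<String> result = new ArrayList<>();
        lastChecks = 0;
        for (Map.Entry<Key, String> e : cache.tailMap(key).entrySet()) {
            lastChecks++;
            if (e.getKey().matchesRowCol(key)) {
                result.add(e.getValue());
                if (numVersions > 0 && result.size() >= numVersions) {
                    break; // enough versions collected
                }
            } else if (earlyBreak) {
                break; // sorted map: no later key can match this row/column
            }
        }
        return result;
    }

    /** Small sample cache: four rows, one column, two versions each. */
    static SortedMap<Key, String> sample() {
        SortedMap<Key, String> cache = new TreeMap<>();
        for (String row : new String[] {"r1", "r2", "r3", "r4"}) {
            cache.put(new Key(row, "c", 2L), row + "-v2");
            cache.put(new Key(row, "c", 1L), row + "-v1");
        }
        return cache;
    }

    public static void main(String[] args) {
        // Probe sorts before every stored version of r2/c, so tailMap starts at r2's newest cell.
        Key probe = new Key("r2", "c", Long.MAX_VALUE);
        System.out.println(get(sample(), probe, 3, false) + " checks=" + lastChecks);
        System.out.println(get(sample(), probe, 3, true) + " checks=" + lastChecks);
    }
}
```

When the caller asks for more versions than exist (3 requested, 2 stored here), the loop without the early break walks the rest of the tail map, while the variant with the break stops at the first key belonging to another row/column. Both return the same cells; only the number of key comparisons differs.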

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

