hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6852) SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
Date Fri, 21 Sep 2012 20:04:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460786#comment-13460786 ]

Elliott Clark commented on HBASE-6852:
--------------------------------------

bq. Aggregating stuff locally and pushing to metrics seems ideal
With that comes a lot of bookkeeping and potential places to leak memory (if we use strong
references) or to lose metrics data (if we use weak references). I'm not sure that the perf
gain will be high enough to justify that.
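
For illustration, here is roughly what that per-thread bookkeeping could look like. This is just a sketch, not HBase code; the class and field names are made up. The hot path stays uncontended, but note the strong-reference map that never forgets a dead thread's cell unless we add more machinery:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: per-thread tallies folded into the metrics system on flush.
public class LocallyAggregatedCounter {
  // Strong references to every thread's cell: nothing gets lost, but cells for dead
  // threads stick around forever unless we add extra bookkeeping (the leak concern).
  private final Map<Thread, long[]> perThread = new ConcurrentHashMap<Thread, long[]>();

  private final ThreadLocal<long[]> local = new ThreadLocal<long[]>() {
    @Override protected long[] initialValue() {
      long[] cell = new long[1];
      perThread.put(Thread.currentThread(), cell);
      return cell;
    }
  };

  // Hot path: a plain, uncontended write to this thread's own cell.
  public void increment() {
    local.get()[0]++;
  }

  // Called periodically to push an aggregate to the metrics system. Reads of other
  // threads' cells are racy, so the sum is only approximate between flushes.
  public long flush() {
    long sum = 0;
    for (long[] cell : perThread.values()) {
      sum += cell[0];
    }
    return sum;
  }
}
{code}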

Since we already shim a lot to the metrics2 classes, it seems like using the high-scale-lib
counters to create concurrent versions of the MetricMutableCounter{Long|Int} would stop most
cache contention pretty easily (a rough sketch follows the list below). For me, this is the
order of cost vs. benefit:
# Aggregating metrics locally before pushing to the metrics system whenever possible
# Using the hashmap less (This is already happening in the metrics2 move over. See [MasterMetricsSourceImpl|https://github.com/apache/hbase/blob/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetricsSourceImpl.java] for how known metrics are staying away from the hashmap)
# Changing metrics to use counters rather than time-varying rates wherever possible (lots less locking if we don't need to keep min/max)
# Create CliffClick versions of Counters and use them whenever there's concurrent access
# Look at ThreadLocal-cached versions of metrics.
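
As a rough sketch of #4 (again, not real HBase or metrics2 code, and wiring it into the actual MetricMutableCounterLong incr/snapshot path is omitted), a counter backed by org.cliffc.high_scale_lib.Counter could look like:

{code}
import org.cliffc.high_scale_lib.Counter;

// Hypothetical sketch only; the real thing would plug into MetricMutableCounterLong
// rather than being a standalone class.
public class ConcurrentLongCounter {
  private final String name;
  // Cliff Click's striped counter: concurrent increments hit different cells
  // instead of all CASing one hot field.
  private final Counter value = new Counter();

  public ConcurrentLongCounter(String name) {
    this.name = name;
  }

  public void incr() {
    value.increment();
  }

  public void incr(long delta) {
    value.add(delta);
  }

  // Exact sum of the cells; estimate_get() is cheaper if a slightly stale value is OK.
  public long snapshot() {
    return value.get();
  }

  public String getName() {
    return name;
  }
}
{code}

The point is that many handler threads incrementing the same metric spread their writes across cache lines, and we only pay to sum things up when a snapshot is taken.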
                
> SchemaMetrics.updateOnCacheHit costs too much while full scanning a table with all of its fields
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6852
>                 URL: https://issues.apache.org/jira/browse/HBASE-6852
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 0.94.0
>            Reporter: Cheng Hao
>            Priority: Minor
>              Labels: performance
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: onhitcache-trunk.patch
>
>
> SchemaMetrics.updateOnCacheHit costs too much while I am doing a full table scan.
> Here are the top 5 hotspots within the regionserver while full scanning a table (sorry for the poor formatting):
> CPU: Intel Westmere microarchitecture, speed 2.262e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 5000000
> samples  %        image name               symbol name
> -------------------------------------------------------------------------------
> 98447    13.4324  14033.jo                 void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean)
>   98447    100.000  14033.jo                 void org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics.updateOnCacheHit(org.apache.hadoop.hbase.io.hfile.BlockType$BlockCategory, boolean) [self]
> -------------------------------------------------------------------------------
> 45814     6.2510  14033.jo                 int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int)
>   45814    100.000  14033.jo                 int org.apache.hadoop.hbase.KeyValue$KeyComparator.compareRows(byte[], int, int, byte[], int, int) [self]
> -------------------------------------------------------------------------------
> 43523     5.9384  14033.jo                 boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue)
>   43523    100.000  14033.jo                 boolean org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(org.apache.hadoop.hbase.KeyValue) [self]
> -------------------------------------------------------------------------------
> 42548     5.8054  14033.jo                 int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int)
>   42548    100.000  14033.jo                 int org.apache.hadoop.hbase.KeyValue$KeyComparator.compare(byte[], int, int, byte[], int, int) [self]
> -------------------------------------------------------------------------------
> 40572     5.5358  14033.jo                 int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1
>   40572    100.000  14033.jo                 int org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.binarySearchNonRootIndex(byte[], int, int, java.nio.ByteBuffer, org.apache.hadoop.io.RawComparator)~1 [self]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
