hbase-user mailing list archives

From "Thakrar, Jayesh" <jthak...@conversantmedia.com>
Subject RE: Rows per second for RegionScanner
Date Thu, 21 Apr 2016 14:24:02 GMT
Just curious - have you set scanner caching to some high value, say 1000 (or even higher, given how small your values are)?

The parameter is hbase.client.scanner.caching

You can read up on it - https://hbase.apache.org/book.html
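For example, here's a minimal sketch of setting it per scan from the Java client (the value 1000 is just an illustration; tune it to your row size):

import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
// Ask for up to 1000 rows per client RPC instead of the default; this is the
// per-scan override of the hbase.client.scanner.caching setting from
// hbase-site.xml, and it cuts down round trips on large scans.
scan.setCaching(1000);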

Another thing: are you looking purely for scan-read performance? Depending on the table size, you could also look into keeping the whole table in the block cache, or not caching its blocks at all.
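As a rough sketch of the two extremes (against the 0.98/1.x client API, using your F1 family just as an example):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.Scan;

// One-off full scan: skip the block cache so the scan doesn't evict hotter data.
Scan scan = new Scan();
scan.setCacheBlocks(false);

// Small, hot table: keep the family's blocks cached, even in the in-memory
// portion of the block cache.
HColumnDescriptor family = new HColumnDescriptor("F1");
family.setBlockCacheEnabled(true);
family.setInMemory(true);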

-----Original Message-----
From: hongbin ma [mailto:mahongbin@apache.org] 
Sent: Thursday, April 21, 2016 5:04 AM
To: user@hbase.apache.org
Subject: Rows per second for RegionScanner

Hi, experts,

I'm trying to figure out how fast HBase can scan. I'm setting up the RegionScanner in an endpoint coprocessor so that no network overhead is involved. My average key length is 35 bytes and my average value length is 5 bytes.
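
For context, the core of my endpoint is roughly the loop below (a simplified sketch, not my exact code; the method name is just for illustration):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

// Scan the whole region with a RegionScanner opened directly inside the
// region server, so no RPC or result serialization is involved.
long countRows(RegionCoprocessorEnvironment env) throws IOException {
  Scan scan = new Scan();
  RegionScanner scanner = env.getRegion().getScanner(scan);
  List<Cell> cells = new ArrayList<Cell>();
  long rows = 0;
  boolean hasMore;
  try {
    do {
      cells.clear();
      hasMore = scanner.next(cells);  // fills one row's cells per call
      if (!cells.isEmpty()) {
        rows++;
      }
    } while (hasMore);
  } finally {
    scanner.close();
  }
  return rows;
}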

My test result is that even with all the blocks I'm interested in warmed in the block cache, I'm only able to scan around 300,000 rows per second per region (with an endpoint I guess it's one thread per region). At roughly 50 bytes per row that works out to about 15 MB of data per second. I'm not sure whether this is already an acceptable number for HBase; your answers might help me decide whether it's worth digging further into tuning it.

thanks!






other info:

My HBase cluster is on 8 AWS m1.xlarge instances, each with 4 CPU cores and 16 GB RAM. Each region server is configured with a 10 GB heap. The test HTable has 23 regions, with one HFile per region (it was just major compacted). There was no other resource contention when I ran the tests.

Below is the HFile tool output for one of the region's HFiles:
=============================================
 hbase  org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f
/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
2016-04-21 09:16:04,091 INFO  [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2016-04-21 09:16:04,292 INFO  [main] util.ChecksumType: Checksum using
org.apache.hadoop.util.PureJavaCrc32
2016-04-21 09:16:04,294 INFO  [main] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.9.0-3393/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.9.0-3393/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2016-04-21 09:16:05,654 INFO  [main] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
Scanning -> /apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06
Block index size as per heapsize: 3640
reader=/apps/hbase/data/data/default/KYLIN_YMSGYYXO12/d42b9faf43eafcc9640aa256143d5be3/F1/30b8a8ff5a82458481846e364974bf06,
    compression=none,
    cacheConf=CacheConfig:disabled,
    firstKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00\x00\x01\xF4/F1:M/0/Put,
    lastKey=\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9/F1:M/0/Put,
    avgKeyLen=35,
    avgValueLen=5,
    entries=160988965,
    length=1832309188
Trailer:
    fileinfoOffset=1832308623,
    loadOnOpenDataOffset=1832306641,
    dataIndexCount=43,
    metaIndexCount=0,
    totalUncomressedBytes=1831809883,
    entryCount=160988965,
    compressionCodec=NONE,
    uncompressedDataIndexSize=5558733,
    numDataIndexLevels=2,
    firstDataBlockOffset=0,
    lastDataBlockOffset=1832250057,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    majorVersion=2,
    minorVersion=3
Fileinfo:
    DATA_BLOCK_ENCODING = FAST_DIFF
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x00\x00\x00\x00\x00\x00
    MAJOR_COMPACTION_KEY = \xFF
    MAX_SEQ_ID_KEY = 4
    TIMERANGE = 0....0
    hfile.AVG_KEY_LEN = 35
    hfile.AVG_VALUE_LEN = 5
    hfile.LASTKEY =
\x00\x16\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x06-?\x0F"U\x00\x00\x03[^\xD9\x02F1M\x00\x00\x00\x00\x00\x00\x00\x00\x04
Mid-key:
\x00\x12\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1D\x04_\x07\x89\x00\x00\x02l\x00\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x007|\xBE$\x00\x00;\x81
Bloom filter:
    Not present
Delete Family Bloom filter:
    Not present
Stats:
   Key length:
               min = 32.00
               max = 37.00
              mean = 35.11
            stddev = 1.46
            median = 35.00
              75% <= 37.00
              95% <= 37.00
              98% <= 37.00
              99% <= 37.00
            99.9% <= 37.00
             count = 160988965
   Row size (bytes):
               min = 44.00
               max = 55.00
              mean = 48.17
            stddev = 1.43
            median = 48.00
              75% <= 50.00
              95% <= 50.00
              98% <= 50.00
              99% <= 50.00
            99.9% <= 51.97
             count = 160988965
   Row size (columns):
               min = 1.00
               max = 1.00
              mean = 1.00
            stddev = 0.00
            median = 1.00
              75% <= 1.00
              95% <= 1.00
              98% <= 1.00
              99% <= 1.00
            99.9% <= 1.00
             count = 160988965
   Val length:
               min = 4.00
               max = 12.00
              mean = 5.06
            stddev = 0.33
            median = 5.00
              75% <= 5.00
              95% <= 5.00
              98% <= 6.00
              99% <= 8.00
            99.9% <= 9.00
             count = 160988965
Key of biggest row:
\x00\x0B\x00\x00\x00\x00\x00\x00\x00\x1F\x04\xDD:\x06\x00U\x00\x00\x00\x8DS\xD2
Scanned kv count -> 160988965



