hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17849) PE tool randomness is not totally random
Date Tue, 04 Apr 2017 06:53:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954661#comment-15954661 ]

ramkrishna.s.vasudevan commented on HBASE-17849:

By specifying both --size=200 and --rows=500000, I was able to reduce the test run to 6 minutes
and also spread the reads evenly across the two files configured for the bucket cache. Previously,
specifying only --size=200 ran for around 8 hours, while specifying only --rows=500000 ran quickly
but could not make use of both files configured for the bucket cache.

> PE tool randomness is not totally random
> ----------------------------------------
>                 Key: HBASE-17849
>                 URL: https://issues.apache.org/jira/browse/HBASE-17849
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>         Attachments: HBASE-17849.patch
> Recently we were using the PE tool for some bucket cache related performance tests.
One thing we noticed is that the random read is not totally random.
> Suppose we load 200G of data using the --size param and then use --rows=500000 to do the
randomRead. The assumption was that it would randomly generate 500000 row keys from across
the 200G of data to do the reads.
> But it so happens that the PE tool generates random rows only from the set of row keys
that fall within the first 500000 rows.
> This was quite evident when we tried to use HBASE-15314 in our testing. Suppose we split
a bucket cache of size 200G into 2 files of 100G each; the randomReads with --rows=500000 always
land in the first file and never in the 2nd file. Better to make PE purely random.
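
The behaviour described above can be illustrated with a minimal Python sketch. This is not the PE tool's code and not the attached patch. The function names, the TOTAL_ROWS figure, and the use of "second half of the key space" as a stand-in for "second bucket cache file" are all assumptions made for illustration only.

```python
import random

def sample_row_indices(num_reads, key_space, seed=0):
    """Draw num_reads row indices uniformly from [0, key_space)."""
    rng = random.Random(seed)
    return [rng.randrange(key_space) for _ in range(num_reads)]

def fraction_in_second_half(indices, total_rows):
    """Share of reads whose row index falls in the upper half of the table."""
    half = total_rows // 2
    return sum(1 for i in indices if i >= half) / len(indices)

# Hypothetical figures: the real row count for 200G depends on the row size.
TOTAL_ROWS = 200_000_000
READS = 500_000

# Reported behaviour: the random range is bounded by --rows, so every read
# stays within the first READS rows of the table.
bounded = sample_row_indices(READS, key_space=READS)

# Desired behaviour: the same number of reads drawn from the whole table.
uniform = sample_row_indices(READS, key_space=TOTAL_ROWS)

print("bounded, share in second half:", fraction_in_second_half(bounded, TOTAL_ROWS))
print("uniform, share in second half:", fraction_in_second_half(uniform, TOTAL_ROWS))
```

In the bounded case no read reaches the upper half of the key space, which would be consistent with the reads always landing in the first bucket cache file. In the uniform case roughly half of the reads do.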

This message was sent by Atlassian JIRA
