Date: Thu, 6 Apr 2017 09:05:41 +0000 (UTC)
From: "ramkrishna.s.vasudevan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-17849) PE tool random read is not totally random

[ https://issues.apache.org/jira/browse/HBASE-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-17849:
-------------------------------------------
    Status: Open  (was: Patch Available)

> PE tool random read is not totally random
> -----------------------------------------
>
>                 Key: HBASE-17849
>                 URL: https://issues.apache.org/jira/browse/HBASE-17849
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17849.patch, HBASE-17849.patch
>
>
> Recently we used the PE (PerformanceEvaluation) tool to run some bucket cache performance tests. One thing we noticed is that random reads are not truly random.
> Suppose we load 200G of data using the --size param and then run randomRead with --rows=500000. We assumed that PE would pick 500000 row keys at random from across the whole 200G of data.
> Instead, PE generates its random row keys only from the first 500000 rows.
> This became evident when we tested with HBASE-15314. If we split a 200G bucket cache into 2 files of 100G each, randomRead with --rows=500000 always lands in the first file and never in the second. It would be better to make PE purely random.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
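The gap between the expected and observed randomRead behavior in this report can be illustrated with a minimal Java sketch. It rests on several assumptions. Row keys are modeled as plain `long` indices. The method names `pickFromFirstRows` and `pickFromWholeTable` are hypothetical and are not PerformanceEvaluation's actual code. The figure of 200,000,000 rows is a made-up stand-in for whatever a 200G --size load really writes.

```java
import java.util.Random;

// HYPOTHETICAL sketch, not PerformanceEvaluation's real implementation:
// contrasts drawing random row indices from only the first `rows` rows
// with drawing them from every row the load wrote.
public class RandomRowSketch {

  // Observed behavior per the report: every index lands in [0, rows).
  static long pickFromFirstRows(Random rand, long rows) {
    // floorMod keeps the result non-negative; modulo bias is ignored here.
    return Math.floorMod(rand.nextLong(), rows);
  }

  // Behavior the reporter expected: indices spread over [0, totalRows).
  static long pickFromWholeTable(Random rand, long totalRows) {
    return Math.floorMod(rand.nextLong(), totalRows);
  }

  public static void main(String[] args) {
    Random rand = new Random(42);
    long rows = 500_000L;           // number of reads requested with --rows
    long totalRows = 200_000_000L;  // HYPOTHETICAL row count produced by the --size load

    long maxBounded = -1;
    long maxWhole = -1;
    long wholeBeyondRows = 0;       // table-wide picks outside the first `rows` rows
    for (long i = 0; i < rows; i++) {
      maxBounded = Math.max(maxBounded, pickFromFirstRows(rand, rows));
      long w = pickFromWholeTable(rand, totalRows);
      maxWhole = Math.max(maxWhole, w);
      if (w >= rows) {
        wholeBeyondRows++;
      }
    }
    System.out.println("max index, bounded by --rows: " + maxBounded);
    System.out.println("max index, bounded by table size: " + maxWhole);
    System.out.println("table-bounded picks past row " + rows + ": " + wholeBeyondRows);
  }
}
```

The first method can never return an index of 500000 or more. Under this model, no read reaches rows beyond that prefix, which is consistent with the symptom described above. Bounding the draw by the table's total row count lets reads land anywhere in the loaded data.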