From: "Liu, Ming (Ming)"
To: user@hbase.apache.org
Date: Thu, 12 Apr 2018 16:16:07 +0000
Subject: how to get random rows from a big hbase table faster

Hi all,

We have an HBase table with 1 billion rows, and we want to randomly get 1M rows from it. We are now trying RandomRowFilter, but it is still very slow. If I understand correctly, on the server side RandomRowFilter still needs to read all 1 billion rows and then returns a random 1% of them. Reading 1 billion rows is very slow. Is this true?

So is there any better way to randomly get 1% of the rows from a given table? Any ideas would be very much appreciated.
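To make the cost argument concrete, here is a minimal Python model (not HBase client code) of a scan with RandomRowFilter, assuming the behavior described above: every row is visited on the server, and each one is kept independently with probability `chance`. The function `bernoulli_row_filter_scan` and the generated row keys are hypothetical and only illustrate the sampling behavior.

```python
import random
from typing import Iterable, List, Optional, Tuple


def bernoulli_row_filter_scan(
    row_keys: Iterable[str], chance: float, seed: Optional[int] = None
) -> Tuple[List[str], int]:
    """Hypothetical model of a full scan with a RandomRowFilter-style filter.

    Each row is kept independently with probability `chance`, but every
    row is still visited. Returns (kept_row_keys, rows_visited).
    """
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be between 0 and 1")
    rng = random.Random(seed)
    kept: List[str] = []
    visited = 0
    for key in row_keys:
        visited += 1  # the scan touches this row whether or not it is kept
        if rng.random() < chance:  # keep with probability `chance`
            kept.append(key)
    return kept, visited


if __name__ == "__main__":
    # Hypothetical table of 100,000 row keys, sampled at 1%.
    keys = (f"row{i:07d}" for i in range(100_000))
    sample, visited = bernoulli_row_filter_scan(keys, 0.01, seed=42)
    print(visited)  # 100000: cost grows with table size, not sample size
    print(len(sample))  # close to 1% of the rows; exact count depends on the seed
```

In this model, lowering `chance` shrinks the result but not the number of rows visited, which is the concern raised in the question.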
We don't know the distribution of the 1 billion rows in advance.

Thanks,
Ming