From: Hef
Date: Wed, 1 Mar 2017 21:46:17 +0800
Subject: Re: HBase scan returns inconsistent results on multiple runs for same dataset
To: user@hbase.apache.org

I'm using CDH 5.9; the documentation shows its HBase version is
hbase-1.2.0+cdh5.9.1+222
( https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html ).
I don't know whether HBASE-15378 is included.

On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu wrote:

> Which HBase version are you using?
>
> Does it include HBASE-15378?
>
> > On Mar 1, 2017, at 5:02 AM, Hef wrote:
> >
> > Hi,
> > I'm encountering strange behavior in MapReduce when using HBase as the
> > input format. I run my MR tasks on the same table and the same dataset,
> > with the same FuzzyRowFilter pattern, multiple times. The Input Records
> > counters are not consistent between runs; the smallest number can be
> > 40% less than the largest one.
> >
> > More specifically:
> > - The table is split into 18 regions, distributed across 3 region
> >   servers. The TTL is set to 10 days for the records, though the dataset
> >   for the MR job only includes those inserted in the last 7 days.
> >
> > - The row key is defined as:
> >   salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes)
> >
> > - The scan is created as below:
> >
> >   Scan scan = new Scan();
> >   scan.setBatch(100);
> >   scan.setCaching(10000);
> >   scan.setCacheBlocks(false);
> >   scan.setMaxVersions(1);
> >
> > The row filter for the scan is a FuzzyRowFilter that keeps only the
> > events of a given time_of_hour.
> >
> > Everything looks fine, but the result is not what I expect.
> > When the same task runs 10 times, the Input Records counters show 6
> > different numbers, and the final output shows 6 different results.
> >
> > Has anyone ever faced this problem before?
> > What could cause this inconsistency in the HBase scan results?
> >
> > Thanks
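
For anyone trying to reproduce this, here is a minimal sketch of how the fuzzy key and mask for the row key layout above (1-byte salt, 4-byte time_of_hour, 36-byte uuid) could be built. It uses plain Java only, with no HBase dependency. The class and method names (FuzzyKeySketch, rowKey, fuzzyKey, fuzzyMask, matches) are hypothetical, the big-endian int encoding of time_of_hour is an assumption (the thread only gives the field width), and matches() is a simplified stand-in for the comparison a fuzzy filter performs, not HBase's implementation. The mask follows the convention documented for FuzzyRowFilter: 0 marks a fixed position, 1 marks a position that may hold any byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: models the row key layout from the thread.
final class FuzzyKeySketch {
    static final int SALT_LEN = 1;
    static final int HOUR_LEN = 4;
    static final int UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + HOUR_LEN + UUID_LEN;

    // Full row key: salt + time_of_hour (assumed big-endian int) + ASCII uuid.
    static byte[] rowKey(byte salt, int timeOfHour, String uuid) {
        byte[] uuidBytes = uuid.getBytes(StandardCharsets.US_ASCII);
        if (uuidBytes.length != UUID_LEN) {
            throw new IllegalArgumentException("uuid must be " + UUID_LEN + " bytes");
        }
        return ByteBuffer.allocate(KEY_LEN).put(salt).putInt(timeOfHour).put(uuidBytes).array();
    }

    // Fuzzy key template: only the time_of_hour bytes carry meaning; the rest stay zero.
    static byte[] fuzzyKey(int timeOfHour) {
        return ByteBuffer.allocate(KEY_LEN).put((byte) 0).putInt(timeOfHour).array();
    }

    // Mask: 0 = position must match the fuzzy key, 1 = any byte is accepted.
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + HOUR_LEN, (byte) 0);
        return mask;
    }

    // Simplified stand-in for the filter's check: compare only the fixed positions.
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String uuid = "123e4567-e89b-12d3-a456-426614174000";
        byte[] key = fuzzyKey(413437);
        byte[] mask = fuzzyMask();
        System.out.println(matches(rowKey((byte) 7, 413437, uuid), key, mask));
        System.out.println(matches(rowKey((byte) 7, 413438, uuid), key, mask));
    }
}
```

With the real client, the key and mask arrays would be passed to FuzzyRowFilter as a pair and the filter attached to the Scan with setFilter. Checking a handful of known row keys against such a pair outside the cluster is one way to rule out the filter pattern itself as the source of the varying counts.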