From: Mikhail Antonov
Date: Mon, 04 Sep 2017 12:26:36 +0000
Subject: Re: should we split the scan range into
 several segments when the scan range only located in a single region?
To: dev@hbase.apache.org

I filed https://issues.apache.org/jira/browse/HBASE-18090 some time ago
and attached a draft patch to it. It's not complete, as we need some deeper
changes in the way we open regions (see the comments on the issue), but the
basic stuff works. I ended up going the other route and didn't have the
bandwidth to finish it; it would be great if someone picked it up.

Mikhail

On Mon, Sep 4, 2017 at 11:13 AM Chia-Ping Tsai wrote:

> That sounds good. There are some related issues; see
> https://issues.apache.org/jira/browse/HBASE-4914 and
> https://issues.apache.org/jira/browse/HBASE-4063.
>
> On 2017-09-04 15:06, libis wrote:
> > Hi
> >
> > When TableInputFormat is used to source an HBase table in a MapReduce
> > job, its splitter creates one map task for each region of the table.
> > However, in some cases the user’s scan range may fall within a single
> > region, so that only one mapper runs. For example, if the rowkey of the
> > table is ‘md5(userid) + timestamp’ and a client wants to scan the last
> > month of data for a specific user with MR, it is very likely that only
> > one mapper will be working.
> >
> > In order to scan data in parallel when the user's scan range is located
> > in a single region, should we split the scan range into several segments
> > within a region?
> >
> > Best,
> >
> > xinxin
> >
>
-- 
Thanks,
Michael Antonov
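
For illustration, the idea raised in the quoted question (cutting one scan range into several sub-ranges so that each could feed its own mapper) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the HBASE-18090 patch and not TableInputFormat's actual splitter: the class name ScanRangeSplitter and the methods splitRange and toFixedLength are hypothetical, row keys are treated as unsigned big-endian integers, and the shorter key is right-padded with zero bytes.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits [start, stop) into n sub-ranges.
class ScanRangeSplitter {

    // Returns n + 1 boundary keys; segment i is [bounds[i], bounds[i + 1]).
    // If the range holds fewer than n distinct keys, some boundaries repeat
    // and the corresponding segments are empty.
    static List<byte[]> splitRange(byte[] start, byte[] stop, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Assumption: right-pad the shorter key with 0x00 so both have equal length.
        int len = Math.max(start.length, stop.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before stop");
        }
        BigInteger width = hi.subtract(lo);
        List<byte[]> bounds = new ArrayList<>();
        bounds.add(start);
        for (int i = 1; i < n; i++) {
            // Evenly spaced interior boundary: lo + width * i / n
            BigInteger offset = width.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(n));
            bounds.add(toFixedLength(lo.add(offset), len));
        }
        bounds.add(stop);
        return bounds;
    }

    // Converts a non-negative value below 256^len into exactly len bytes,
    // dropping BigInteger's optional leading sign byte and left-padding with zeros.
    static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Example: split the one-byte range [0x00, 0x40) into 4 segments.
        List<byte[]> bounds = splitRange(new byte[] {0x00}, new byte[] {0x40}, 4);
        for (byte[] b : bounds) {
            System.out.println(new BigInteger(1, b).toString(16));
        }
    }
}
```

Even splitting by key value only balances the work if rows are spread fairly uniformly over the range; with a ‘md5(userid) + timestamp’ key and a single-user scan, the varying part is the timestamp suffix, so the boundaries would effectively be time slices.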