Date: Fri, 23 Nov 2012 19:22:47 +0200
Subject: Re: scan is slower after bulk load
From: Amit Sela
To: user@hbase.apache.org

I gave it a few more shots and it was back to normal...

Bulk loading is faster but, more important (for us), it's more stable and
doesn't cause a full GC in the region server even when loading more than usual.

The map time remains the same. For the reduce we chose to write out a sequence
file, so it's quite fast, and the bulk load map is extremely fast. The bulk
load reduce is also fast, but it depends on the number of regions in the table.
We used our own code so that only specific regions are targeted (I think I
posted it).

Bottom line - about 30% faster. But I expect it to handle bigger loads better.

On Nov 22, 2012 11:51 PM, "Asaf Mesika" wrote:
> Did you end up finding the answer?
> How fast is this method of insertion relative to a simple insert of
> List ?
>
>
> On 13 Nov 2012, at 02:29, Bijieshan wrote:
>
> > I think one possible reason is block caching. Have you turned the block
> > caching off during scanning?
> >
> > Regards,
> > Jieshan
> > ________________________________________
> > From: Mohammad Tariq [dontariq@gmail.com]
> > Sent: Tuesday, November 13, 2012 1:04
> > To: user@hbase.apache.org
> > Subject: Re: scan is slower after bulk load
> >
> > Maybe because bulk load writes to the same region, thus putting the
> > entire load on a single region server.
> >
> > Regards,
> > Mohammad Tariq
> >
> > On Mon, Nov 12, 2012 at 9:15 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> >> Just a guess... have you done any compactions on the table post bulk load?
> >>
> >> On Nov 12, 2012, at 8:44 AM, Marcos Ortiz wrote:
> >>
> >>> Regards, Amit.
> >>> Did you tune the RegionServer where you have that data range hosted?
> >>> Why do you say that scans are slower after a bulk load?
> >>> Did you test it before the bulk load?
> >>>
> >>> HBase version?
> >>>
> >>> On 11/12/2012 09:39 AM, Amit Sela wrote:
> >>>> Hi all,
> >>>>
> >>>> Does anyone have any idea why scanning over a specific range in a table is
> >>>> about 20% slower if that data (that specific range) was just inserted into
> >>>> HBase using bulk load?
> >>>>
> >>>> I do the bulk load programmatically with LoadIncrementalHFiles.
> >>>>
> >>>> Thanks.
> >>>>
> >>>
> >>> --
> >>>
> >>> Marcos Luis Ortíz Valmaseda
> >>> about.me/marcosortiz
> >>> @marcosluis2186