Subject: Re: Hbase inserts very slow
From: Vishal Kapoor <vishal.kapoor.in@gmail.com>
To: user@hbase.apache.org
Date: Wed, 16 Feb 2011 18:29:46 -0500

J-D,

I should also mention that my data distribution across the three families
is 1:1:1. I have three families so that I can have the same qualifiers in
each of them, and the data in those families is LIVE:MasterA:MasterB.

Vishal

On Wed, Feb 16, 2011 at 6:22 PM, Jean-Daniel Cryans wrote:

> Very often there's no need for more than 1 family, I would suggest you
> explore that possibility first.
>
> J-D
>
> On Wed, Feb 16, 2011 at 3:13 PM, Vishal Kapoor wrote:
> > does that mean I am only left with the choice of writing to the three
> > families in three different map jobs?
> > or can I do it any other way?
> > thanks,
> > Vishal
> >
> > On Wed, Feb 16, 2011 at 12:56 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >>
> >> First, loading into 3 families is currently a bad idea and is bound to
> >> be inefficient, here's the reason why:
> >> https://issues.apache.org/jira/browse/HBASE-3149
> >>
> >> Those log lines mean that your scanning of the first table is
> >> generating a lot of block cache churn. When setting up the Map, set
> >> your scanner to setCacheBlocks(false) before passing it to
> >> TableMapReduceUtil.initTableMapperJob.
> >>
> >> Finally, you may want to give more memory to the region server.
> >>
> >> J-D
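
For reference, J-D's scanner advice comes down to a job setup along these
lines. This is a minimal sketch against the HBase 0.90 mapreduce API, not
code from the thread; the job name is made up and ExplodeMapper is a
placeholder (sketched after the next quoted message):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class ExplodeJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "explode LIVE_RAW_TABLE into LIVE_TABLE");

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch rows in batches instead of one RPC per row
        scan.setCacheBlocks(false);  // keep this one-off full scan out of the block cache

        TableMapReduceUtil.initTableMapperJob(
            "LIVE_RAW_TABLE",             // source table being scanned
            scan,                         // the scan configured above
            ExplodeMapper.class,          // placeholder mapper, sketched further down
            ImmutableBytesWritable.class, // map output key: the new composite row key
            Put.class,                    // map output value: the assembled Put
            job);

        // null reducer = identity; the Puts pass straight through to LIVE_TABLE
        TableMapReduceUtil.initTableReducerJob("LIVE_TABLE", null, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }
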
> >> On Wed, Feb 16, 2011 at 7:35 AM, Vishal Kapoor wrote:
> >> > Lars,
> >> >
> >> > I am still working on pseudo distributed:
> >> > hadoop-0.20.2+737, and hbase-0.90.0 with the hadoop jar from the
> >> > hadoop install.
> >> >
> >> > I have a LIVE_RAW_TABLE table, which gets values from a live system.
> >> > I go through each row of that table and get the row ids of two
> >> > reference tables, TABLE_A and TABLE_B, from it; then I explode this
> >> > into a new table, LIVE_TABLE. I use
> >> > TableMapReduceUtil.initTableReducerJob("LIVE_TABLE", null, job);
> >> >
> >> > LIVE_TABLE has three families, LIVE, A and B, and the row id is a
> >> > composite key: reverseTimeStamp/rowidA/rowIdB.
> >> > After that I run a bunch of map reduce jobs to consolidate the data.
> >> > To start with, I have around 15000 rows in LIVE_RAW_TABLE.
> >> >
> >> > When I start my job, I see it running quite well till I am almost
> >> > done with 5000 rows; then it starts printing messages in the logs
> >> > which I did not use to see before. The job used to run for around
> >> > 900 sec (I have a lot of data parsing while exploding); 15000 rows
> >> > from LIVE_RAW_TABLE explode to around 500,000 rows in LIVE_TABLE.
> >> >
> >> > Since those debug messages appeared, the job runs for around 2500
> >> > sec. I have not changed anything, including the table design.
> >> >
> >> > Here is my table description:
> >> >
> >> > {NAME => 'LIVE_TABLE', FAMILIES => [
> >> >   {NAME => 'LIVE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> >> >    VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
> >> >    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
> >> >   {NAME => 'A', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> >> >    VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
> >> >    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
> >> >   {NAME => 'B', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> >> >    VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
> >> >    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >> >
> >> > thanks for all your help.
> >> >
> >> > Vishal
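
The explode step described above could look roughly like the mapper below.
This is a hypothetical sketch, not Vishal's code: the qualifier name, the
values, and the TABLE_A/TABLE_B lookups are stand-ins. What it illustrates
is the answer to the question earlier in the thread: a single Put can carry
cells for all three families, so one map job is enough.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ExplodeMapper extends TableMapper<ImmutableBytesWritable, Put> {
      @Override
      protected void map(ImmutableBytesWritable key, Result raw, Context ctx)
          throws IOException, InterruptedException {
        // Stand-ins: the real job parses the LIVE_RAW_TABLE row and looks
        // these ids up in TABLE_A and TABLE_B.
        String rowIdA = "a1";
        String rowIdB = "b1";
        long reverseTs = Long.MAX_VALUE - System.currentTimeMillis(); // newest rows sort first

        byte[] rowKey = Bytes.toBytes(reverseTs + "/" + rowIdA + "/" + rowIdB);
        Put put = new Put(rowKey);
        // One Put carries cells for all three families in a single map job.
        put.add(Bytes.toBytes("LIVE"), Bytes.toBytes("q"), raw.value());
        put.add(Bytes.toBytes("A"), Bytes.toBytes("q"), Bytes.toBytes(rowIdA));
        put.add(Bytes.toBytes("B"), Bytes.toBytes("q"), Bytes.toBytes(rowIdB));
        ctx.write(new ImmutableBytesWritable(rowKey), put);
      }
    }
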
> >> > On Wed, Feb 16, 2011 at 4:26 AM, Lars George wrote:
> >> >
> >> >> Hi Vishal,
> >> >>
> >> >> These are DEBUG level messages and are from the block cache, there is
> >> >> nothing wrong with that. Can you explain more what you do and see?
> >> >>
> >> >> Lars
> >> >>
> >> >> On Wed, Feb 16, 2011 at 4:24 AM, Vishal Kapoor wrote:
> >> >> > all was working fine and suddenly I see a lot of logs like below
> >> >> >
> >> >> > 2011-02-15 22:19:04,023 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction started; Attempting to free 19.88 MB of total=168.64 MB
> >> >> > 2011-02-15 22:19:04,025 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction completed; freed=19.91 MB, total=148.73 MB,
> >> >> >   single=74.47 MB, multi=92.37 MB, memory=166.09 KB
> >> >> > 2011-02-15 22:19:11,207 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction started; Attempting to free 19.88 MB of total=168.64 MB
> >> >> > 2011-02-15 22:19:11,444 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction completed; freed=19.93 MB, total=149.09 MB,
> >> >> >   single=73.91 MB, multi=93.32 MB, memory=166.09 KB
> >> >> > 2011-02-15 22:19:21,494 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction started; Attempting to free 19.87 MB of total=168.62 MB
> >> >> > 2011-02-15 22:19:21,760 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction completed; freed=19.91 MB, total=148.84 MB,
> >> >> >   single=74.22 MB, multi=92.73 MB, memory=166.09 KB
> >> >> > 2011-02-15 22:19:39,838 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction started; Attempting to free 19.87 MB of total=168.62 MB
> >> >> > 2011-02-15 22:19:39,852 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction completed; freed=19.91 MB, total=148.71 MB,
> >> >> >   single=75.35 MB, multi=91.48 MB, memory=166.09 KB
> >> >> > 2011-02-15 22:19:49,768 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction started; Attempting to free 19.87 MB of total=168.62 MB
> >> >> > 2011-02-15 22:19:49,770 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> >> >> >   Block cache LRU eviction completed; freed=19.91 MB, total=148.71 MB,
> >> >> >   single=76.48 MB, multi=90.35 MB, memory=166.09 KB
> >> >> >
> >> >> > I haven't changed anything, including the table definitions.
> >> >> > please let me know where to look...
> >> >> >
> >> >> > thanks,
> >> >> > Vishal Kapoor
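
For what it's worth, the sizes in those eviction lines look consistent with
the 0.90 defaults on a pseudo-distributed install: a stock 1000 MB heap with
the default hfile.block.cache.size of 0.2 gives a roughly 200 MB block
cache, and the LRU cache starts evicting at roughly 85% of capacity, which
is right around the total=168 MB the log shows. J-D's "give more memory to
the region server" suggestion would look something like the following; the
4000 MB figure is only an example to size to the machine, not a
recommendation:

    # conf/hbase-env.sh: region server JVM heap in MB (the default is 1000)
    export HBASE_HEAPSIZE=4000

    <!-- conf/hbase-site.xml: fraction of the heap given to the block cache.
         0.2 is the 0.90 default; shown here only to make the knob explicit. -->
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.2</value>
    </property>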