Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of lars.george@gmail.com
 designates 209.85.210.169 as permitted sender)
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Apple Message framework v1251.1)
Subject: Re: Constant error when putting large data into HBase
From: Lars George <lars.george@gmail.com>
In-Reply-To: 
 <CAMwT8DKSM89okTDLYBe6_4cqPtuNW95h1=er1CmDr2+Z=Xwwfg@mail.gmail.com>
Date: Thu, 1 Dec 2011 17:24:51 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <98CD2427-F4AC-46F2-8C90-B8E7E0B507EF@gmail.com>
References: 
 <CAMwT8DLyk0u1BeraMF1VQhCX+1JRh0QjAh2u5j1ij=1_fSfbpg@mail.gmail.com>
 <5B0FEF87-14D5-42F4-AFBC-D6213A99A299@gmail.com>
 <CAMwT8DKYXJKTrOD4obbVkLEAJHB8DLwsmu7-6P3VJ3OtGWKiDQ@mail.gmail.com>
 <79DF7533-927B-4D30-9AB3-3FE3A1B8CB25@gmail.com>
 <CAMwT8DKSM89okTDLYBe6_4cqPtuNW95h1=er1CmDr2+Z=Xwwfg@mail.gmail.com>
To: user@hbase.apache.org

HI Ed,

There is not much you can do in the HBase side, too much is simply too =
much. I have in the past lowered the number of slots per MR node, so =
that fewer threads are hitting HBase. Sorry that I misread the already =
hashed keys, yeah, then all you can try is the bulk loading, as it will =
give you much better performance in a bulk loading scenario. If you have =
to trickle data in, then this will not help. But if you have a job that =
needs to complete and part of that job is to insert something into =
HBase, you could as well output to HFiles and then bulk load them in =
(which is very fast).

Lars

On Dec 1, 2011, at 2:58 PM, edward choi wrote:

> Thanks Lars,
> I am already familiar with the sequential key problem.
> That is why I am using hash generated random string as the document =
id.
> But I guess I was still pushing the cluster too much.
>=20
> Maybe I am inserting tweet documents too fast?
> Since a single tweet is only 140 bytes, puts are performed really =
fast.
> So I am guessing maybe random keys alone are not cutting it..?
>=20
> I am counting 20,000 requests per region when I perform mapreduce =
loading.
> Is that too much to handle?
>=20
> Is there a way to deliberately slow down input process?
> I am reading from 21 node HDFS cluster and writing to 21 node HBase
> cluster, so the process speed and the sheer volume of data transaction =
is
> enormous.
> Can I set a limit to the request per region? Say, like 5000 request =
maximum?
> I really want to know just how far I can push Hbase.
> But I guess developers would say everything depends on the use case.
>=20
> I thought about using bulk loading feature but I kinda got lazy and =
just
> went with the random string rowid.
> If parameter meddling doesn't pan out, I'll have no choice but to try
> bulk-loading feature.
>=20
> Thanks for the reply.
>=20
> Regards,
> Ed
>=20
>=20
> 2011/12/1 Lars George <lars.george@gmail.com>
>=20
>> Hi Ed,
>>=20
>> Without having looked at the logs, this sounds like the common case =
of
>> overloading a single region due to your sequential row keys. Either =
hash
>> the keys, or salt them - but the best bet here is to use the bulk =
loading
>> feature of HBase (http://hbase.apache.org/bulk-loads.html). That =
bypasses
>> this problem and lets you continue to use sequential keys.
>>=20
>> Lars
>>=20
>>=20
>> On Dec 1, 2011, at 12:21 PM, edward choi wrote:
>>=20
>>> Hi Lars,
>>>=20
>>> Okay here goes some details.
>>> There are 21 tasktrackers/datanodes/regionservers
>>> There is one Jobtracker/namenode/master
>>> Three zookeepers.
>>>=20
>>> There are about 200 million tweets in Hbase.
>>> My mapreduce code is to aggregate tweets by their generated date.
>>> So in the map stage, I write out tweet date as the key, and document =
id
>> as
>>> the value (document id is randomly generated by hash algorithm)
>>> In the reduce stage, I put the data into a table. The key(which is =
the
>>> tweet date) is the table rowid, and values(which are document id's) =
as
>> the
>>> column values.
>>>=20
>>> Now, map stage is fine. I get to 100% map. But during reduce stage, =
one
>> of
>>> my regionserver fails.
>>> I don't know what the exact symptom is. I just get:
>>>> =
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>> Failed
>>>> 1 action: servers with issues: lp171.etri.re.kr:60020,
>>>=20
>>> About "some node always die" <=3D=3D scratch this.
>>>=20
>>> To be precise,
>>> I narrowed down the range of data that I wanted to process.
>>> I tried to put tweets that was generated only at 2011/11/22.
>>> Now the reduce code will produce a row with "20111122" as the rowid, =
and
>> a
>>> bunch of document id's as the column value. (I use 32byte string as =
the
>>> document id. I append 1000 document id for a single column)
>>> So the region that my data will be inserted will have "20111122" =
between
>>> the Start Key and End Key.
>>> The regionserver that contains that specific region fails. That is =
the
>>> point. If I move that region to another regionserver using hbase =
shell,
>>> then that regionserver fails.
>>> With the same log output.
>>> After 4 failures, the job is force-cancelled and the put operation =
was
>> not
>>> done.
>>>=20
>>> Now, even with the failure, the regionserver is still online. It is =
not
>>> dead(sorry for my use of word 'die').
>>>=20
>>> I have pasted Jobtracker log, tasktracker(one that failed) log,
>>> regionserver(one that failed) log using PasteBin.
>>> The job started at 2011-12-01 17:14:43 and was killed at 2011-12-01
>>> 17:20:07.
>>>=20
>>> JobTracker Log
>>> <script src=3D"http://pastebin.com/embed_js.php?i=3Dn6sp8Fyi"></script=
>
>>>=20
>>> TaskTracker Log
>>> <script src=3D"http://pastebin.com/embed_js.php?i=3DRMFc41D5"></script=
>
>>>=20
>>> RegionServer Log
>>> <script src=3D"http://pastebin.com/embed_js.php?i=3DUpKF8HwN"></script=
>
>>>=20
>>> And finally, according to the logs I pasted, I see other lines with =
DEBUG
>>> or INFO. So I thought this was okay.
>>> Is there a way to change WARN level log to some other level log? If =
you'd
>>> let me know, I will paste another set of logs.
>>>=20
>>> Thanks,
>>> Ed
>>>=20
>>> 2011/12/1 Lars George <lars.george@gmail.com>
>>>=20
>>>> Hi Ed,
>>>>=20
>>>> You need to be more precise I am afraid. First of all what does =
"some
>> node
>>>> always dies" mean? Is the process gone? Which process is gone?
>>>> And the "error" you pasted is a WARN level log that *might* =
indicate
>> some
>>>> trouble, but is *not* the reason the "node has died". Please =
elaborate.
>>>>=20
>>>> Also consider posting the last few hundred lines of the process =
logs to
>>>> pastebin so that someone can look at it.
>>>>=20
>>>> Thanks,
>>>> Lars
>>>>=20
>>>>=20
>>>> On Dec 1, 2011, at 9:48 AM, edward choi wrote:
>>>>=20
>>>>> Hi,
>>>>> I've had a problem that has been killing for some days now.
>>>>> I am using CDH3 update2 version of Hadoop and Hbase.
>>>>> When I do a large amount of bulk loading into Hbase, some node =
always
>>>> die.
>>>>> It's not just one particular node.
>>>>> But one of many nodes fail to serve eventually.
>>>>>=20
>>>>> I set 4 gigs of heap space for master, and regionservers. I =
monitored
>> the
>>>>> process and when any node fails, it has not used all the heaps =
yet.
>>>>> So it is not a heap space problem.
>>>>>=20
>>>>> Below is what I get when I perform bulk put using MapReduce.
>>>>>=20
>>>>>=20
>>>>=20
>> =
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
------------------------------------------------------------------
>>>>> 11/12/01 17:17:20 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 11/12/01 17:18:31 INFO mapred.JobClient: Task Id :
>>>>> attempt_201111302113_0034_r_000013_0, Status : FAILED
>>>>> =
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>>> Failed
>>>>> 1 action: servers with issues: lp171.etri.re.kr:60020,
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementatio=
n.processBatch(HConnectionManager.java:1239)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementatio=
n.processBatchOfPuts(HConnectionManager.java:1253)
>>>>>      at
>>>>> =
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:828)
>>>>>      at =
org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:684)
>>>>>      at org.apache.hadoop.hbase.client.HTable.put(HTable.java:669)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.writ=
e(TableOutputFormat.java:127)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.writ=
e(TableOutputFormat.java:82)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTa=
sk.java:514)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputCo=
ntext.java:80)
>>>>>      at etri.qa.mapreduce.PostProcess$PostPro
>>>>> attempt_201111302113_0034_r_000013_0: 20111122
>>>>> 11/12/01 17:18:36 INFO mapred.JobClient: Task Id :
>>>>> attempt_201111302113_0034_r_000013_1, Status : FAILED
>>>>> =
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>>> Failed
>>>>> 1 action: servers with issues: lp171.etri.re.kr:60020,
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementatio=
n.processBatch(HConnectionManager.java:1239)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementatio=
n.processBatchOfPuts(HConnectionManager.java:1253)
>>>>>      at
>>>>> =
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:828)
>>>>>      at =
org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:684)
>>>>>      at org.apache.hadoop.hbase.client.HTable.put(HTable.java:669)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.writ=
e(TableOutputFormat.java:127)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.writ=
e(TableOutputFormat.java:82)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTa=
sk.java:514)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputCo=
ntext.java:80)
>>>>>      at etri.qa.mapreduce.PostProcess$PostPro
>>>>> attempt_201111302113_0034_r_000013_1: 20111122
>>>>> 11/12/01 17:18:37 INFO mapred.JobClient:  map 100% reduce 95%
>>>>> 11/12/01 17:18:44 INFO mapred.JobClient:  map 100% reduce 96%
>>>>> 11/12/01 17:18:47 INFO mapred.JobClient:  map 100% reduce 98%
>>>>> 11/12/01 17:18:50 INFO mapred.JobClient:  map 100% reduce 99%
>>>>> 11/12/01 17:18:53 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 11/12/01 17:20:07 INFO mapred.JobClient: Task Id :
>>>>> attempt_201111302113_0034_r_000013_3, Status : FAILED
>>>>> =
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>>> Failed
>>>>> 1 action: servers with issues: lp171.etri.re.kr:60020,
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementatio=
n.processBatch(HConnectionManager.java:1239)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementatio=
n.processBatchOfPuts(HConnectionManager.java:1253)
>>>>>      at
>>>>> =
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:828)
>>>>>      at =
org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:684)
>>>>>      at org.apache.hadoop.hbase.client.HTable.put(HTable.java:669)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.writ=
e(TableOutputFormat.java:127)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.writ=
e(TableOutputFormat.java:82)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTa=
sk.java:514)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputCo=
ntext.java:80)
>>>>>      at etri.qa.mapreduce.PostProcess$PostPro
>>>>> attempt_201111302113_0034_r_000013_3: 20111122
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:  map 100% reduce 95%
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient: Job complete:
>>>> job_201111302113_0034
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient: Counters: 23
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:   Job Counters
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Launched reduce =
tasks=3D34
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     =
SLOTS_MILLIS_MAPS=3D4701750
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Total time spent by =
all
>>>>> reduces waiting after reserving slots (ms)=3D0
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Total time spent by =
all
>> maps
>>>>> waiting after reserving slots (ms)=3D0
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Rack-local map =
tasks=3D54
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Launched map =
tasks=3D310
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Data-local map =
tasks=3D256
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Failed reduce tasks=3D1=

>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:
>> SLOTS_MILLIS_REDUCES=3D3043956
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:   FileSystemCounters
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     FILE_BYTES_READ=3D1120
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     HDFS_BYTES_READ=3D36392=

>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:
>> FILE_BYTES_WRITTEN=3D69076396
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Reduce input groups=3D0=

>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Combine output =
records=3D0
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Map input =
records=3D2187386
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Reduce shuffle
>> bytes=3D289536
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Reduce output =
records=3D0
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Spilled =
Records=3D2187386
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Map output =
bytes=3D91870212
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Combine input =
records=3D0
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Map output =
records=3D2187386
>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3D36392=

>>>>> 11/12/01 17:20:09 INFO mapred.JobClient:     Reduce input =
records=3D0
>>>>>=20
>>>>=20
>> =
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
----------------------------
>>>>>=20
>>>>> So I looked up the log of the node that has failed me.
>>>>> Below is the cause of failure.
>>>>>=20
>>>>=20
>> =
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------
>>>>> 2011-12-01 17:07:37,968 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> completed; freed=3D79.73 MB, total=3D597.89 MB, single=3D372.17 =
MB,
>>>> multi=3D298.71
>>>>> MB, memory=3D0 KB
>>>>> 2011-12-01 17:07:39,721 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> started; Attempting to free 79.7 MB of total=3D677.2 MB
>>>>> 2011-12-01 17:07:39,737 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> completed; freed=3D79.71 MB, total=3D598.06 MB, single=3D372.2 MB,
>> multi=3D298.71
>>>>> MB, memory=3D0 KB
>>>>> 2011-12-01 17:07:41,414 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> started; Attempting to free 79.71 MB of total=3D677.21 MB
>>>>> 2011-12-01 17:07:41,418 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> completed; freed=3D79.72 MB, total=3D597.82 MB, single=3D372.16 =
MB,
>>>> multi=3D298.71
>>>>> MB, memory=3D0 KB
>>>>> 2011-12-01 17:07:41,849 WARN org.apache.hadoop.ipc.HBaseServer: =
IPC
>>>> Server
>>>>> listener on 60020: readAndProcess threw exception =
java.io.IOException:
>>>>> Connection reset by peer. Count of bytes read: 0
>>>>> java.io.IOException: Connection reset by peer
>>>>>      at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>      at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>>>>      at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>>>>>      at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>>>>>      at =
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)=

>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseSer=
ver.java:900)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:5=
22)
>>>>>      at
>>>>>=20
>>>>=20
>> =
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.ja=
va:316)
>>>>>      at
>>>>>=20
>>>>=20
>> =
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.=
java:886)
>>>>>      at
>>>>>=20
>>>>=20
>> =
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java=
:908)
>>>>>      at java.lang.Thread.run(Thread.java:662)
>>>>> 2011-12-01 17:10:41,509 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>>>>> 8383367466793958870 lease expired
>>>>> 2011-12-01 17:11:06,816 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: =
total=3D601.74
>>>> MB,
>>>>> free=3D194.94 MB, max=3D796.67 MB, blocks=3D9281, =
accesses=3D3415277,
>>>> hits=3D106627,
>>>>> hitRatio=3D3.12%%, cachingAccesses=3D2734012, cachingHits=3D88768,
>>>>> cachingHitsRatio=3D3.24%%, evictions=3D2227, evicted=3D2635963,
>>>>> evictedPerRun=3D1183.6385498046875
>>>>> 2011-12-01 17:13:33,457 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> started; Attempting to free 79.72 MB of total=3D677.22 MB
>>>>> 2011-12-01 17:13:33,461 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> completed; freed=3D79.75 MB, total=3D598.11 MB, single=3D372.35 =
MB,
>>>> multi=3D298.71
>>>>> MB, memory=3D0 KB
>>>>> 2011-12-01 17:13:34,169 DEBUG
>>>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
>> eviction
>>>>> started; Attempting to free 79.71 MB of total=3D677.22 MB
>>>>>=20
>>>>=20
>> =
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------------------------=
--------------------------------------------------------
>>>>>=20
>>>>> I searched google and the most prominent answer I got was the =
small
>>>> number
>>>>> of Zookeeper connections allowed. But since I am using Hbase =
0.90.4, I
>>>>> thought that was not my case.
>>>>> Could anyone please enlighten me with this problem?
>>>>>=20
>>>>> Regards,
>>>>> Ed
>>>>=20
>>>>=20
>>=20
>>=20