hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: Regions loading too fast
Date Fri, 24 Sep 2010 23:59:14 GMT

http://pastebin.com/bD3JJ0sD

The logs were 17MB in size max, and variable sizes like that.

-Jack

On Fri, Sep 24, 2010 at 4:56 PM, Stack <stack@duboce.net> wrote:
> Please paste the section from regionserver where you were getting the
> EOF to pastebin.  I'd like to see exactly where (but yeah, you get the
> idea moving the files aside).  Check the files too.  Are they
> zero-length?  If so, please look for them in the master log and paste
> me the section where we are splitting.
>
> Thanks Jack,
> St.Ack
>
>
> On Fri, Sep 24, 2010 at 4:52 PM, Jack Levin <magnito@gmail.com> wrote:
>> It was EOF exception, but now that I deleted edits files:
>>
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1062260343/recovered.edits/0000000000617305532
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/1321772129/recovered.edits/0000000000617328530
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img96/257974055/recovered.edits/0000000000617238642
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img97/117679080/recovered.edits/0000000000617306059
>> Moved to trash:
>> hdfs://namenode-rd.imageshack.us:9000/hbase/img97/221569766/recovered.edits/0000000000617242019
>>
>> Like these.  All of the regions have loaded... What could that have
>> been?   I assume I lost some writes, but this is not a big deal to
>> me... question is how to avoid something like that, is that a bug?
>>
>> -Jack
>>
>>
>> On Fri, Sep 24, 2010 at 4:44 PM, Stack <stack@duboce.net> wrote:
>>> What is the complaint in regionserver log when region load fails?
>>> St.Ack
>>>
>>> On Fri, Sep 24, 2010 at 4:40 PM, Jack Levin <magnito@gmail.com> wrote:
>>>> so, datanode log shows no errors whatsoever, however I do see same
>>>> blocks fetched repeatedly, and the network speed is quite high, but I
>>>> am unable to load _some_ regions, what could it be?
>>>>
>>>> 2010-09-24 16:38:42,729 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53038, bytes: 914, op: HDFS_READ,
>>>> cliID: DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 13803520, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_5556468858269577961_1550101, duration: 127413
>>>> 2010-09-24 16:38:44,317 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53048, bytes: 110, op: HDFS_READ,
>>>> cliID: DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32723968, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 1140653
>>>> 2010-09-24 16:38:44,318 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53049, bytes: 38294, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32686080, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 691929
>>>> 2010-09-24 16:38:44,510 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53054, bytes: 18021300, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 0, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-3781179144642915580_1571141, duration: 173548261
>>>> 2010-09-24 16:38:44,525 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53055, bytes: 506, op: HDFS_READ,
>>>> cliID: DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 48700928, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-176750251227749356_1535293, duration: 77045
>>>> 2010-09-24 16:38:44,526 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53056, bytes: 6182, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 48695296, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_-176750251227749356_1535293, duration: 128270
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 24, 2010 at 4:32 PM, Stack <stack@duboce.net> wrote:
>>>>> (Good one Ryan)
>>>>>
>>>>> Master is doing the assigning.  It needs to be restarted to see the
>>>>> config change.
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Fri, Sep 24, 2010 at 4:28 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>> Only regionserver, do I need to restart both?
>>>>>>
>>>>>> -jack
>>>>>>
>>>>>> On Fri, Sep 24, 2010 at 4:22 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>>> Did you restart the master and the regionserver? Or just one
>>>>>>> or the other?
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Fri, Sep 24, 2010 at 4:21 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>> Also, even with '1' value, I see:
>>>>>>>>
>>>>>>>> 2010-09-24 16:20:29,983 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img834,1000351n.jpg,1285251664421.d09510a16c0cfd0d8a251a36229125e0.
>>>>>>>> 2010-09-24 16:20:29,984 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img651,pict1408.jpg,1285018965749.110871465
>>>>>>>> 2010-09-24 16:20:29,984 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img806,sam0084a.jpg,1285324613056.82a1e8ba8d2a37a591a847fb36803c45.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img535,screenshot1bt.png,1285323376435.fae5f3ab474196c99f10b8a936fb9ead.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img838,123468.jpg,1285223690165.a2903008621d1a6b6ca02441bf3b68ea.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img839,yug.jpg,1285230318537.c09323dbaf54130671df2a14d671fe25.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img821,vlcsnap78737.png,1285283076812.ea4973ce6e43d7f974613c5989647278.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img805,njt30scbkdmb.gif,1285322429401.f9aacdafd8064bfbcc8cd4f6930febbe.
>>>>>>>> 2010-09-24 16:20:29,985 INFO
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>>>>>>> img94,img1711m.jpg,1285016850260.1424182007
>>>>>>>> 2010-09-24 16:20:29,986 DEBUG
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion: Creating region
>>>>>>>> img840,kitbarca2.png,1285189312696.1ce170ec09384fca51297a5fe7aeb4af.
>>>>>>>>
>>>>>>>> Which is pretty close to concurrent.
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>> On Fri, Sep 24, 2010 at 4:16 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>> Still having a problem:
>>>>>>>>>
>>>>>>>>> 2010-09-24 16:15:02,572 ERROR
>>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
>>>>>>>>> img695,p1908101232.jpg,1285288492084.d451f05024b42f71a115951c62cdcccf.
>>>>>>>>> java.io.EOFException
>>>>>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>>>>>        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>>>>>>>>>        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>>>>>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1937)
>>>>>>>>>        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I changed the value to '1', and restarted the regionserver... Note
>>>>>>>>> that my hdfs is not having a problem.
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>> On Fri, Sep 24, 2010 at 4:01 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>> Try
>>>>>>>>>>
>>>>>>>>>>  <property>
>>>>>>>>>>    <name>hbase.regions.percheckin</name>
>>>>>>>>>>    <value>10</value>
>>>>>>>>>>    <description>Maximum number of regions that can be assigned in a single go
>>>>>>>>>>    to a region server.
>>>>>>>>>>    </description>
>>>>>>>>>>  </property>
>>>>>>>>>>
>>>>>>>>>> What do you have now?  Whatever it is, go down from there.
>>>>>>>>>>
>>>>>>>>>> St.Ack
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 24, 2010 at 3:07 PM, Jack Levin <magnito@gmail.com> wrote:
>>>>>>>>>>> My regions are 1 GB in size, and when I cold start the cluster I
>>>>>>>>>>> oversaturate my network links (1000 Mbps) and get client DFS
>>>>>>>>>>> timeouts. Any way to slow them down?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

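To check whether the same blocks really are being fetched repeatedly, the DataNode clienttrace lines quoted above can be tallied per block id. Below is a minimal Python sketch, assuming the log excerpt is pasted in as text with mail quote markers and line wraps still present. The helper names `unwrap` and `reads_per_block` are illustrative only and are not part of HBase or Hadoop; the sample contains two of the entries from the thread.

```python
import re
from collections import Counter

# Pulls the byte count, operation, and block id out of one clienttrace entry.
# The trailing "_<digits>" after the block id is the generation stamp.
TRACE_RE = re.compile(
    r"bytes:\s+(\d+),\s+op:\s+(\w+),.*?blockid:\s+(blk_-?\d+)_\d+"
)

def unwrap(text):
    """Drop mail quote markers ('>') and rejoin wrapped lines into one string."""
    return " ".join(line.lstrip("> ").strip() for line in text.splitlines())

def reads_per_block(text):
    """Count HDFS_READ operations and bytes served per block id."""
    counts, total_bytes = Counter(), Counter()
    for nbytes, op, block in TRACE_RE.findall(unwrap(text)):
        if op == "HDFS_READ":
            counts[block] += 1
            total_bytes[block] += int(nbytes)
    return counts, total_bytes

# Two entries copied from the thread, still quoted and wrapped.
sample = """\
>>>> 2010-09-24 16:38:44,317 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53048, bytes: 110, op: HDFS_READ,
>>>> cliID: DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32723968, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 1140653
>>>> 2010-09-24 16:38:44,318 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>>>> /10.101.6.2:50010, dest: /10.101.6.2:53049, bytes: 38294, op:
>>>> HDFS_READ, cliID:
>>>> DFSClient_hb_rs_rdaf2.prod.imageshack.com,60020,1285371202189_1285371202237,
>>>> offset: 32686080, srvID: DS-1363732508-10.101.6.2-50010-1284520709569,
>>>> blockid: blk_364673737339632029_1347910, duration: 691929
"""

if __name__ == "__main__":
    counts, total_bytes = reads_per_block(sample)
    # Blocks with the most reads come first.
    for block, n in counts.most_common():
        print(block, n, total_bytes[block])
```

Reads at different offsets within one block, as in the sample, are normal for HFile index and data lookups, so a high count on its own does not prove a retry loop; comparing offsets as well would be a natural next step.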