hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: distributed log splitting aborted
Date Mon, 09 Jul 2012 20:55:35 GMT
We've been running with distributed splitting here for >6 months and
never had this issue. Also, the exceptions you are seeing come from
HDFS and not HBase. The fact that it worked from the master and not
the region servers seems to point to a network configuration issue,
because the actual splitting code is really the same.

J-D

On Sun, Jul 8, 2012 at 2:25 PM, Cyril Scetbon <cyril.scetbon@free.fr> wrote:
> I've finally succeeded in starting my cluster by disabling hbase.master.distributed.log.splitting
>
> It took less than 10 minutes to start, compared to a whole night without any success with distributed log splitting enabled. Don't you think, like me, that it's just buggy?
>
> thanks
>
> Cyril SCETBON
>
> On Jul 6, 2012, at 8:40 PM, Cyril Scetbon wrote:
>
>> As you can see in the master log, region servers are in charge of splitting log files (which are not found, I suppose), and the split is retried several times (I didn't check whether it's always redone) on different region servers. You can, for example, follow a failing split concerning a file not found in the Hadoop filesystem:
>>
>> http://pastebin.com/RbcLdbcs
>>
>> Regards
>>
>> Cyril SCETBON
>>
>> On Jul 6, 2012, at 8:17 PM, Cyril Scetbon wrote:
>>
>>> Here are the log files you asked for :
>>>
>>> http://pastebin.com/xRBuQdNS  <---- hbase-master.log
>>>
>>> http://pastebin.com/u6WYQT6R <---- hdfs-namenode.log
>>>
>>> If you find a fix for this damn issue, I'll be delighted!
>>>
>>> Thanks
>>>
>>> Cyril SCETBON
>>>
>>> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
>>>
>>>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>>>> and see if it goes to the end of it.
>>>>
>>>> It could also be useful to see a bigger portion of the master log, for
>>>> all I know maybe it handles it somehow and there's a problem
>>>> elsewhere.
>>>>
>>>> Finally, which Hadoop version are you using?
>>>>
>>>> Thx,
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon <cyril.scetbon@free.fr> wrote:
>>>>> yes :
>>>>>
>>>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>>>>
>>>>> I did a fsck and here is the report :
>>>>>
>>>>> Status: HEALTHY
>>>>> Total size:    618827621255 B (Total open files size: 868 B)
>>>>> Total dirs:    4801
>>>>> Total files:   2825 (Files currently being written: 42)
>>>>> Total blocks (validated):      11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41)
>>>>> Minimally replicated blocks:   11479 (100.0 %)
>>>>> Over-replicated blocks:        1 (0.008711561 %)
>>>>> Under-replicated blocks:       0 (0.0 %)
>>>>> Mis-replicated blocks:         0 (0.0 %)
>>>>> Default replication factor:    4
>>>>> Average block replication:     4.0000873
>>>>> Corrupt blocks:                0
>>>>> Missing replicas:              0 (0.0 %)
>>>>> Number of data-nodes:          12
>>>>> Number of racks:               1
>>>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>>>>
>>>>>
>>>>> The filesystem under path '/hbase' is HEALTHY
>>>>>
>>>>> Cyril SCETBON
>>>>>
>>>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>>>>
>>>>>> Does this file really exist in HDFS?
>>>>>>
>>>>>> hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711
>>>>>>
>>>>>> If so, did you run fsck in HDFS?
>>>>>>
>>>>>> It would be weird if HDFS doesn't report anything bad but somehow the
>>>>>> clients (like HBase) can't read it.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon <cyril.scetbon@free.fr> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I can no longer start my cluster correctly and get messages like http://pastebin.com/T56wrJxE (taken on one region server)
>>>>>>>
>>>>>>> I suppose HBase is not designed to be stopped entirely, only to tolerate some nodes going down? HDFS is not complaining; it's only HBase that can't start correctly :(
>>>>>>>
>>>>>>> I suppose some data has not been flushed, and it's not really important for me. Is there a way to fix these errors even if I lose data?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> Cyril SCETBON
>>>>>>>
>>>>>
>>>
>>
>
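
For reference, the workaround Cyril describes in this thread (disabling hbase.master.distributed.log.splitting) would typically be expressed as a property in the master's hbase-site.xml. The fragment below is only a minimal sketch: the file location and the need to restart the master afterwards are assumptions about a standard install, not details given in the thread. As J-D's reply suggests, the underlying cause is more likely a network configuration issue, so this setting works around the problem rather than fixing it.

```xml
<!-- Sketch of an hbase-site.xml entry on the master; file path depends on the install -->
<property>
  <name>hbase.master.distributed.log.splitting</name>
  <!-- false: the master splits the logs itself instead of handing the work to region servers -->
  <value>false</value>
</property>
```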
