hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Borked Splitlog
Date Mon, 16 May 2011 21:07:02 GMT
Created HBASE-3889 for this.
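
A note on reading the quoted logs below: the split-log task znode names look like URL-encoded HLog paths, and decoding one a single time gives back the hdfs:// path that HLogSplitter prints. The following Python sketch illustrates this under that assumption. `task_to_log_path` is a hypothetical helper, not part of HBase, and it assumes plain percent-encoding (Python's `unquote` does not turn `+` into a space the way some decoders do).

```python
from urllib.parse import unquote


def task_to_log_path(task_znode: str) -> str:
    """Recover the log file path from a split-log task znode path.

    Hypothetical helper: assumes the last path component of the znode
    is the percent-encoded log path, as it appears to be in this thread.
    """
    # The encoded name contains no literal "/", so the last component
    # of the znode path is the whole encoded log path.
    name = task_znode.rsplit("/", 1)[-1]
    # Decode exactly once: "%252C" becomes "%2C", which is the literal
    # text used in the HLog file name itself.
    return unquote(name)


# Task znode copied from the SplitLogWorker "acquired task" log line.
TASK = (
    "/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F"
    "10.0.0.65%2C60020%2C1305406356765%2F"
    "10.0.0.65%252C60020%252C1305406356765.1305409968389"
)

if __name__ == "__main__":
    print(task_to_log_path(TASK))
```

Decoding only once matters here: the file name itself contains `%2C`, so a second decode would yield a path that does not match the one in the HLogSplitter lines.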

On Mon, May 16, 2011 at 8:42 PM, Stack <stack@duboce.net> wrote:
> On Mon, May 16, 2011 at 2:07 AM, Lars George <lars.george@gmail.com> wrote:
>> I am still stuck with this cluster not starting again. I know it is
>> all local and such, and therefore not really representative, but this
>> ought to work, no? See this log I get at startup:
>>
>
> Do you have replication on?  Is this TRUNK or 0.90 branch?  If TRUNK
> then we are doing distributed splitting?
>
> Sounds like a bug in here, Lars, especially if it makes for this much confusion.
>
> St.Ack
>
>> 2011-05-16 11:00:36,834 INFO
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker
>> 10.0.0.64,60020,1305536432387 starting
>> 2011-05-16 11:00:36,838 INFO
>> org.apache.hadoop.hbase.regionserver.StoreFile: Allocating
>> LruBlockCache with maximum size 197.5m
>> 2011-05-16 11:00:36,850 INFO
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker: successfully
>> transitioned task /hbase/splitlog/RESCAN0000234067 to final state done
>> 2011-05-16 11:00:36,852 DEBUG
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or
>> departed
>> 2011-05-16 11:00:36,854 INFO
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker
>> 10.0.0.64,60020,1305536432387 acquired task
>> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:00:36,871 DEBUG
>> org.apache.hadoop.hbase.monitoring.MonitoredTask: setDescritption:
>> Splitting log file
>> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389into
>> a temporary staging area.
>> 2011-05-16 11:00:36,874 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog:
>> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389,
>> length=16173236224
>> 2011-05-16 11:00:36,874 DEBUG
>> org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: Opening
>> log file
>> 2011-05-16 11:00:36,875 INFO org.apache.hadoop.hbase.util.FSUtils:
>> Recovering file
>> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389
>> 2011-05-16 11:00:37,415 WARN
>> org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error
>> detected. Found 1 replicas but expecting 3 replicas.  Requesting close
>> of hlog.
>> 2011-05-16 11:00:37,876 INFO org.apache.hadoop.hbase.util.FSUtils:
>> Finished lease recover attempt for
>> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389
>> 2011-05-16 11:00:38,073 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's
>> directory doesn't exist:
>> hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0.
>> It is very likely that it was already split so it's safe to discard
>> those edits.
>> 2011-05-16 11:00:38,074 INFO
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: processed 0
>> edits across 0 regions threw away edits for 1 regions log file =
>> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389
>> is corrupted = false
>> 2011-05-16 11:00:38,074 DEBUG
>> org.apache.hadoop.hbase.monitoring.MonitoredTask: setStatus: processed
>> 0 edits across 0 regions threw away edits for 1 regions log file =
>> hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389
>> is corrupted = false
>> 2011-05-16 11:00:38,074 DEBUG
>> org.apache.hadoop.hbase.monitoring.MonitoredTask: markComplete:
>> processed 0 edits across 0 regions threw away edits for 1 regions log
>> file = hdfs://localhost/hbase/.logs/10.0.0.65,60020,1305406356765/10.0.0.65%2C60020%2C1305406356765.1305409968389
>> is corrupted = false
>> 2011-05-16 11:00:38,074 INFO
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker
>> 10.0.0.64,60020,1305536432387 done with task
>> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> in 1217ms
>> 2011-05-16 11:00:38,825 INFO
>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
>> Moving 10.0.0.64,60020,1305535848569's hlogs to my queue
>>
>> ==> /var/lib/hbase/logs/hbase-larsgeorge-5-master-de1-app-mbp-2.log <==
>> 2011-05-16 11:00:41,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:42,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:43,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:44,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:45,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:46,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:47,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:48,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:49,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:50,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:51,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:52,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:53,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:54,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:55,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:56,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:57,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:58,691 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:00:59,692 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:01:00,692 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:01:01,692 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:01:02,692 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:01:03,692 INFO
>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting task
>> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:03,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 0
>> 2011-05-16 11:01:03,693 INFO
>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitted 1 out of 1
>> tasks
>> 2011-05-16 11:01:03,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> ver = 28
>> 2011-05-16 11:01:03,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>> /hbase/splitlog/RESCAN0000234069 ver = 0
>> 2011-05-16 11:01:04,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:04,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:05,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:05,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:06,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:06,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:07,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:07,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:08,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:08,693 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:09,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:09,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:10,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:10,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:11,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:11,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:12,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:12,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:13,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:13,694 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:14,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:14,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:15,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:15,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:16,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:16,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:17,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:17,695 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:18,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:18,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:19,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:19,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:20,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:20,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:21,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:21,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:22,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:22,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:23,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:23,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:24,697 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:24,697 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:25,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:25,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:26,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:26,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:27,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:27,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:28,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:28,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:29,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>> 2011-05-16 11:01:29,697 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>> unassigned = 1
>> 2011-05-16 11:01:30,696 DEBUG
>> org.apache.hadoop.hbase.master.SplitLogManager: chore: unassigned task
>> path -> /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
>>
>> I hacked the code to have the SplitLogManager delete all orphaned
>> RESCAN znodes, as I ended up having hundreds of them, and there seems
>> to be no way to "delete *" them, right? Is there a trick to be able to
>> delete a non-empty node in zkCli?
>>
>> Anyhow, the split is supposedly done, or at least the task reports as
>> complete. Then the replication ReplicationSourceManager kicks in, and
>> the task gets relisted over and over again. After just a few
>> minutes you see this in ZK's /hbase/splitlog:
>>
>> [RESCAN0000234200, RESCAN0000234209, RESCAN0000234207,
>> RESCAN0000234208, RESCAN0000234205, RESCAN0000234206,
>> RESCAN0000234203, RESCAN0000234204, RESCAN0000234201,
>> RESCAN0000234202, RESCAN0000234237, RESCAN0000234236,
>> RESCAN0000234235, RESCAN0000234234, RESCAN0000234239,
>> RESCAN0000234238, RESCAN0000234232, RESCAN0000234233,
>> RESCAN0000234230, RESCAN0000234231, RESCAN0000234219,
>> RESCAN0000234218, RESCAN0000234217, RESCAN0000234216,
>> RESCAN0000234215, RESCAN0000234214, RESCAN0000234213,
>> RESCAN0000234212, RESCAN0000234210, RESCAN0000234211,
>> RESCAN0000234228, RESCAN0000234227, RESCAN0000234229,
>> RESCAN0000234224, RESCAN0000234223, RESCAN0000234226,
>> RESCAN0000234225, RESCAN0000234220, RESCAN0000234221,
>> RESCAN0000234222, RESCAN0000234100, RESCAN0000234101,
>> RESCAN0000234107, RESCAN0000234106, RESCAN0000234109,
>> RESCAN0000234108, RESCAN0000234103, RESCAN0000234102,
>> RESCAN0000234105, RESCAN0000234104, RESCAN0000234111,
>> RESCAN0000234112, RESCAN0000234110, RESCAN0000234116,
>> RESCAN0000234115, RESCAN0000234114, RESCAN0000234113,
>> RESCAN0000234119, RESCAN0000234118, RESCAN0000234117,
>> RESCAN0000234120, RESCAN0000234121, RESCAN0000234122,
>> RESCAN0000234123, RESCAN0000234125, RESCAN0000234124,
>> RESCAN0000234127, RESCAN0000234126, RESCAN0000234129,
>> RESCAN0000234128, RESCAN0000234134, RESCAN0000234133,
>> RESCAN0000234132, RESCAN0000234131, RESCAN0000234130,
>> RESCAN0000234139, RESCAN0000234137, RESCAN0000234138,
>> RESCAN0000234135, RESCAN0000234136, RESCAN0000234143,
>> RESCAN0000234142, RESCAN0000234145, RESCAN0000234144,
>> RESCAN0000234141, RESCAN0000234140, RESCAN0000234146,
>> RESCAN0000234147, RESCAN0000234148, RESCAN0000234149,
>> RESCAN0000234152, RESCAN0000234151, RESCAN0000234150,
>> RESCAN0000234156, RESCAN0000234155, RESCAN0000234154,
>> RESCAN0000234153, RESCAN0000234159, RESCAN0000234157,
>> RESCAN0000234158, RESCAN0000234161, RESCAN0000234160,
>> RESCAN0000234163, RESCAN0000234162, RESCAN0000234165,
>> RESCAN0000234164, RESCAN0000234167, RESCAN0000234166,
>> RESCAN0000234168, RESCAN0000234169, RESCAN0000234179,
>> RESCAN0000234175, RESCAN0000234176, RESCAN0000234177,
>> RESCAN0000234178, RESCAN0000234171, RESCAN0000234172,
>> RESCAN0000234173, RESCAN0000234174, RESCAN0000234170,
>> RESCAN0000234188, RESCAN0000234189, RESCAN0000234186,
>> RESCAN0000234187, RESCAN0000234184, RESCAN0000234185,
>> RESCAN0000234182, RESCAN0000234183, RESCAN0000234180,
>> RESCAN0000234181, RESCAN0000234193, RESCAN0000234194,
>> RESCAN0000234195, RESCAN0000234196, RESCAN0000234197,
>> RESCAN0000234198, RESCAN0000234199, RESCAN0000234190,
>> RESCAN0000234191, RESCAN0000234192, RESCAN0000234070,
>> RESCAN0000234071, RESCAN0000234072, RESCAN0000234073,
>> RESCAN0000234074, RESCAN0000234075, RESCAN0000234076,
>> RESCAN0000234077, RESCAN0000234078, RESCAN0000234079,
>> RESCAN0000234081, RESCAN0000234082, RESCAN0000234080,
>> RESCAN0000234085, RESCAN0000234086, RESCAN0000234083,
>> RESCAN0000234084, RESCAN0000234089, RESCAN0000234087,
>> RESCAN0000234088, RESCAN0000234069, RESCAN0000234099,
>> RESCAN0000234098, RESCAN0000234095, RESCAN0000234094,
>> RESCAN0000234097, RESCAN0000234096, RESCAN0000234091,
>> RESCAN0000234090, RESCAN0000234093, RESCAN0000234092,
>> hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389]
>>
>> After that all is stuck. Ideas?
>>
>> On Mon, May 16, 2011 at 7:03 AM, Lars George <lars.george@gmail.com> wrote:
>>> Hi,
>>>
>>> I am on trunk and testing in a pseudo-distributed setup. I loaded the
>>> machine with YCSB and got it to break a few million inserts into
>>> the load phase, with the GC taking too long and the compaction queue
>>> subsequently going through the roof. Since then I cannot recover the
>>> local "cluster". It is stuck printing this:
>>>
>>> ...
>>> 2011-05-16 06:59:05,389 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>>> /hbase/splitlog/RESCAN0000148501 ver = 0
>>> 2011-05-16 06:59:06,388 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>>> unassigned = 1
>>> 2011-05-16 06:59:06,389 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting
>>> unassigned task(s) after timeout
>>> 2011-05-16 06:59:06,390 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>>> /hbase/splitlog/RESCAN0000148502 ver = 0
>>> 2011-05-16 06:59:07,388 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>>> unassigned = 1
>>> 2011-05-16 06:59:07,388 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting
>>> unassigned task(s) after timeout
>>> 2011-05-16 06:59:07,389 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>>> /hbase/splitlog/RESCAN0000148503 ver = 0
>>> 2011-05-16 06:59:08,388 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>>> unassigned = 1
>>> 2011-05-16 06:59:08,388 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting
>>> unassigned task(s) after timeout
>>> 2011-05-16 06:59:08,389 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>>> /hbase/splitlog/RESCAN0000148504 ver = 0
>>> 2011-05-16 06:59:09,388 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1
>>> unassigned = 1
>>> 2011-05-16 06:59:09,389 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: resubmitting
>>> unassigned task(s) after timeout
>>> 2011-05-16 06:59:09,390 DEBUG
>>> org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired
>>> /hbase/splitlog/RESCAN0000148505 ver = 0
>>> ...
>>>
>>> This keeps on going up and up. What is the right way to recover from
>>> this? Delete something from ZK? Delete something from HDFS? What shell
>>> commands would help?
>>>
>>> Thanks,
>>> Lars
>>>
>>
>
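
On the question of clearing out the orphaned RESCAN znodes: zkCli's delete works one node at a time, so a small script is one way to do the cleanup. Below is a minimal Python sketch, not the fix from HBASE-3889. It assumes a client object exposing `get_children(path)` and `delete(path)` (kazoo's `KazooClient` offers methods with these names). `delete_rescan_nodes`, `orphaned_rescan_nodes`, and the `ZkClient` protocol are hypothetical names introduced here. It defaults to a dry run and leaves the real task znode alone.

```python
from typing import Iterable, List, Protocol


class ZkClient(Protocol):
    """Minimal client interface assumed by this sketch (hypothetical)."""

    def get_children(self, path: str) -> List[str]: ...

    def delete(self, path: str) -> None: ...


def orphaned_rescan_nodes(children: Iterable[str]) -> List[str]:
    """Return only the RESCAN* child names, sorted for stable output."""
    return sorted(c for c in children if c.startswith("RESCAN"))


def delete_rescan_nodes(
    zk: ZkClient,
    parent: str = "/hbase/splitlog",  # parent znode seen in the logs
    dry_run: bool = True,
) -> List[str]:
    """List (and, unless dry_run, delete) RESCAN znodes under parent.

    The real split task znode (the encoded hdfs:// path) does not start
    with "RESCAN", so it is never touched.
    """
    targets = [f"{parent}/{c}" for c in orphaned_rescan_nodes(zk.get_children(parent))]
    if not dry_run:
        for path in targets:
            zk.delete(path)
    return targets
```

Run it with `dry_run=True` first to review the list. Whether removing these znodes is safe while the master is still resubmitting tasks is not settled in this thread.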
