hadoop-hdfs-user mailing list archives

From Rahul Das <rahul.h...@gmail.com>
Subject Re: Hadoop Namenode problem
Date Thu, 28 Jul 2011 12:06:43 GMT
Hi Joey,

The log is too big to attach to the mail. What I found is that there are no
errors during this time, only a few warnings like:

2011-07-21 14:13:47,814 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
...
...
2011-07-21 14:30:49,511 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for
block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010
current size is 1950720 reported size is 2448907

I think the edits file was too large; that's why loading it took so long.
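For the record, the edits size can be confirmed directly on disk before a
restart. A minimal shell sketch; `NAME_DIR` is an assumption standing in for
your dfs.name.dir setting, and the default below matches the path in the
restart log quoted further down:

```shell
# NAME_DIR stands in for dfs.name.dir; /home/hadoop/current matches the
# path in the restart log quoted below, but adjust it for your cluster.
NAME_DIR=${NAME_DIR:-/home/hadoop/current}

# Report the size of the pending edits log, if present. A file that has
# grown very large means checkpoints have not been happening, and the
# next NameNode restart will replay every one of those edits.
if [ -f "$NAME_DIR/edits" ]; then
  du -h "$NAME_DIR/edits"
else
  echo "no edits file at $NAME_DIR/edits"
fi
```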

Regards,
Rahul
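(For later readers: on 0.20/1.x-era Hadoop, how often the SecondaryNameNode
checkpoints is controlled by two core-site.xml properties. The values below
are the usual defaults from that era, shown only as a sketch; verify them
against your version's documentation.)

```xml
<!-- core-site.xml on the SecondaryNameNode (0.20/1.x-era property names). -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>  <!-- seconds between checkpoints -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>  <!-- also checkpoint once edits reach 64 MB -->
</property>
```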

On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria <joey@cloudera.com> wrote:

> The long startup time after the restart looks like it was caused by the
> SecondaryNameNode not having been able to roll the edits log for some time.
> Can you post your NameNode log from around the same time as this
> SecondaryNameNode log (2011-07-21 16:00-16:30)?
>
> -Joey
>
>
> On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <rahul.hdpq@gmail.com> wrote:
>
>> Yes, I have a SecondaryNameNode running. Here are the logs for the
>> SecondaryNameNode:
>>
>> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835
>> edits # 138217 loaded in 1581 seconds.
>> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Image file of size 2045516451 saved in 29 seconds.
>> 2011-07-21 16:03:24,974 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
>> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs:
>> 0 Number of syncs: 0 SyncTimes(ms): 0
>> 2011-07-21 16:03:25,545 INFO
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
>> xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
>> 2011-07-21 16:29:24,356 ERROR
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
>> doCheckpoint:
>> 2011-07-21 16:29:24,358 ERROR
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
>> java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception:
>> java.io.IOException: Connection reset by peer
>>
>> Regards,
>> Rahul
>>
>>
>> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <joey@cloudera.com> wrote:
>>
>>> Do you have an instance of the SecondaryNameNode in your cluster?
>>>
>>> -Joey
>>>
>>>
>>> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.hdpq@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that
>>>> the NameNode was not responding (no reads or writes to HDFS were
>>>> happening). It got stuck for a few hours, then I shut down the NameNode
>>>> and found the following error in the NameNode log.
>>>>
>>>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
>>>> Responder, call
>>>> getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
>>>> xx.xx.xx.xx:13568: output error
>>>>
>>>> This error appeared for every data node, and the data nodes were not able
>>>> to communicate with the NameNode.
>>>>
>>>> After I restarted the NameNode:
>>>>
>>>> 2011-07-21 16:31:54,110 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>>>> Initializing RPC Metrics with hostName=NameNode, port=9000
>>>> 2011-07-21 16:31:54,223 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>>>> xx.xx.xx.xx:9000
>>>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>> Initializing JVM Metrics with processName=NameNode, sessionId=null
>>>> 2011-07-21 16:31:54,226 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
>>>> NameNodeMeterics using context
>>>> object:org.apache.hadoop.metrics.spi.NullContext
>>>> 2011-07-21 16:31:54,280 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>> 2011-07-21 16:31:54,280 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>> 2011-07-21 16:31:54,280 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isPermissionEnabled=false
>>>> 2011-07-21 16:31:54,287 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>>>> Initializing FSNamesystemMetrics using context
>>>> object:org.apache.hadoop.metrics.spi.NullContext
>>>> 2011-07-21 16:31:54,289 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>>> FSNamesystemStatusMBean
>>>> 2011-07-21 16:31:54,880 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
>>>> 2011-07-21 16:34:38,463 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
>>>> construction = 82
>>>> 2011-07-21 16:34:41,177 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size
>>>> 2042701824 loaded in 166 seconds.
>>>> 2011-07-21 16:58:07,624 INFO
>>>> org.apache.hadoop.hdfs.server.common.Storage: Edits file
>>>> /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406
>>>> seconds.
>>>>
>>>> And then it halts for a long time. After about an hour it starts working
>>>> again.
>>>>
>>>> My questions are: when does the "IPC Server Responder" error occur, and
>>>> is there a way to deal with it?
>>>> Also, if my NameNode is busy doing something, how can I find out what it
>>>> is doing?
>>>>
>>>> Regards,
>>>> Rahul
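
On the last question above (how to see what a busy NameNode is doing): a
common approach is a JVM thread dump. A minimal sketch, assuming the JDK's
jps/jstack tools are available on the NameNode host; the awk filter is just
one way to pick out the right PID:

```shell
# Find the NameNode JVM's PID with jps (ships with the JDK); the awk
# filter skips the SecondaryNameNode, which also matches "NameNode".
NN_PID=$(jps 2>/dev/null | awk '/NameNode/ && !/Secondary/ {print $1}')

# Dump all thread stacks. Taking a few dumps several seconds apart shows
# where the NameNode is spending its time (e.g. loading the image,
# replaying edits, or blocked on a lock).
if [ -n "$NN_PID" ]; then
  jstack "$NN_PID" > /tmp/nn-threads.txt
else
  echo "NameNode process not found"
fi

# Alternatively, kill -3 <pid> writes the same dump to the NameNode's
# stdout log without stopping the process.
```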
>>>
>>>
>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>>
>>>
>>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
>
