hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Repeated Exceptions in SecondaryNamenode Log
Date Tue, 01 Jul 2008 17:52:53 GMT
Which version of Hadoop are you on?

Could you please take a look at your main name-node storage directory
and check whether the size of file current/fsimage is as expected compared
to previous images?
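
A minimal sketch of that size check, in Python. The storage directory path, the list of earlier image sizes, and the 0.5 threshold are all assumptions for illustration; none of them come from Hadoop itself.

```python
import os


def fsimage_size(storage_dir):
    """Size in bytes of current/fsimage under a name-node storage directory."""
    return os.path.getsize(os.path.join(storage_dir, "current", "fsimage"))


def looks_truncated(current_size, previous_sizes, min_ratio=0.5):
    """Return True if the current image is much smaller than earlier ones.

    min_ratio is an arbitrary illustrative threshold, not a Hadoop setting.
    With no earlier sizes to compare against, nothing is flagged.
    """
    if not previous_sizes:
        return False
    return current_size < min_ratio * min(previous_sizes)
```

For example, looks_truncated(fsimage_size("/path/to/dfs/name"), sizes_of_older_copies) would flag an image that is less than half the size of the smallest older copy. The path and the list of older sizes are placeholders you would fill in from your own backups.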

There was a bug (fixed in 0.16.3) that would create a bad image file
if there was a transfer error, so be careful.
http://issues.apache.org/jira/browse/HADOOP-3069

Thanks,
--Konstantin

Christian Saar wrote:
> Hello again,
> 
> I have logged the network traffic with tcpdump, and the SecondaryNameNode
> does connect to the NameNode!?
> 
>> 18:40:39.323920 IP 172.20.11.102.53230 > 172.20.11.101.9000: S 2827286710:2827286710(0) win 5840 <mss 1460,sackOK,timestamp 1045811384 0,nop,wscale 7>
>> 18:40:39.324021 IP 172.20.11.101.9000 > 172.20.11.102.53230: S 174737400:174737400(0) ack 2827286711 win 5792 <mss 1460,sackOK,timestamp 123707 1045811384,nop,wscale 7>
>> 18:40:39.324030 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 1 win 46 <nop,nop,timestamp 1045811384 123707>
>> 18:40:39.324647 IP 172.20.11.102.53230 > 172.20.11.101.9000: P 1:168(167) ack 1 win 46 <nop,nop,timestamp 1045811384 123707>
>> 18:40:39.324747 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
>> 18:40:39.324756 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
>> 18:40:39.324769 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 168 win 54 <nop,nop,timestamp 123708 1045811384>
>> 18:40:39.531873 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
>> 18:40:39.531880 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915>
>> 18:40:39.531901 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
>> 18:40:39.531905 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915,nop,nop,sack 1 {1:20}>
>> 18:40:39.531910 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 1:20(19) ack 168 win 54 <nop,nop,timestamp 123915 1045811384>
>> 18:40:39.531914 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 20 win 46 <nop,nop,timestamp 1045811592 123915,nop,nop,sack 1 {1:20}>
>> 18:40:39.532245 IP 172.20.11.102.53230 > 172.20.11.101.9000: P 168:193(25) ack 20 win 46 <nop,nop,timestamp 1045811592 123915>
>> 18:40:39.532311 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 193 win 54 <nop,nop,timestamp 123915 1045811592>
>> 18:40:39.532350 IP 172.20.11.101.9000 > 172.20.11.102.53230: . ack 193 win 54 <nop,nop,timestamp 123915 1045811592,nop,nop,sack 1 {168:193}>
>> 18:40:39.533398 IP 172.20.11.101.9000 > 172.20.11.102.53230: P 20:39(19) ack 193 win 54 <nop,nop,timestamp 123916 1045811592>
>> 18:40:39.573609 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 39 win 46 <nop,nop,timestamp 1045811633 123916>
>> 18:40:41.485795 IP 172.20.11.102.53230 > 172.20.11.101.9000: F 193:193(0) ack 39 win 46 <nop,nop,timestamp 1045813546 123916>
>> 18:40:41.485898 IP 172.20.11.101.9000 > 172.20.11.102.53230: F 39:39(0) ack 194 win 54 <nop,nop,timestamp 125869 1045813546>
>> 18:40:41.485905 IP 172.20.11.102.53230 > 172.20.11.101.9000: . ack 40 win 46 <nop,nop,timestamp 1045813546 125869>
> 
> 
> 
> 
> Christian Saar wrote:
>> Hello all,
>>
>> We have this exception in our logs:
>>
>>> 2008-07-01 17:12:02,392 ERROR org.apache.hadoop.dfs.NameNode.Secondary: Exception in doCheckpoint:
>>> 2008-07-01 17:12:02,392 ERROR org.apache.hadoop.dfs.NameNode.Secondary: java.net.ConnectException: Connection refused
>>>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
>>>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>         at java.net.Socket.connect(Socket.java:519)
>>>         at java.net.Socket.connect(Socket.java:469)
>>>         at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
>>>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>>>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>>>         at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>>>         at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>>>         at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>>>         at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
>>>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
>>>         at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
>>>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
>>>         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:149)
>>>         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:188)
>>>         at org.apache.hadoop.dfs.SecondaryNameNode.getFSImage(SecondaryNameNode.java:245)
>>>         at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:310)
>>>         at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:223)
>>>         at java.lang.Thread.run(Thread.java:619)
>> but the PrimaryNamenode looks good:
>>
>>> 2008-07-01 17:17:02,034 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 172.20.11.102
>> Does anybody know how I can find the problem here?
>>
> 
