hadoop-hdfs-issues mailing list archives

From "Akira AJISAKA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
Date Wed, 04 Feb 2015 05:37:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304666#comment-14304666 ]

Akira AJISAKA commented on HDFS-7175:
-------------------------------------

I tried tcpdump with JDK 8. The channel was quiet when the -showprogress option was not specified.
bq. If this sounds fine, I can work on a patch to do this. I am also fine if Akira wants to
work on the patch, or has alternative solutions.
Yeah, you can work on a patch :) One comment:
bq. Change the server to disregard the showprogress option, and send out dots every N (=10)
seconds no matter what.
I want to reduce network load, so could you send one dot per 100 files when the -showprogress
option is not specified? If you scan 1G files at one dot per file, the server will send an extra 1 GB to the client.
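
To make the "one dot per 100 files" suggestion concrete, here is a minimal sketch. ProgressDots is a hypothetical helper written for illustration; it is not the class or method the namenode actually uses, and the interval of 100 is simply the value proposed above. Assuming one byte per dot, 1G files would cost roughly 10 MB of progress output instead of 1 GB.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper (not part of Hadoop): writes one dot per `interval` files. */
class ProgressDots {
    private final PrintWriter out;
    private final int interval;
    private long filesSeen = 0;

    ProgressDots(PrintWriter out, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.out = out;
        this.interval = interval;
    }

    /** Call once per scanned file; writes and flushes a dot on every interval-th call. */
    void fileChecked() {
        filesSeen++;
        if (filesSeen % interval == 0) {
            out.print('.');
            out.flush(); // flush so the byte reaches the client instead of sitting in a buffer
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        ProgressDots dots = new ProgressDots(new PrintWriter(sink), 100);
        for (int i = 0; i < 1000; i++) {
            dots.fileChecked();
        }
        // 1000 files at one dot per 100 files
        System.out.println(sink.toString().length()); // prints 10
    }
}
```

The flush after each dot matters here: a dot that stays in a server-side buffer does nothing to keep the client's read from timing out.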

> Client-side SocketTimeoutException during Fsck
> ----------------------------------------------
>
>                 Key: HDFS-7175
>                 URL: https://issues.apache.org/jira/browse/HDFS-7175
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Carl Steinbach
>            Assignee: Akira AJISAKA
>         Attachments: HDFS-7175.2.patch, HDFS-7175.3.patch, HDFS-7175.patch, HDFS-7175.patch
>
>
> HDFS-2538 disabled status reporting for the fsck command (it can optionally be enabled with the -showprogress option). We have observed that without status reporting the client will abort with read timeout:
> {noformat}
> [hdfs@lva1-hcl0030 ~]$ hdfs fsck / 
> Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070
> 14/09/30 06:03:41 WARN security.UserGroupInformation: PriviledgedActionException as:hdfs@GRID.LINKEDIN.COM (auth:KERBEROS) cause:java.net.SocketTimeoutException: Read timed out
> Exception in thread "main" java.net.SocketTimeoutException: Read timed out
> 	at java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:152)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:122)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> 	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> 	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
> 	at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312)
> 	at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
> 	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149)
> 	at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346)
> {noformat}
> Since there's nothing for the client to read, it will abort if the time required to complete the fsck operation is longer than the client's read timeout setting.
> I can think of a couple of ways to fix this:
> # Set an infinite read timeout on the client side (not a good idea!).
> # Have the server side write (and flush) zeros to the wire and instruct the client to ignore these characters instead of echoing them.
> # It's possible that flushing an empty buffer on the server side will trigger an HTTP response with a zero-length payload. This may be enough to keep the client from hanging up.
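
The timeout behaviour described in the issue, and the "write and flush filler bytes that the client ignores" option, can be sketched with plain sockets. This is an illustration under simplifying assumptions, not the fsck code path: it uses raw TCP on the loopback interface rather than HTTP, dots rather than zeros as the filler byte, and KeepAliveDemo, clientSurvives and all timing values are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo (not Hadoop code): periodic filler bytes versus a client read timeout. */
class KeepAliveDemo {
    /**
     * Runs a local server that "works" for workMillis, writing a dot every dotMillis
     * (or nothing if dotMillis is 0), then sends "OK". Returns true if a client with
     * the given read timeout receives "OK", false if its read times out first.
     */
    static boolean clientSurvives(int workMillis, int dotMillis, int readTimeoutMillis)
            throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread worker = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    long deadline = System.currentTimeMillis() + workMillis;
                    while (System.currentTimeMillis() < deadline) {
                        Thread.sleep(dotMillis > 0 ? dotMillis : workMillis);
                        if (dotMillis > 0) {
                            out.write('.');
                            out.flush(); // push the filler byte onto the wire now
                        }
                    }
                    out.write("OK".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                } catch (IOException | InterruptedException e) {
                    // The client gave up (or we were interrupted); nothing more to send.
                }
            });
            worker.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(readTimeoutMillis); // applies to each blocking read
                InputStream in = client.getInputStream();
                StringBuilder result = new StringBuilder();
                int b;
                while ((b = in.read()) != -1) {
                    if (b != '.') {
                        result.append((char) b); // drop filler dots, keep the real payload
                    }
                }
                return result.toString().equals("OK");
            } catch (SocketTimeoutException e) {
                return false;
            } finally {
                worker.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("silent server:  " + clientSurvives(800, 0, 400));
        System.out.println("dotting server: " + clientSurvives(800, 50, 400));
    }
}
```

With these values the silent server stays quiet for longer than the client's read timeout, so the first call is expected to report false, while the dotting server resets the timeout on every read and the second call is expected to report true. The read timeout bounds the gap between bytes, not the total duration of the operation, which is why a trickle of ignorable bytes is enough.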



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
