commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Cliff (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (NET-160) FTP Client API Hangs - Attached Scenario.
Date Mon, 15 Dec 2008 18:19:44 GMT

    [ https://issues.apache.org/jira/browse/NET-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656669#action_12656669
] 

themusicgod1 edited comment on NET-160 at 12/15/08 10:17 AM:
-----------------------------------------------------------

There may be multiple points where hanging can occur, but it seems like a lot of them share
a common characteristic, described somewhat by Guy Rouillier in this thread.

For some of these hangs it looks like from within copyStream(), the InputStream source is
blocking and not being refilled. The fact that it's blocking at at java.net.SocketInputStream.socketRead0(Native
Method) is just a symptom of this level's not knowing that the InputStream is not available
for reading. It's easy to simulate, too; merely unplug your network cable during a read. A
connect() will rightly detect that there's nothing there to connect to, but while read(),
95% of the time seems to block indefinitely, even when setSoTimeout() and other set*Timeout()s
are used.

As far as copyStream() is concerned, You can actually detect when it blocks, but there's a
problem with that in that it sometimes blocks and then waits, and then after a short period(ms)
source is refilled. One way to get around this in realistic situations is to have another
Timeout\*\*\*; this one from within Util itself, or probably better yet, encapsulate Util
in a FTPClient specific Util that has this extra bit of baggage for these situations\*. It
seems to work\*\*, but only for some aspects of this problem\*\*\*\*. It'll at least give
you an CopyStreamException, which for critical applications is a lot better than infinite
waits on a nonexistent input streams caused by noncritical network error.

Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native
method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160

ie...these hangs can occur *going into* PASV mode. Same root cause, I suspect: __getReply()
is being called and its result is not being given to it, so it waits. Forever. Instead of
at least throwing an Exception.

To perhaps rephrase(after a weekend to think about it); what seems to be happening, all over
the FTPClient code, is any time a read is made to a stream before it is filled ends up being
an infinite block. And this can happen. For example, in FTPClient's __getReply(). If a call
is made to __getReply() before the server actually replies, __getReply() blocks indefinitely.
Even if the server actually does reply a millisecond or so after the indefinite blocking begins.

*edit*: oh I might want to point out that I'm on ubuntu 8.10/sun java 5-6 (u10)/filezilla
server/passive mode

\* I'm a fairly new java programmer so maybe this is not optimal.
\*\* I'll attatch my updated Util class to this thread.
\*\*\* Someone pointed out that this is basically making setSoTimeout work. Not sure what
to do about this
\*\*\*\* albeit the most common, the actual downloading of files. Uploading/connecting/disconnecting/other
not really impacted



      was (Author: themusicgod1):
    There may be multiple points where hanging can occur, but it seems like a lot of them
share a common characteristic, described somewhat by Guy Rouillier in this thread.

For some of these hangs it looks like from within copyStream(), the InputStream source is
blocking and not being refilled. The fact that it's blocking at at java.net.SocketInputStream.socketRead0(Native
Method) is just a symptom of this level's not knowing that the InputStream is not available
for reading. It's easy to simulate, too; merely unplug your network cable during a read. A
connect() will rightly detect that there's nothing there to connect to, but while read(),
95% of the time seems to block indefinitely, even when setSoTimeout() and other set*Timeout()s
are used.

As far as copyStream() is concerned, You can actually detect when it blocks, but there's a
problem with that in that it sometimes blocks and then waits, and then after a short period(ms)
source is refilled. One way to get around this in realistic situations is to have another
Timeout\*\*\*; this one from within Util itself, or probably better yet, encapsulate Util
in a FTPClient specific Util that has this extra bit of baggage for these situations\*. It
seems to work\*\*, but only for some aspects of this problem\*\*\*\*. It'll at least give
you an CopyStreamException, which for critical applications is a lot better than infinite
waits on a nonexistent input streams caused by noncritical network error.

Also notable for Sebb, who suggested using PASV mode:
Thread [main] (Suspended)
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native
method]
SocketInputStream.read(byte[], int, int) line: 129
StreamDecoder.readBytes() line: 264 [local variables unavailable]
StreamDecoder.implRead(char[], int, int) line: 306
StreamDecoder.read(char[], int, int) line: 158
InputStreamReader.read(char[], int, int) line: 167
BufferedReader.fill() line: 136 [local variables unavailable]
BufferedReader.readLine(boolean) line: 299
BufferedReader.readLine() line: 362 [local variables unavailable]
FTPClient(FTP).__getReply() line: 294
FTPClient(FTP).sendCommand(String, String) line: 490
FTPClient(FTP).sendCommand(int, String) line: 534
FTPClient(FTP).sendCommand(int) line: 583
FTPClient(FTP).pasv() line: 882
FTPClient._openDataConnection_(int, String) line: 497
FTPClient.retrieveFile(String, OutputStream) line: 1276
FTPStation.getZile(String, String, int, String, String, int) line: 160

ie...these hangs can occur *going into* PASV mode. Same root cause, I suspect: __getReply()
is being called and its result is not being given to it, so it waits. Forever. Instead of
at least throwing an Exception.

To perhaps rephrase(after a weekend to think about it); what seems to be happening, all over
the FTPClient code, is any time a read is made to a stream before it is filled ends up being
an infinite block. And this can happen. For example, in FTPClient's __getReply(). If a call
is made to __getReply() before the server actually replies, __getReply() blocks indefinitely.
Even if the server actually does reply a millisecond or so after the indefinite blocking begins.

*edit*: oh I might want to point out that I'm on ubuntu 8.10/sun java 6 (u10)/filezilla server/passive
mode

\* I'm a fairly new java programmer so maybe this is not optimal.
\*\* I'll attatch my updated Util class to this thread.
\*\*\* Someone pointed out that this is basically making setSoTimeout work. Not sure what
to do about this
\*\*\*\* albeit the most common, the actual downloading of files. Uploading/connecting/disconnecting/other
not really impacted


  
> FTP Client API Hangs - Attached Scenario.
> -----------------------------------------
>
>                 Key: NET-160
>                 URL: https://issues.apache.org/jira/browse/NET-160
>             Project: Commons Net
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Suse Enterprise Linux 9.0, Intel
>            Reporter: Lvenkataraman
>         Attachments: Util.java
>
>
> FTP client hangs with the following error.   Here is the truss output. The client hangs
indefinitely and no work is being done other than this error when you truss the process. 
I use Java 1.5 to run this.  We wrote a long-running ftp daemon and this happens after 20-23
hours of continuous running.
> Here is the code to pull files from the ftp server. This one lists files that are received
after this particular "incal" time (currently it lists but the intention is to implement an
incremental download).  We initially thought doing periodic noop would keep the client alive
by sending dummy commands to the server. But we found this probelm still occurs.
> public void ListIncrementalFiles (String directory, Calendar incal)
> 	{
> 		try 
> 		{
> 			FTPFile[] files = listFiles(directory);
> 			
> 			int total_files = 0;
> 					
> 			System.out.println("FileName\tFileSize\t\tDate Time Stamp");
> 			for (int i = 0; i < files.length; i++)
> 			{
> 												
> 					FTPFile file = files[i];
> 					if ((i % 1000) == 0)
> 					{
> 						System.out.println("noop occurred at this instance---> " + i + "\n");
> 						noop();
> 					}
> 							
> 					if (file.isFile())
> 					{
> 						Calendar c = file.getTimestamp();
> 												
> 						if (c.after(incal) || c.equals(incal))
> 						{
> 								
> 								System.out.println(file.getName() + "\t" + file.getSize() + "\t\t" + c.getTime());
> 								total_files++;
> 						}
> 					 }
>     				
> 			}
> 			System.out.println("Total Files in the listing ----> " + total_files);
> 		} 
> 		catch (Exception e)
> 		{
> 			System.out.println("FTP Getfiles exception occured");
> 			e.printStackTrace();
> 		}
>     
> 	}
> ---------------------------------------------------------------------------------------------------------------------------------Truss
output begins--------------------------------------------------------
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /9:	lwp_park(0x00000000, 0)		(sleeping...)
> /3:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /8:	lwp_cond_wait(0x0014A908, 0x0014A8F0, 0x00000000, 0) (sleeping...)
> /10:	lwp_cond_wait(0x000366E8, 0x000366D0, 0x00000000, 0) (sleeping...)
> /1:	pollsys(0x00000000, 0, 0xFFBFEB60, 0x00000000) (sleeping...)
> /11:	lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /4:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /12:	lwp_cond_wait(0x00036778, 0x00036760, 0x00000000, 0) (sleeping...)
> /2:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /7:	lwp_cond_wait(0x00149F98, 0x00149F80, 0x00000000, 0) (sleeping...)
> /5:	lwp_cond_wait(0x000B77C0, 0x000B77A8, 0x00000000, 0) (sleeping...)
> /13:	lwp_cond_wait(0x00034958, 0x00034940, 0x00000000, 0) (sleeping...)
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /6:	lwp_cond_wait(0x00035F08, 0x00035EF0, 0xB477FBE0, 0) Err#62 ETIME
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> /14:	pollsys(0x00000000, 0, 0xB3F7FC10, 0x00000000)	= 0
> ------------------------------------------------------------------------------------------Truss
output ends-------------------------------------------------------------------------------------------

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message