commons-dev mailing list archives

From Steve Cohen <sco...@javactivity.org>
Subject [NET] Designing a Date Format-aware FTP Entry Parser
Date Sat, 25 Sep 2004 15:19:55 GMT
Designing a Date Format-aware FTP Entry Parser

After having percolated on the back burner for several years as an unresolved 
issue, there is finally some momentum toward solving the problem of parsing 
FTP entries from servers which format the file timestamps in the directory 
listings in a format other than the NetComponents “standard”. 

In order to understand what must be done, it would be helpful to understand 
what we now do.  In brief, we are using a regular expression to achieve 
basically the same results as attempting to parse the date portion of the 
listing with one of two alternate java.text.SimpleDateFormats in the en_US 
locale:
1. MMM dd HH:mm for dates within one year of the current time
2. MMM dd yyyy for dates older than one year.
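
As a rough illustration of what that amounts to, here is a minimal sketch 
(not the actual NetComponents code, which uses a regular expression).  The 
class and method names are made up for this example.  Note that the first 
format carries no year, so SimpleDateFormat leaves it at its default of 1970 
and the caller would still have to work out the real year.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not part of commons-net.
class TwoFormatSketch {

    // Tries the "recent" format first, then falls back to the "older" one.
    // The order matters: parse(String) accepts a matching prefix, so
    // "MMM dd yyyy" would wrongly accept "Sep 25 15:19" as year 15.
    static Date parseTimestamp(String text, TimeZone zone) {
        SimpleDateFormat recent = new SimpleDateFormat("MMM dd HH:mm", Locale.US);
        SimpleDateFormat older = new SimpleDateFormat("MMM dd yyyy", Locale.US);
        recent.setTimeZone(zone);
        older.setTimeZone(zone);
        try {
            // No year in this pattern, so the result falls in 1970.
            return recent.parse(text);
        } catch (ParseException notRecent) {
            try {
                return older.parse(text);
            } catch (ParseException notOlder) {
                throw new IllegalArgumentException(
                        "Unrecognized timestamp: " + text, notOlder);
            }
        }
    }
}
```

The timezone is passed in explicitly here because, as noted next, the listing 
itself does not say which one the server used.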

Additionally, these formats presume some timezone, which is either the local 
timezone of the server or GMT, I presume.

The alternative mechanism that I am proposing would remove the parsing of the 
timestamp from the responsibilities of the regular expression and unload this 
onto some other object. 

But what object?  The obvious candidate would be java.text.DateFormat.  This 
abstract class allows a formatter object to be created on the basis of some 
formatting codes defined in DateFormat (FULL, LONG, MEDIUM, SHORT) and a Locale.  
But this is problematic because what is meant by MEDIUM in en_US is a string 
like “Sep 25, 2004” while in de_DE, you get a string like “25.09.2004”.  
This just won't do.  So we have to fall back on java.text.SimpleDateFormat, 
passing in both a specific formatting string and a Locale, which provides the 
month names, etc.  (By the way, has anyone ever noticed that SimpleDateFormat 
is actually less simple than DateFormat?) :-)
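
To make the difference concrete, here is a small sketch comparing the two 
approaches for 25 Sep 2004.  The class and method names are invented for the 
example, and the exact strings each locale produces depend on the locale 
data shipped with the JDK, so treat the outputs as indicative only.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; the names are assumptions made for this example.
class MediumVersusPattern {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // 25 Sep 2004, 00:00 GMT.
    static Date sampleDate() {
        Calendar c = new GregorianCalendar(GMT, Locale.US);
        c.clear();
        c.set(2004, Calendar.SEPTEMBER, 25);
        return c.getTime();
    }

    // The locale decides the whole layout (field order, separators).
    static String medium(Locale locale) {
        DateFormat f = DateFormat.getDateInstance(DateFormat.MEDIUM, locale);
        f.setTimeZone(GMT);
        return f.format(sampleDate());
    }

    // We fix the layout; the locale only supplies the month names.
    static String withPattern(Locale locale) {
        SimpleDateFormat f = new SimpleDateFormat("MMM dd yyyy", locale);
        f.setTimeZone(GMT);
        return f.format(sampleDate());
    }

    public static void main(String[] args) {
        System.out.println(medium(Locale.US));
        System.out.println(medium(Locale.GERMANY));
        System.out.println(withPattern(Locale.US));
        System.out.println(withPattern(Locale.GERMANY));
    }
}
```

With the explicit pattern, the field order stays "month day year" in every 
locale, which is the property we need for matching a server's listing format.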

The regular expression would merely extract from the listing the entire 
timestamp portion and delegate the task of parsing it to a pair of  
SimpleDateFormat objects (one for less than 1 year old and the other for one 
year old or older), each constructed on the basis of a format string and a 
locale.  Since the Locale should be the same for both formats, we would 
require the user to provide the two format Strings, and the Locale (or 
possibly the constituent elements of the locale, the country code and 
language code).  We want an object that encapsulates all of that, say,
org.apache.commons.net.ftp.parser.FTPDateFormat.

So each parser would have a settable member of this class.  FTPDateFormat 
would be constructed from two format strings and a Locale, and possibly a 
timezone as well.  We probably would have to provide some default 
FTPDateFormat objects for some of the common locales.
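
To give the discussion something to shoot at, here is one way such a class 
might look.  This is only a sketch: the constructor signature, the parse() 
method and the choice of IllegalArgumentException are all assumptions of 
mine, not an existing commons-net API.  It requires each format to consume 
the whole timestamp, so that the order in which the two formats are tried 
matters less.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the proposed class; nothing here exists in
// commons-net yet, and every member name is an assumption.
class FTPDateFormat {
    private final SimpleDateFormat recentFormat; // entries less than a year old
    private final SimpleDateFormat olderFormat;  // entries a year old or older

    FTPDateFormat(String recentPattern, String olderPattern,
                  Locale locale, TimeZone zone) {
        recentFormat = new SimpleDateFormat(recentPattern, locale);
        olderFormat = new SimpleDateFormat(olderPattern, locale);
        recentFormat.setTimeZone(zone);
        olderFormat.setTimeZone(zone);
    }

    // Synchronized because SimpleDateFormat is not thread-safe.
    synchronized Date parse(String timestamp) {
        Date parsed = parseFully(recentFormat, timestamp);
        if (parsed == null) {
            parsed = parseFully(olderFormat, timestamp);
        }
        if (parsed == null) {
            throw new IllegalArgumentException(
                    "Unrecognized timestamp: " + timestamp);
        }
        return parsed;
    }

    // Returns null unless the format matches the entire string.
    private static Date parseFully(SimpleDateFormat format, String text) {
        ParsePosition pos = new ParsePosition(0);
        Date parsed = format.parse(text, pos);
        return (parsed != null && pos.getIndex() == text.length()) ? parsed : null;
    }
}
```

A default for the current behavior would then be something like 
new FTPDateFormat("MMM dd HH:mm", "MMM dd yyyy", Locale.US, zone), with 
other defaults differing only in the patterns and the Locale.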

One consequence of this is that we would start making heavier use of the 
FTPFileEntryParserFactory objects.  We might want to start thinking about 
deemphasizing but not deprecating the use of FTPClient.listFiles() which is 
simple but makes too many assumptions.  There are already four or five 
different overloads of this method, and adding several more parameters 
into the mix will make this completely unworkable.  Instead, going through 
the factory would become the more common, more documented and recommended 
approach.  This would be the preferred method of accessing commons-net ftp 
for clients such as Ant and VFS.  Users who are happily using listFiles() in 
its current form in their custom apps built directly from commons-net could 
continue to do so.

Well, these are some preliminary thoughts.  Let's hear from the other 
developers of this project.

