commons-dev mailing list archives

From Steve Cohen
Subject Re: [net] FTP client date parsing: new format
Date Sat, 16 Apr 2005 14:55:14 GMT
Okay, we've solved the immediate issues here but I'm not totally 
satisfied yet.  The problem is that the numeric date format has 
introduced a new logical possibility.  Formerly it was simple and clear 
- either with the default or recent date formats there were always THREE 
whitespace-separated components of the date (month day year OR month day 
time).  The newly-introduced numeric date format in unix ftp servers 
(about time, by the way) adds a new possibility of a timestamp composed 
of TWO whitespace-separated components.  But until this becomes 
widespread and probably forever, we'll have to maintain backward 
compatibility with the older non-numeric formats.
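For illustration only, here is a minimal sketch of the two shapes side by
side. It is not the commons-net code: instead of an optional third token it
uses an explicit alternation, and the class name, method name and both
sub-patterns are assumptions made up for this example. The numeric shape is
modelled on the "2005-04-08 11:22" sample in this thread, the legacy shape on
"Jan 24 11:31" or "Jan 24 2005".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateTokenSketch {
    // Legacy shape, THREE tokens: month, day, then either a year or a time.
    static final String LEGACY =
        "[A-Za-z]{3}\\s+\\d{1,2}\\s+(?:\\d{4}|\\d{1,2}:\\d{2})";

    // Numeric shape, TWO tokens: yyyy-MM-dd, then HH:mm (assumed layout).
    static final String NUMERIC = "\\d{4}-\\d{2}-\\d{2}\\s+\\d{2}:\\d{2}";

    static final Pattern DATE = Pattern.compile("(" + NUMERIC + "|" + LEGACY + ")");

    // Returns the first substring that looks like either date shape, or null.
    static String extractDate(String line) {
        Matcher m = DATE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractDate(
            "-rw-r-----   1 neeme neeme   346 2005-04-08 11:22 services.vsp"));
        System.out.println(extractDate(
            "-rw-r--r--    1 1000     1000           27 Jan 24 11:31 messages.vsp"));
    }
}
```

Because this sketch searches the whole line rather than anchoring on the
preceding fields, a file whose name itself looks like a date could still
confuse it; a real parser would need to tie the date to its position.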

Making the third token optional is reasonable, but as we have seen, 
Neeme's find of the symbolic link case defeats this simple attempt at a 
fix.  With the two-or-three token date, it is possible that the regex 
engine will find an extra token later on and screw up the logic.

My current solution relies on the convention that a unix filename is not 
supposed to start with a hyphen, so the name is matched with 
([^-\\s]\\S*).  That way "->", the symlink indicator, will not be 
mistaken for a filename.  But that still doesn't feel solid enough.
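To make the difference concrete, here is a minimal sketch comparing a name
group of (\\S+) with the hyphen-guarded ([^-\\s]\\S*). The surrounding
pattern is a simplified stand-in written for this example, not the regex that
ships in commons-net, and the class, method and constant names are all
hypothetical. The sample lines are the ones quoted in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingRegexSketch {
    // Builds a simplified unix-listing pattern around the given name group.
    // Illustrative only; the real parser's regex is more detailed.
    static Pattern build(String nameGroup) {
        return Pattern.compile(
              "^([-bcdlps])\\S{9}\\s+"            // file type + nine permission chars
            + "(\\d+)\\s+"                        // hard link count
            + "(\\S+)\\s+(\\S+)\\s+"              // user, group
            + "(\\d+)\\s+"                        // file size
            + "(\\S+\\s+\\S+(?:\\s+\\S+)?)\\s+"   // date: two tokens, optional third
            + nameGroup                           // file name (one capturing group)
            + "(.*)$");                           // end token, e.g. " -> target"
    }

    // Name may be any run of non-whitespace, so "->" qualifies as a name.
    static final Pattern NAIVE = build("(\\S+)");

    // Name may not start with a hyphen, so "->" is rejected and the engine
    // backtracks to a two-token date.
    static final Pattern GUARDED = build("([^-\\s]\\S*)");

    // Returns {datestr, name, endtoken}, or null if the line does not match.
    static String[] parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(6), m.group(7), m.group(8) };
    }

    public static void main(String[] args) {
        String link = "lrwxrwxrwx   1 neeme neeme    23 2005-03-02 18:06 "
                    + "macros -> ./../../global/macros/.";
        for (Pattern p : new Pattern[] { NAIVE, GUARDED }) {
            String[] r = parse(p, link);
            System.out.println("datestr=" + r[0] + " name=" + r[1]
                             + " endtoken=" + r[2]);
        }
    }
}
```

With the naive name group the date swallows "macros" and the name becomes
"->"; with the guarded group the date stays at two tokens and the name is
"macros".  The guard does nothing for names containing whitespace, and it
would reject a file whose name really does begin with a hyphen, which is
part of why it does not feel solid.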

I would feel better if we had a more solid regex that clearly captured 
what is and what is not a legal unix filename.  Googling did not find an 
immediate answer to this question, nor did I find one in Jeffrey 
Friedl's "Mastering Regular Expressions" book.  Does anyone have one?

Steve Cohen wrote:
> Sorry for being a bit brusque before but if you check out the latest 
> code I think you will find that with Rory's and my changes, your issues 
> are taken care of.
> Neeme Praks wrote:
>> ok, now I checked out the recent changes and the fix seems to work, at 
>> least in the case of usual files:
>> -rw-r-----   1 neeme neeme   346 2005-04-08 11:22 services.vsp
>> is parsed into:
>>    typeStr=-
>>    hardLinkCount=1
>>    usr=neeme
>>    grp=neeme
>>    filesize=346
>>    datestr=2005-04-08 11:22
>>    name=services.vsp
>>    endtoken=
>> And this is correct.
>> However, it still breaks in the case of symbolic links.
>> So, if the entry is a symbolic link:
>> lrwxrwxrwx   1 neeme neeme    23 2005-03-02 18:06 macros -> ./../../global/macros/.
>> then it is parsed into these variables:
>>   typeStr=l
>>   hardLinkCount=1
>>   usr=neeme
>>   grp=neeme
>>   filesize=23
>>   datestr=2005-03-02 18:06 macros
>>   name=->
>>   endtoken= ../../../global/macros/
>> The ending of "-> ../../../global/macros/" seems to confuse the regexp 
>> parser.
>> And to answer Rory's question about the specifics of the FTP server, 
>> I'll paste one of my earlier posts here:
>> This format is from the default FTP server daemon configuration that 
>> came with Debian:
>> Connected to stf.
>> 220 stf FTP server (Version 6.4/OpenBSD/Linux-ftpd-0.17) ready.
>> Name (stf:neeme): neeme
>> 331 Password required for neeme.
>> Password:
>> 230- Linux stf 2.6.11 #1 SMP Wed Mar 2 14:08:21 CET 2005 i686 GNU/Linux
>> 230-
>> 230- The programs included with the Debian GNU/Linux system are free 
>> software;
>> 230- the exact distribution terms for each program are described in the
>> 230- individual files in /usr/share/doc/*/copyright.
>> 230-
>> 230- Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
>> 230- permitted by applicable law.
>> 230 User neeme logged in.
>> Remote system type is UNIX.
>> Using binary mode to transfer files.
>> ftp>
>> Rgds,
>> Neeme
>> Neeme Praks wrote:
>>> AFAIK, the new system uses both: regexp for extracting the timestamp 
>>> from the entry line and then using DateFormat to parse that.
>>> Example:
>>> -rw-r--r--    1 1000     1000           27 Jan 24 11:31 messages.vsp
>>> from this line the regexp extracts the timestamp part ("Jan 24 
>>> 11:31") and then DateFormat is used to parse this to a valid Date 
>>> object.
>>> The issue here is that the failure is already at regexp matching, and 
>>> the code never reaches the DateFormat parsing part.
>>> I'll try to check out Rory's changes during the weekend.
>>> Rgds,
>>> Neeme
>>> Steve Cohen wrote:
>>>> No, that's not it at all.  Remember that the new system does not use 
>>>> Regexes for date parsing, it uses SimpleDateFormats.  From Mr. 
>>>> Praks' descriptions, I am assuming he's now running the 1.3 or 
>>>> earlier versions, which do use regexes.
>>>> This surprises me because I've had several conversations with him 
>>>> over the past month in which the new system was discussed.  Perhaps 
>>>> he forgot to specify the date format as "yyyy/MM/dd" in his 
>>>> FTPClientConfig this time?  Or perhaps his code is finding an older 
>>>> commons-net.jar than he was expecting?
>>>> Steve Cohen
>>>> Rory Winston wrote:
>>>>> Right, the problem with this format is that the date is not 
>>>>> composed of three discrete components (from a regex POV), but two. 
>>>>> Basically what we will need to do is expand the regex to handle 
>>>>> this - can you give me details of the FTP server operating system 
>>>>> and FTP server software version, if you have them, please?
>>>>> Cheers
>>>>> Rory