commons-dev mailing list archives

From "Steve Cohen" <SCo...@sportvision.com>
Subject RE: [NET] Here's an Ant bug that we should look into fixing
Date Tue, 11 Mar 2003 13:16:44 GMT
I think the regexes have to be specified in code rather than a properties file, for a couple
of reasons.  

1) They are simply too complicated for many if not most users to create.  

2) A regex is insufficient to do the full parsing job - a regex can tell you "look for a string
of 3-7 uppercase letters here" or "look for 'RWED' or 'RWD' there" or "look for a number there"
or whatnot, but it can't tell you what part of the file spec that group might belong to. 
That's the job of code.  Given that the regexes are complicated, it makes a lot of sense to
hardcode them and place them in code with their interpreters.
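A minimal sketch of point 2, assuming a Unix-style "ls -l" listing. The class UnixListingParser and its fields are hypothetical and are not part of commons-net. The regex only carves the line into groups. The hardcoded code next to it is what decides that group 1 is the file mode, group 5 is the size, and so on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: a hardcoded regex paired with the code that interprets its groups. */
class UnixListingParser {

    // Groups: 1 = type+permissions, 2 = link count, 3 = owner, 4 = group,
    // 5 = size, 6 = date/time text, 7 = file name.
    private static final Pattern UNIX_LINE = Pattern.compile(
        "([-dl][-rwxsStT]{9})\\s+(\\d+)\\s+(\\S+)\\s+(\\S+)\\s+(\\d+)\\s+"
        + "(\\S+\\s+\\d{1,2}\\s+(?:\\d{1,2}:\\d{2}|\\d{4}))\\s+(.+)");

    /** Simple holder for the interpreted fields. */
    static final class Entry {
        boolean directory;
        String owner;
        long size;
        String dateText;  // left unparsed here; date parsing is a separate problem
        String name;
    }

    /** Returns the interpreted entry, or null if the line is not a Unix-style listing line. */
    static Entry parse(String line) {
        Matcher m = UNIX_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Entry e = new Entry();
        // The regex only says "10 mode characters here"; the code decides what they mean.
        e.directory = m.group(1).charAt(0) == 'd';
        e.owner = m.group(3);
        e.size = Long.parseLong(m.group(5));
        e.dateText = m.group(6);
        e.name = m.group(7);
        return e;
    }
}
```

Every new listing style would need its own pattern and its own interpreting code, which is the argument for keeping the two together.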

The only way I can see to get beyond this is to devise a "language" - perhaps built on top
of regexes - something along the lines of the SimpleDateFormat "language".  Such a language
goes a step further than regexes by mapping patterns to meanings.  But parsing a file listing
is a much more complicated job than parsing a date, and not necessarily something I want to
tackle in my spare time.

Somewhat short of this goal might be a properties file that linked particular FTP sites to
parsers and date formats.
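One way such a properties file might look, as a sketch only. The key scheme (host.parser, host.dateFormat, host.locale), the "default" fallback entries, the class SiteParserConfig and the host name ftp.example.de are all hypothetical. The parsers themselves would still be hardcoded; the file only selects one and supplies its date settings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch: per-site parser settings looked up from a properties file. */
class SiteParserConfig {
    final String parserKey;   // which hardcoded parser to use, e.g. "unix", "nt", "vms"
    final String dateFormat;  // SimpleDateFormat pattern, e.g. "MMM d HH:mm"
    final Locale locale;      // locale whose month names the server prints

    SiteParserConfig(String parserKey, String dateFormat, Locale locale) {
        this.parserKey = parserKey;
        this.dateFormat = dateFormat;
        this.locale = locale;
    }

    /**
     * Uses the host's own keys if it has a ".parser" entry, otherwise the "default.*" keys;
     * any value still missing falls back to a built-in default.
     */
    static SiteParserConfig forHost(Properties props, String host) {
        String prefix = props.containsKey(host + ".parser") ? host : "default";
        return new SiteParserConfig(
            props.getProperty(prefix + ".parser", "unix"),
            props.getProperty(prefix + ".dateFormat", "MMM d HH:mm"),
            Locale.forLanguageTag(props.getProperty(prefix + ".locale", "en-US")));
    }

    /** Loads properties from text; a real client would read a file instead. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected from a StringReader
        }
        return props;
    }
}
```

With a scheme like this, an Ant user could describe a new site without touching the parsing code.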

-------------------------------------------------

As far as my less ambitious goal of parsing dates correctly on different locale systems is
concerned, I posted a request on the Ant list for any sample FTP sites implemented in different
languages and so far have not received any replies.

I'll make a similar request here.  If anyone has the addresses of publicly accessible ftp
sites implemented in languages other than English, please pass them on to me.  

I am guessing, though, that a lot of the more popular public sites are implemented in English.
As an experiment, I tried to access ftp.suse.de and found that to be the case.  My uneducated
guess at this point is that ftp servers in other languages are more likely to be found on
private corporate sites that don't allow anonymous access than on public sites.

-----Original Message-----
From:	Jeffrey D. Brekke [mailto:jbrekke@wi.rr.com]
Sent:	Mon 3/10/2003 11:04 PM
To:	Jakarta Commons Developers List
Cc:	
Subject:	Re: [NET] Here's an Ant bug that we should look into fixing

Steve,

I'm listening and hopefully will get more time to work on Net stuff
soon.  So while I can't commit to work on implementing these ideas,
they sound fine and I can still run tests, generate site, and commit
patches.

As I was reading this, I remembered an old idea: specify the regular
expression used for each system in a properties file and have a
generic parser look up the correct RE.  This could then be configured
outside the code itself as new systems are encountered.  Maybe
something like this could also be used to handle date formatting?
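A rough sketch of that generic parser, under assumed conventions: a "system.regex" key holds the pattern and a "system.fields" key names its capturing groups in order. ConfigurableListingParser and both key names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: a generic listing parser driven entirely by properties. */
class ConfigurableListingParser {
    private final Pattern pattern;
    private final String[] fields;

    ConfigurableListingParser(Properties props, String system) {
        String regex = props.getProperty(system + ".regex");
        String fieldList = props.getProperty(system + ".fields");
        if (regex == null || fieldList == null) {
            throw new IllegalArgumentException("no entry for system: " + system);
        }
        this.pattern = Pattern.compile(regex);
        this.fields = fieldList.split("\\s*,\\s*");
        // Each configured field name labels one capturing group, in order.
        if (fields.length != pattern.matcher("").groupCount()) {
            throw new IllegalArgumentException("field count does not match group count for " + system);
        }
    }

    /** Returns field name -> captured text, or null if the line does not match. */
    Map<String, String> parse(String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            result.put(fields[i], m.group(i + 1));
        }
        return result;
    }
}
```

In an actual .properties file every backslash in the regex would have to be doubled, because backslash is the escape character in that format. Also, labelling a group "date" does not turn its text into a Date; that step still needs code, which is the objection in point 2 of the reply.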

jb

>>>>> On Sun, 9 Mar 2003 14:18:18 -0600, "Steve Cohen" <SCohen@sportvision.com> said:

> I had thought I might hear some replies to this.  The silence has
> been deafening.  I have been thinking about the issue, though, in
> particular where commons-net.ftp might have to go in order to really
> implement the ambitious spec laid out for it by clients such as ant,
> which have chosen to use it.

> Of particular note here is the "depends" (or synonym "newer")
> attribute of the ant <ftp> task.  This runs aground on the issue of
> parsing the date.  In the first place, there are the issues of
> general listing format (unix, NT, VMS, etc.).  In the second place,
> though, within these categories are issues of date format.  This
> devolves into a thicket of locale-type issues:

> Does the month come before the day?  In which language are the
> names of the months coded?

> To solve this, the scope of parser definition needs to be
> significantly expanded.

> Things might be better if there were any mechanism within the FTP
> specification for the server to expose its format to a client.  No
> such mechanism exists, however.  In fact, RFC 959, the FTP spec, is
> intentionally vague on this point:

> "Since the information on a file may vary widely from system to
> system, this information may be hard to use automatically in a
> program, but may be quite useful to a human user."

> http://www.ietf.org/rfc/rfc959.txt

> In other words, FTP was never meant to be used in such an automated
> fashion.

> Nonetheless, with the specification of parameters easily passed in
> by something like an ant task, it might be possible to define a
> parser sufficiently to perform this task.  These parameters include:

> 1) OS type of the FTP server (Unix, NT, OS/2, VMS, etc.)
> 2) date format - to define the ordering of date components - "MMM dd"
>    or "dd MMM", etc., as in SimpleDateFormat
> 3) locale - to define the actual abbreviations of the months.

> From 2 and 3 it is possible to build a Locale-specific
> SimpleDateFormat capable of parsing dates on a particular system.
> This object contains the names and abbreviations of the months.
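A sketch of building such an object from parameters 2 and 3, assuming the locale arrives as a BCP 47 tag such as "en-US" (Locale.forLanguageTag needs Java 7 or later). ListingDateFormats is a hypothetical helper; SimpleDateFormat, Locale and DateFormatSymbols are the standard JDK classes.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch: combine the date-format and locale parameters into one parser object. */
class ListingDateFormats {

    /**
     * @param datePattern parameter 2, e.g. "MMM dd" or "dd MMM"
     * @param languageTag parameter 3 as a BCP 47 tag, e.g. "en-US" or "fr-FR"
     */
    static SimpleDateFormat create(String datePattern, String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        SimpleDateFormat format = new SimpleDateFormat(datePattern, locale);
        format.setLenient(false);  // reject "30 Feb" rather than rolling it into March
        return format;
    }

    /** The month abbreviations this locale provides, taken from the JDK's locale data. */
    static String[] shortMonths(String languageTag) {
        String[] months = new DateFormatSymbols(Locale.forLanguageTag(languageTag)).getShortMonths();
        // Gregorian calendars have 12 months; the array carries an empty 13th slot.
        return Arrays.copyOf(months, 12);
    }
}
```

Printing shortMonths for a given tag is also a quick way to compare the JDK's abbreviations against what a real server actually sends.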

> This immediately raises the question of how to divvy up the parsing
> duties between the regular expression and the SimpleDateFormat.  It
> seems as if the format string must be used to construct the date part of
> the regex, with its components in the correct order.  Then the SimpleDateFormat would be
> used to actually parse the date.  All "optimizations" such as
> assuming a constant character width of 3 for month abbreviations are
> out the window here - they work for many languages, but not for all.
> French, for example, uses periods and varying lengths.
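A sketch of that division of labour, assuming the listing's date layout is known in advance. The date part of the regex is generated from the SimpleDateFormat pattern and from the locale's own month names, so no fixed 3-character width is assumed; the matched text is then handed to SimpleDateFormat. LocaleDateRegex is hypothetical, and it only understands the pattern letters M, d, H, m and y plus plain literal characters (no quoted text).

```java
import java.text.DateFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch: derive the date regex from a SimpleDateFormat pattern, then parse with it. */
class LocaleDateRegex {

    /** Translates a pattern such as "MMM d HH:mm" or "dd MMM yyyy" into a regex fragment. */
    static String toRegex(String datePattern, Locale locale) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < datePattern.length()) {
            char c = datePattern.charAt(i);
            int j = i;
            while (j < datePattern.length() && datePattern.charAt(j) == c) {
                j++;  // measure the run of identical pattern characters
            }
            int run = j - i;
            if (c == 'M' && run >= 3) {
                regex.append(monthAlternation(locale, run >= 4));  // textual month
            } else if (c == 'M' || c == 'd' || c == 'H' || c == 'm') {
                regex.append("\\d{1,2}");                         // numeric field
            } else if (c == 'y') {
                regex.append(run == 2 ? "\\d{2}" : "\\d{4}");     // two- or four-digit year
            } else if (Character.isWhitespace(c)) {
                regex.append("\\s+");                             // listings pad with extra spaces
            } else {
                regex.append(Pattern.quote(datePattern.substring(i, j)));  // literal text
            }
            i = j;
        }
        return regex.toString();
    }

    /** Alternation of the locale's month names, longest first, with no fixed-width assumption. */
    private static String monthAlternation(Locale locale, boolean fullNames) {
        DateFormatSymbols symbols = new DateFormatSymbols(locale);
        String[] source = fullNames ? symbols.getMonths() : symbols.getShortMonths();
        List<String> names = new ArrayList<>();
        for (String name : source) {
            if (!name.isEmpty()) {  // skip the empty 13th entry
                names.add(name);
            }
        }
        names.sort((a, b) -> b.length() - a.length());
        return names.stream().map(Pattern::quote).collect(Collectors.joining("|", "(?:", ")"));
    }

    /** Finds the first date in a listing line and parses it; returns null if none is found. */
    static Date findDate(String line, String datePattern, Locale locale) throws ParseException {
        Matcher m = Pattern.compile(toRegex(datePattern, locale)).matcher(line);
        if (!m.find()) {
            return null;
        }
        // Collapse padding such as "Mar  5" so it lines up with single spaces in the pattern.
        String dateText = m.group().replaceAll("\\s+", " ");
        SimpleDateFormat format = new SimpleDateFormat(datePattern, locale);
        format.setLenient(false);
        return format.parse(dateText);
    }
}
```

The sketch assumes the server prints the same month names as the JDK's locale data, which is exactly what would need checking against real sites.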

> A cautionary note: one would have to inspect actual FTP sites to
> determine whether they actually use the abbreviations specified in
> Java Locales.

> Comments?  Is this a Pandora's box that we don't want to open?






> -----Original Message-----
> From: Steve Cohen
> Sent: Wed 3/5/2003 1:53 PM
> To: Jakarta Commons Developers List
> Cc:
> Subject: [NET] Here's an Ant bug that we should look into fixing

> The <ftp> task of Ant doesn't work right because we don't parse
> non-English date formats.

> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14333
> -----------------------------------------------
> Steve Cohen
> Sr. Software Engineer
> Sportvision Inc.
> scohen@sportvision.com
> http://www.sportvision.com

> Please note: As a result of the merger of Ignite Sports and
> Sportvision, my email address has changed to scohen@sportvision.com



-- 
=====================================================================
Jeffrey D. Brekke                                   jbrekke@wi.rr.com
Wisconsin,  USA                                     brekke@apache.org
                                                    ekkerbj@yahoo.com


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org



