commons-dev mailing list archives

From Mario Ivankovits <>
Subject Re: [NET] Designing a Date Format-aware FTP Entry Parser
Date Thu, 30 Sep 2004 10:39:11 GMT
Steve Cohen wrote:

 >First parser to successfully parse every
 >item in the listing I suppose.  Or do you go for best score?

Line by line: the first parser which is able to parse a line should be
cached (for performance), so only the first match might be slow.
However, the parser should be prepared to redetect the language as soon
as it fails at a later time. There may be minor differences between
languages, so the first detection might not have been correct.
E.g. "Mar" (March) might not be unique if you talk to a German FTP server
which does not use umlauts (März => Mar).
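
A minimal sketch of that caching and redetection idea, restricted to
month names. Everything here is hypothetical: CachingMonthDetector,
Language and the two month tables are made up for illustration and are
not part of NET.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch; none of these names exist in Commons Net. */
class CachingMonthDetector {

    /** One candidate language: a label plus its twelve short month names. */
    static final class Language {
        final String name;
        final List<String> months;

        Language(String name, String... months) {
            this.name = name;
            this.months = Arrays.asList(months);
        }

        /** Returns the month number 1..12, or -1 if the token is unknown. */
        int monthOf(String token) {
            int index = months.indexOf(token.toLowerCase(Locale.ROOT));
            return index < 0 ? -1 : index + 1;
        }
    }

    // Illustrative tables (assumption); the German one is written without umlauts.
    static final Language US = new Language("US",
            "jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec");
    static final Language DE = new Language("DE",
            "jan", "feb", "mar", "apr", "mai", "jun",
            "jul", "aug", "sep", "okt", "nov", "dez");

    private final Language[] candidates;
    private Language cached; // last language that parsed a token successfully

    CachingMonthDetector(Language... candidates) {
        this.candidates = candidates.clone();
    }

    /** Tries the cached language first; on failure, redetects in configured order. */
    int parseMonth(String token) {
        if (cached != null) {
            int month = cached.monthOf(token);
            if (month > 0) {
                return month;
            }
            cached = null; // the earlier detection no longer fits, so redetect
        }
        for (Language language : candidates) {
            int month = language.monthOf(token);
            if (month > 0) {
                cached = language;
                return month;
            }
        }
        return -1; // no candidate knows this token
    }

    /** Name of the currently cached language, or null if none is cached. */
    String cachedLanguage() {
        return cached == null ? null : cached.name;
    }
}
```

With new CachingMonthDetector(US, DE), "Mar" is first matched by US; a
later "Okt" fails there and moves the cache to DE.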

 >What if none of
 >the parsers in your composite works?  Then what?

Like now - a "null" entry in the list of returned entries.
Or we change the paradigm NET uses today and throw an exception - but 
this is worth a thread on its own ;-)

 >2) Will we be opening ourselves to arguments as to which languages are
 >the composite?  Or in which order?  If you're using Italian and it has
 >to try US English, British English, French and German first, your
 >performance is going to be lousy.  Which brings me to

Is there a difference between US and British?

Performance: as I said, we could cache the last matching language -
then only the first search might be slow.

Such a composite might only fail if one has to use Croatian and Polish
at once. There the names "lis" and "lip" mean different months (at
least from the "Java short names" point of view).
This is why I am not against your solution at all; the composite parser
should only be one additional possibility - and IMHO the default parser.

I think this composite could be configurable by a static map (system
wide). There I would like to configure it
to detect "US", "DE", "FR" (in this order), and I am fine with 100% of
all FTP servers I have to contact today.
In the case of Ant it could be configured by e.g. lang="US,DE,FR".
Or by a system property, ... or ... we could discuss this if we find
a consensus at all.
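
As a rough sketch of the system property variant: the property name
"ftp.parser.languages" and the class ParserLanguageConfig are
assumptions made up for this example; nothing like them is defined in
NET or Ant today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that reads an ordered language list such as "US,DE,FR". */
class ParserLanguageConfig {

    /** Assumed property name; this thread has not agreed on one. */
    static final String PROPERTY = "ftp.parser.languages";

    /** Splits a comma-separated list, keeps the given order, skips empty items. */
    static List<String> parse(String spec) {
        List<String> codes = new ArrayList<>();
        if (spec == null) {
            return codes;
        }
        for (String item : spec.split(",")) {
            String code = item.trim().toUpperCase(Locale.ROOT);
            if (!code.isEmpty()) {
                codes.add(code);
            }
        }
        return codes;
    }

    /** Reads the system property, falling back to the given default list. */
    static List<String> fromSystemProperty(String fallback) {
        return parse(System.getProperty(PROPERTY, fallback));
    }
}
```

Started with -Dftp.parser.languages=US,DE,FR, fromSystemProperty("US")
would yield the three codes in that order; how a code maps to a table
of month names is left open here.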

And we should also discuss that you don't want to take SYST into account
- or at least the possibility to do so, but this also depends on which
file entry parsers you would like to implement the date stuff for.
Currently I am only aware that the unices do this language stuff.

 >3) This is too much run-time trial and error for my tastes.  The average
 >user of our library is not writing the ultimate FTP client.  He is
 >writing a java app or Ant script to connect repeatedly to an FTP server
 >somewhere.  Once he gets the right parser, he never has need of trying
 >others for that server.

... or using VFS. And VFS would like to be the super FTP, SSH, ... client.
Like a filesystem works - the user doesn't want to be bothered with things
like date styles.

For sure, I am not fully against the solution you have in mind; I would
just like to ensure it is possible to pass
in a parser which uses a completely different strategy.
And again: the user does not have to choose a file entry parser now - it
is done automatically by SYST (I know you know ;-)) -
but now we would force him to select the correct date format. Today, if
he changes the URL (and an appropriate parser
is available), the file parsing works without any additional attention.

Maybe we could provide a parser with a TreeMap where all month names and
their numbers are stored - the community could
help to fill this map - or a properties file which could easily be changed.
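
A minimal sketch of such a table, assuming a name=number properties
format (e.g. "okt=10"). MonthNameTable and the format are hypothetical.
It refuses to load a name that is already mapped to a different month,
which is exactly the "lis"/"lip" situation mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical month-name lookup backed by a TreeMap; not part of Commons Net. */
class MonthNameTable {

    // Case-insensitive keys, so "Okt" and "okt" are the same entry.
    private final Map<String, Integer> months =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Loads lines of the assumed form name=number, e.g. "okt=10". */
    void load(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : props.stringPropertyNames()) {
            int number = Integer.parseInt(props.getProperty(name).trim());
            if (number < 1 || number > 12) {
                throw new IllegalArgumentException(name + " maps to invalid month " + number);
            }
            Integer previous = months.get(name);
            if (previous != null && previous != number) {
                // Same short name, different month: one table cannot hold both.
                throw new IllegalStateException(
                        name + " is already month " + previous + ", not " + number);
            }
            months.put(name, number);
        }
    }

    /** Returns the month number 1..12, or -1 if the name is unknown. */
    int monthOf(String name) {
        Integer month = months.get(name);
        return month == null ? -1 : month;
    }
}
```

Loading "lis=10" (the Croatian reading) and then "lis=11" (the Polish
one) throws, so languages that collide like this would need separate
tables or an explicit choice by the user.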

 >4) On the other hand, your idea could be the basis of a pretty cool tool
 >based on NetComponents: point it at an FTP server somewhere, let it try
 >all the tricks it knows, and somehow it returns its best guess as to
 >what parser and parser date format to use for that server.

That's the point - like the comfort we provide with the automatic
detection of the needed file entry parser.
Computers should work for humans and not humans for computers ;-)

As I tried to say earlier: today the parsing works pretty well - we
only have problems with the month
name (and unknown servers). As long as the date parts are not in a
different order (based on the language),
why implement such a drastic change in the comfort NET provides today?
A black box where the user passes
in a URL and gets a file listing is what the user really wants.

