commons-dev mailing list archives

From "Steve Cohen" <SCo...@sportvision.com>
Subject RE: [NET] ftp entry spans lines
Date Sat, 01 Mar 2003 11:54:23 GMT
If you are going to include the entry ending in the same regular expression that you use to
break the entries down, you have to either

1) parse the whole thing as you are breaking apart the entries, essentially doing away with
the idea of not creating the FTPFile entries right away
OR
2) parse it twice, on the first pass not attempting to break it down further than the entry
delimiter, and then parsing it fully on the second pass. 

Since the full regular expression is undoubtedly much more complicated than the first pass
requires, the problem with 2) is that running it twice might be very inefficient.

I see two alternatives here:
3) Give each parser TWO regular expressions: a simpler one for delimiting entries and
the current one for breaking them down into FTPFile entries.

4) Use a non-regular-expression method for delimiting entries, probably at the java.io level,
similar to what readLine() does.
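
As a rough illustration of 3), here is a minimal Java sketch. Everything in it is an
assumption made up for the example: the class and method names, the listing format (an
entry starts at a line beginning with a non-blank character and continues over any
following indented lines), and both patterns. It is not the existing parser code and
not a real VMS listing format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of option 3; none of these names exist in the package.
class TwoPatternSketch {

    // Cheap delimiting pattern (assumed format): an entry is a line that
    // starts with a non-whitespace character, plus any following lines
    // that start with a space or tab.
    static final Pattern ENTRY =
            Pattern.compile("(?m)^\\S.*(?:\\r?\\n[ \\t]+.*)*");

    // More detailed pattern (also assumed): name, size, owner, separated
    // by whitespace, which may include the line break inside an entry.
    static final Pattern FIELDS =
            Pattern.compile("(\\S+)\\s+(\\d+)\\s+(\\S+)\\s*");

    // First pass: only cut the listing into raw entries, no field parsing.
    static List<String> splitEntries(String listing) {
        List<String> raw = new ArrayList<String>();
        Matcher m = ENTRY.matcher(listing);
        while (m.find()) {
            raw.add(m.group());
        }
        return raw;
    }

    // Second pass, run only when a caller asks for a particular entry.
    // Returns {name, size, owner}, or null if the entry does not match.
    static String[] parseEntry(String rawEntry) {
        Matcher m = FIELDS.matcher(rawEntry);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String listing = "SHORT.TXT 12 alice\n"
                + "A_VERY_LONG_FILE_NAME.TXT\n"
                + "    34 bob\n"
                + "LAST.TXT 5 carol\n";
        for (String raw : splitEntries(listing)) {
            String[] f = parseEntry(raw);
            System.out.println(f[0] + " size=" + f[1] + " owner=" + f[2]);
        }
    }
}
```

The point of the split is that splitEntries() never touches the detailed pattern, so the
raw strings can be passed around and parseEntry() applied only when an entry is needed.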

However, neither of these would work in the VMS case you describe if there is no delimiter
to look ahead for, that is, if the delimiting is done solely by context, using the
"algorithm" that "the next entry starts when the last entry is finished".  Is that the case,
or is there some sort of delimiter?

The readLine() method of delimiting entries was something I copied from the DefaultFileListParser.
My mistake was in assuming that that approach could be generalized everywhere.
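
For 4), a minimal Java sketch of reading up to a parser-supplied delimiter instead of
calling readLine(). The class name, the method names and the blank-line delimiter in the
usage note are all hypothetical, and this only helps on systems where some such delimiter
actually exists.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of option 4; not part of the existing package.
class DelimitedEntryReader {

    // Reads characters until the delimiter is seen and returns the text
    // before it. Returns a final unterminated entry as-is, and null once
    // the stream is exhausted.
    static String readEntry(Reader in, String delimiter) throws IOException {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        StringBuilder buf = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            buf.append((char) c);
            int start = buf.length() - delimiter.length();
            // Check whether the buffer now ends with the delimiter.
            if (start >= 0 && buf.indexOf(delimiter, start) == start) {
                buf.setLength(start); // drop the delimiter itself
                return buf.toString();
            }
        }
        return buf.length() == 0 ? null : buf.toString();
    }

    // Convenience wrapper for trying the idea out on an in-memory listing.
    static List<String> readAll(String listing, String delimiter) {
        List<String> entries = new ArrayList<String>();
        try (Reader in = new StringReader(listing)) {
            String entry;
            while ((entry = readEntry(in, delimiter)) != null) {
                entries.add(entry);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

With "\n" as the delimiter this behaves roughly like readLine() on newline-terminated
input; with an assumed blank-line delimiter, readAll("one\n  cont\n\ntwo\n\nthree", "\n\n")
keeps the two-line entry together as a single string.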


-----Original Message-----
From:	Jeffrey D. Brekke [mailto:jbrekke@wi.rr.com]
Sent:	Fri 2/28/2003 10:49 PM
To:	commons-dev@jakarta.apache.org
Cc:	
Subject:	[NET] ftp entry spans lines


Steve,

We came across the problem with OpenVMS's listings.  Monday I'll get a
sample listing with an entry that spans two lines and add it to the
VMSFTPEntryParserTest.

Our idea was to let the regular expression handle parsing the entries
and not rely on the line endings.

I believe this is similar to how the current parsing works.  The
problem is you need to either pass around the entire list unparsed,
and parse it when needed, or parse the entire list up front, creating
all the FTPFile entries right away.  I believe you wrote the new
parsers to avoid this up front object creation and parsing assuming
one entry per line.

"Steve Cohen" <SCohen@sportvision.com> writes:
> No, I haven't solved that one.  It's been around for a while.  The
> line-based parsing has been standard in this package since it began.
> 
> I think we could, in principle, solve it by making each parser
> define a method that returns the entry delimiter.  Instead of
> calling readLine() we would have it read from the stream until the
> delimiter was found.  That delineates an entry.
> 
> Do we have the specs on which systems have multi-line entries and
> what the entry delimiter on these systems is?  I'd like to have a
> feel for what the extent of the variety is in the real world before
> coding an interface.
> 
> In fact, it would probably be a very good idea to put together a
> list of all the FTP systems this project knows it wants to support
> and their parameters before getting too deep into coding.
> 
> 
> -----Original Message-----
> From:	Jeffrey D. Brekke [mailto:jbrekke@wi.rr.com]
> Sent:	Fri 2/28/2003 9:54 PM
> To:	Steve Cohen
> Cc:	
> Subject:	Re: FW: Problems using maven site:deploy
> 
> Sounds great, you can answer the next question on the commons-dev
> list maybe.  Did you find any solutions for ftp entries that span
> more than one line?

-- 
=====================================================================
Jeffrey D. Brekke                                   jbrekke@wi.rr.com
Wisconsin,  USA                                     brekke@apache.org
                                                    ekkerbj@yahoo.com



