Mailing-List: contact commons-dev-help@jakarta.apache.org; run by ezmlm
List-Id: "Jakarta Commons Developers List"
From: "Steve Cohen"
To: "Jakarta Commons Developers List"
Subject: RE: [NET] ftp entry spans lines
Date: Sat, 1 Mar 2003 21:43:42 -0600

I guess that can work. Perhaps the "simple" regex could merely look for an expression in parentheses to delimit entries, assuming that every entry really does end that way.
Even the closing parenthesis might be enough to key off. But we'd need to find some real-world sites or a good spec for the OpenVMS FTP to know for sure.

There are many ways to skin this cat. Even in the worst case, the VMS parser could override all the methods and do its own thing. For that matter, there isn't even anything that FORCES each parser to use regular expressions. But I don't think it will come to that.

-----Original Message-----
From: Jeffrey D. Brekke [mailto:jbrekke@wi.rr.com]
Sent: Sat 3/1/2003 7:11 PM
To: Jakarta Commons Developers List
Cc:
Subject: Re: [NET] ftp entry spans lines

"Steve Cohen" writes:

> If you are going to include the entry ending in the same regular
> expression that you use to break the entries down you either have to
>
> 1) parse the whole thing as you are breaking apart the entries,
> essentially doing away with the idea of not creating the FTPFile
> entries right away
>
> OR
>
> 2) parse it twice, on the first pass not attempting to break it down
> further than the entry delimiter, and then parsing it fully on the
> second pass.
>
> Since the regular expression is undoubtedly much more complicated
> than the first pass requires, the problem with 2) is that it might
> be very inefficient.
>
> I see two alternatives here:
>
> 3) Make each parser have TWO regular expressions, a simpler one for
> delimiting entries and the current one for breaking them down into
> FTP Files.
>
> 4) Use a non-regular expression method for delimiting entries
> probably on the java.io level, similar to what readLine() does.
>
> However neither of these would work in the VMS case you describe if
> there is no delimiter that can be looked ahead for, that is, if the
> delimiting is done solely by context using the "algorithm" that "the
> next entry starts when the last entry is finished". Is that the
> case or is there some sort of delimiter?

Yeah, this is more like what we saw.
The entries that spanned two lines were done just for presenting the
list nicely (not sure if this is exact, but it should do for an example.
I'll get an exact sample on Monday at work):

1-JUN.LIS;1 9/9 2-jun-1998 07:32:04 [GROUP,OWNER] (RWED,RWED,RWED,RE)
1-JUN.LIS;2 9/9 JUN-2-1998 07:32:04 [GROUP,OWNER] (RWED,RWED,RWED,)
1-JUN.LIS;2 a/9 2-JUN-98 07:32:04 [GROUP,OWNER] (RWED,RWED,RWED,)
REALLYLONGFILENAMETHATMESSEDUPTHECOLUMNS;1 1/9 2-JUN-1998 07:32:04 [GROUP,OWNER] (,RWED,RWED,RE)

So the line terminators don't mean much, but the RE matching can
determine the number of entries correctly. Maybe something like #3
above would work to iterate through the entries somehow with the RE.
Not sure; it's been a few months since I've thought much about this
problem.

> The readLine() method of delimiting entries was something I copied
> from the DefaultFileListParser. My mistake was in assuming that
> that approach could be generalized everywhere.

The old parsers work since the entire list is parsed up front. Our
parser uses the RE to determine the entries. That is what we are
using in production now.

--
=====================================================================
Jeffrey D. Brekke                                   jbrekke@wi.rr.com
Wisconsin, USA                                      brekke@apache.org
                                                    ekkerbj@yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
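
The idea discussed in this thread, a simple first-pass regex that keys on the closing parenthesised field to delimit entries regardless of line terminators, could be sketched roughly as below. This is only an illustration under stated assumptions: the class VmsEntrySplitter and its splitEntries method are hypothetical names and are not part of Commons Net, the pattern assumes every entry ends with a parenthesised protection mask made of uppercase letters and commas, and the wrapped layout of the long-filename entry in main is invented for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Commons Net.
final class VmsEntrySplitter {

    // Assumption: each entry ends with a parenthesised protection mask such as
    // "(RWED,RWED,RWED,RE)". DOTALL lets '.' cross line terminators, and the
    // reluctant ".*?" stops at the first such mask after the entry starts.
    private static final Pattern ENTRY =
            Pattern.compile("\\S.*?\\([A-Z,]*\\)", Pattern.DOTALL);

    // First pass only: delimit entries without breaking them into fields.
    static List<String> splitEntries(String listing) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(listing);
        while (m.find()) {
            // Collapse whitespace (including embedded newlines) so that each
            // entry becomes a single logical line for the detailed regex.
            entries.add(m.group().replaceAll("\\s+", " "));
        }
        return entries;
    }

    public static void main(String[] args) {
        // The second entry is wrapped onto two physical lines (assumed layout).
        String listing =
                "1-JUN.LIS;1 9/9 2-JUN-1998 07:32:04 [GROUP,OWNER] (RWED,RWED,RWED,RE)\n"
              + "REALLYLONGFILENAMETHATMESSEDUPTHECOLUMNS;1\n"
              + "    1/9 2-JUN-1998 07:32:04 [GROUP,OWNER] (,RWED,RWED,RE)\n";
        for (String entry : splitEntries(listing)) {
            System.out.println(entry);
        }
    }
}
```

Each string returned by splitEntries could then be handed to the parser's existing, more detailed regular expression, which corresponds to option 3) in the quoted message. Whether real OpenVMS FTP servers always terminate entries this way is exactly the open question raised above, so the pattern should be checked against real listings.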