commons-dev mailing list archives

From Brad Neuberg <bradneub...@yahoo.com>
Subject Re: [feedparser] Attaching patch again
Date Sun, 05 Sep 2004 19:41:56 GMT
That regular expression is meant to catch things like
the following:

http://www.somehost.com/blog/blosxom.cgi

We want to strip off the trailing filename so that we
just get the path.

I added the leading [^/] (the test for double slashes)
because without it the expression was also matching the
following, which it shouldn't:

http://somehost.com

If someone puts a double slash in the first example:

http://somehost.com/blog//blosxom.cgi

then that test would prevent it from matching, which is
a bug; thanks for finding that.

I think I need to change the regex to not match if the
section in parentheses is preceded by a colon and two
slashes.  What do you think?
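
Here is a rough sketch of what I mean, not a tested patch. The
class name, the strip() helper, and the PROPOSED pattern are
made up for illustration; only the CURRENT pattern comes from
the patch. PROPOSED assumes a negative lookbehind is an
acceptable way to say "not right after the colon-slash of the
scheme":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the feedparser code base.
class TrailingFilenameSketch {

    // Pattern from the patch: a non-slash character, then "/name.ext" at the end.
    static final Pattern CURRENT = Pattern.compile("[^/](/\\w*\\.\\w*$)");

    // Assumed alternative: reject the match only when the slash that starts
    // the group comes directly after ":/", i.e. it is the second slash of "://".
    static final Pattern PROPOSED = Pattern.compile("(?<!:/)(/\\w*\\.\\w*$)");

    // Returns the URL with the trailing "/name.ext" removed,
    // or the URL unchanged when the pattern does not match.
    static String strip(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        if (!m.find()) {
            return url;
        }
        return url.substring(0, m.start(1));
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.somehost.com/blog/blosxom.cgi",
            "http://somehost.com",
            "http://somehost.com/blog//blosxom.cgi"
        };
        for (String url : urls) {
            System.out.println(url);
            System.out.println("  current:  " + strip(CURRENT, url));
            System.out.println("  proposed: " + strip(PROPOSED, url));
        }
    }
}
```

With the current pattern the double-slash URL comes back
unchanged; with the lookbehind version it is stripped to
http://somehost.com/blog/ while http://somehost.com is still
left alone.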

Hope you're having a good weekend,
  Brad

--- "Kevin A. Burton" <burton@newsmonster.org> wrote:

> Brad Neuberg wrote:
> 
> > +    private static Pattern patternToStrip = Pattern.compile("[^/](/\\w*\\.\\w*$)");
> >
> Brad...
> 
> Won't the above regexp prevent:
> 
> http://foo.com//bar
> 
> from matching?
> 
> Usually the HTTP server will just do a 302 redirect
> when using two slashes.
> 
> >      /**
> >       * A regex to extract the user from a Xanga URL
> > @@ -143,7 +143,8 @@
> >                new FeedReference("index.rss", FeedReference.RSS_MEDIA_TYPE),
> >                new FeedReference("rss.xml", FeedReference.RSS_MEDIA_TYPE),
> >                new FeedReference("index.rdf", FeedReference.RSS_MEDIA_TYPE),
> > -              new FeedReference("index.xml", FeedReference.XML_MEDIA_TYPE) };
> > +              new FeedReference("index.xml", FeedReference.RSS_MEDIA_TYPE),
> 
> I'm wondering if we should have a new media type,
> POTENTIAL_RSS_MEDIA_TYPE, so that we can just note
> that this MIGHT be a feed.
> 
> Other than that looks good. 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

