lucene-java-user mailing list archives

From Chris Fraschetti <frasche...@gmail.com>
Subject URL Stemmer
Date Wed, 27 Jul 2005 23:16:07 GMT
Writing simple code to trim down a URL is trivial, but trimming it down to
its most meaningful form is very hard. In some cases the URL parameters
actually define the page; in others they are useless babble. I'd like to use
a hash of a page's URL, as well as a hash of the content data, to help me
eliminate duplicates. Are there any good methods that are commonly used for
URL stemming?
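
To make the question concrete, here is a minimal sketch of one possible
approach, not an established method: normalize the URL syntactically, drop
query parameters that are assumed to be noise, sort the rest, and hash the
result. The class name UrlStemmer, the IGNORED_PARAMS list, and the choice of
SHA-256 are all assumptions made for illustration. Which parameters are
really "useless babble" depends on the site, so that list would need tuning.
The sketch also assumes absolute URLs that include a host.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class UrlStemmer {
    // Hypothetical list of parameter names assumed to carry no page identity.
    private static final Set<String> IGNORED_PARAMS = new HashSet<>(Arrays.asList(
            "sid", "sessionid", "phpsessid", "jsessionid",
            "utm_source", "utm_medium", "utm_campaign"));

    /** Returns a canonical form of an absolute URL, without its fragment. */
    static String normalize(String url) {
        // URI.create throws IllegalArgumentException on malformed input;
        // normalize() removes "." and ".." path segments.
        URI uri = URI.create(url.trim()).normalize();
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Absolute URL with host required: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Omit the port when it is absent or the default for the scheme.
        int port = uri.getPort();
        boolean defaultPort = port == -1
                || (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);

        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        // Keep only parameters not on the ignore list, then sort them so
        // that parameter order does not affect the result.
        List<String> kept = new ArrayList<>();
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = (eq < 0 ? pair : pair.substring(0, eq)).toLowerCase(Locale.ROOT);
                if (!IGNORED_PARAMS.contains(name)) {
                    kept.add(pair);
                }
            }
        }
        Collections.sort(kept);

        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host);
        if (!defaultPort) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (!kept.isEmpty()) {
            sb.append('?').append(String.join("&", kept));
        }
        return sb.toString();
    }

    /** Returns the SHA-256 hash of the normalized URL as lowercase hex. */
    static String urlHash(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(normalize(url).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "HTTP://Example.COM:80/a/./b/../c?b=2&sid=abc&a=1#frag";
        System.out.println(normalize(raw));
        System.out.println(urlHash(raw));
    }
}
```

With this sketch, two URLs that differ only in case of scheme or host,
default port, parameter order, fragment, or ignored parameters produce the
same hash. It cannot tell whether a parameter that is not on the list
changes the page, which is why a content hash would still be needed as a
second check.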

-- 
___________________________________________________
Chris Fraschetti
e fraschetti@gmail.com


