lucene-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: subclassing of IndexReader
Date Wed, 19 Nov 2003 17:48:52 GMT
Christoph Goller wrote:
> Otis Gospodnetic wrote:
> 
>> I am also involved in a small project that deals with crawling. :)
>> I have not done this, yet, but have thought about the same problem that
>> you are asking about - detecting small changes in web pages.
>> Have you considered using Nilsimsa?
>>
>> Otis
> 
> 
> Hi Otis,
> 
> sorry for the delay. Due to some "management" decisions the subject of
> duplicate checking no longer has top priority for me. But it will probably
> be of interest again next year. I did not try Nilsimsa so far. Did you?

Nilsimsa, even though on the surface it appears to work reasonably well, 
has been heavily criticized for weak theoretical foundations. See the 
archives of the Nilsimsa mailing list for details.

I have yet to find an open source alternative to it, though ...

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)





