lucene-java-user mailing list archives

From Leo Galambos <galam...@com-os2.ms.mff.cuni.cz>
Subject HTML saga continues...
Date Thu, 12 Dec 2002 19:12:53 GMT
So, I have tried the following HTML parsers with Lucene:
1) the original JavaCC LL(k) HTML parser
2) Swing's HTML parser

With (1) I could process about 300K HTML documents; with (2), more
than 400K.

But I cannot process the complete collection (5M) and finish my
stress tests of Lucene.

Does anyone have an HTML parser that really works with Lucene? :) If
you think you have one, please let me know. I wanted to try Neko, but
it looks complicated, and I do not want a ``robust'' parser to affect
the results.

THX

-g-

