lucenenet-user mailing list archives

From Vincent Daron <trop...@tiscali.be>
Subject Re: Limitations of Lucene with large documents
Date Mon, 31 Jul 2006 09:28:41 GMT
Philip Withington wrote:
> Hello All
>
> I am looking for some information on the limitations of Lucene.Net.  I have
> been investigating the viability of using lucene as search engine on a
> collection of large documents.  There are about 30,000 documents and they
> can be anything up to 5MB in size (plain text).  Although no errors occurred
> while indexing the documents, the generated index did not appear to be
> searchable.  I then used the Luke tool to see if I could find out why and
> although the index seemed to be browseable, when trying to search I got Java
> heap exceptions.
Your index is probably corrupted. Are you sure that you're closing the 
IndexWriter cleanly?

Another problem can arise if you open the index multiple times, so check which 
lock directory is used. If you're accessing your index as different users 
(often the case with web apps), you have to specify the lock directory in the 
appSettings section of the configuration file:
<add key="Lucene.Net.lockDir" value="c:\temp"/>
>
> I guess the limits are going to depend a lot on the hardware being used to
> host the index but does anyone have any experience or tips on getting Lucene
> to work with large documents?  Also, is there any documentation on sensible
> limits to what can be achieved with Lucene or any rules of thumb as to
> what you can and can't do?
>
I'm using Lucene with millions of documents and a multi-gigabyte index 
without any problems.

Be careful with the "maxFieldLength" property if you're indexing large 
documents. It sets the maximum number of terms indexed for a single field of 
a document and is 10,000 by default, so terms beyond that limit are not 
indexed and cannot be found by a search. You may want to set this property 
to a higher value (more info here: 
http://www.dotlucene.net/documentation/api/1.9/Lucene.Net.Index.IndexWriter.maxFieldLength.html).
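
To illustrate why the term limit matters for large documents, here is a toy model in Python. It is not Lucene.Net code: the names index_field and search, the whitespace tokenizer, and the 20,000-word test document are all assumptions made for the sketch. Only the default cap of 10,000 terms comes from the message above.

```python
from collections import defaultdict

DEFAULT_MAX_FIELD_LENGTH = 10_000  # default value quoted in the message above


def index_field(index, doc_id, text, max_field_length=DEFAULT_MAX_FIELD_LENGTH):
    """Add at most max_field_length terms of text to a toy inverted index.

    Hypothetical helper: models the cap only, not Lucene's real analysis chain.
    Returns the number of terms that were actually indexed.
    """
    terms = text.lower().split()  # naive whitespace tokenizer (assumption)
    for term in terms[:max_field_length]:  # terms past the cap are silently dropped
        index[term].add(doc_id)
    return min(len(terms), max_field_length)


def search(index, term):
    """Return the sorted ids of documents whose indexed terms include term."""
    return sorted(index.get(term.lower(), set()))


# A large document whose only distinctive word sits near the end.
big_doc = " ".join(["filler"] * 20_000 + ["needle"])

capped = defaultdict(set)
index_field(capped, "doc1", big_doc)  # default cap of 10,000 terms

raised = defaultdict(set)
index_field(raised, "doc1", big_doc, max_field_length=1_000_000)
```

In this model, searching the capped index for "needle" finds nothing even though indexing reported no error, while the index built with the raised limit returns the document. That matches the symptom of an index that builds cleanly but does not appear searchable for words deep inside long documents.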
> Thanks
>
> Phil
>
See you

Vincent
