lucene-java-user mailing list archives

From liugangc <>
Subject Re:Use multiple lucene indices
Date Tue, 06 Dec 2011 01:55:51 GMT
Hi, here are some hints from my experience:
1. If you use one index per file and keep many IndexSearchers open at the same time, you may hit a
'too many open files' error. You will have to increase the OS limit on open files (the file_max value).
2. If these indexes see little concurrent access, I think it is reasonable to open a new
searcher for every access. Meanwhile, if you use Lucene's sort feature, the field cache may consume
a lot of memory, so too many IndexSearchers open at the same time could exhaust all the memory
of your machine.
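
One way to keep both the open-file count and the memory use bounded is to cap how many searchers are open at once and close the least recently used one when the cap is exceeded. Below is a minimal sketch of that idea in plain Java. The class name LruResourcePool, the maxOpen parameter and the opener callback are hypothetical names chosen for illustration, not part of Lucene; in a real setup the pooled value would be whatever wraps your IndexSearcher and its underlying reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Keeps at most maxOpen resources open; closes the least recently used one on overflow. */
final class LruResourcePool<K, V extends Closeable> {
    private final int maxOpen;
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<K, V> open = new LinkedHashMap<>(16, 0.75f, true);

    LruResourcePool(int maxOpen) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
    }

    /** Returns the open resource for key, opening it with opener if it is not pooled yet. */
    synchronized V get(K key, Callable<V> opener) throws Exception {
        V resource = open.get(key); // a hit also marks the entry as recently used
        if (resource == null) {
            resource = opener.call();
            open.put(key, resource);
            evictOverflow();
        }
        return resource;
    }

    /** Number of resources currently held open. */
    synchronized int size() {
        return open.size();
    }

    // Closes and drops least recently used entries until the cap is respected.
    private void evictOverflow() throws IOException {
        Iterator<Map.Entry<K, V>> it = open.entrySet().iterator();
        while (open.size() > maxOpen && it.hasNext()) {
            V eldest = it.next().getValue();
            it.remove();
            eldest.close();
        }
    }
}
```

Note that this sketch closes an evicted resource immediately, so it is only safe if no other thread is still searching with it. A real pool would need some form of reference counting before closing (Lucene's IndexReader offers incRef/decRef for this purpose).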

gang liu

At 2011-12-06 01:58:29,"Rui Wang" <> wrote:
>Hi All, 
>We are planning to use Lucene in our project, but are not entirely sure about some of the
design decisions we made. Below are the details; any comments/suggestions are more than
>The requirements of the project are below:
>1. We have tens of thousands of files, their sizes ranging from 500M to a few terabytes,
and the majority of the contents in these files will not be accessed frequently. 
>2. We are planning to keep less frequently accessed contents outside of our database and store them
on the file system.
>3. We also have code to get the binary position of these contents in the files. Using
these binary positions, we can quickly retrieve the contents and convert them into our domain
>We think Lucene provides a scalable solution for storing and indexing these binary positions,
so the idea is that each piece of content in the files will be a document, and each document
will have at least an ID field to identify the content and a binary position field containing
the start and stop positions of the content. Having done some performance testing, it seems
to us that Lucene is well capable of doing this. 
>At the moment, we are planning to create one Lucene index per file, so if we have new
files to be added to the system, we can simply generate a new index. The problem is to do with
searching: this approach means that we need to create a new IndexSearcher every time a file
is accessed through our web service. We know that it is rather expensive to open a new IndexSearcher,
and are thinking of using some kind of pooling mechanism. Our questions are:
>1. Is this one index per file approach a viable solution? What do you think about pooling
>2. If we have many IndexSearchers open at the same time, would the memory usage go through
the roof? I couldn't find any documentation on how Lucene allocates memory. 
>Thank you very much for your help. 
>Many thanks,
>Rui Wang