lucene-java-user mailing list archives

From Rui Wang <rw...@ebi.ac.uk>
Subject Use multiple lucene indices
Date Mon, 05 Dec 2011 17:58:29 GMT
Hi All, 

We are planning to use Lucene in our project, but we are not entirely sure about some of the
design decisions we have made. The details are below; any comments or suggestions are more than welcome.


The requirements of the project are below:

1. We have tens of thousands of files, ranging in size from 500 MB to a few terabytes, and
the majority of the contents of these files will not be accessed frequently.

2. We are planning to keep the less frequently accessed contents outside of our database and
store them on the file system.

3. We also have code to get the binary position of these contents in the files. Using these
binary positions, we can quickly retrieve the contents and convert them into our domain objects.


We think Lucene provides a scalable solution for storing and indexing these binary positions,
so the idea is that each piece of content in the files will be a document. Each document
will have at least an ID field to identify the content and a binary position field containing
the start and stop positions of the content. Having done some performance testing, it seems
to us that Lucene is well capable of doing this.
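
To make the binary position field concrete, here is a minimal sketch of how the start and stop positions could be packed into a fixed-size byte array before being stored with a document. It uses only the JDK; the class name PositionCodec and its methods are hypothetical helpers invented for illustration, not part of Lucene, and the 16-byte layout (two big-endian longs) is an assumption rather than what our code actually does. Longs are used because offsets in files of a few terabytes do not fit in an int.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a Lucene class): packs a start/stop byte range
// into 16 bytes that could be kept in a stored binary field of a document.
final class PositionCodec {
    // Two 8-byte longs: start offset followed by stop offset.
    static final int ENCODED_LENGTH = 2 * Long.BYTES;

    private PositionCodec() {}

    // Encodes the range as two big-endian longs (ByteBuffer's default order).
    static byte[] encode(long start, long stop) {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + stop);
        }
        return ByteBuffer.allocate(ENCODED_LENGTH).putLong(start).putLong(stop).array();
    }

    // Decodes a value produced by encode() back into {start, stop}.
    static long[] decode(byte[] encoded) {
        if (encoded.length != ENCODED_LENGTH) {
            throw new IllegalArgumentException("expected " + ENCODED_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        return new long[] { buf.getLong(), buf.getLong() };
    }
}
```

A lookup would then fetch the document by its ID, decode the stored bytes, and seek to the start position in the data file.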

At the moment, we are planning to create one Lucene index per file, so if new files are
added to the system, we can simply generate a new index for each. The problem is to do with
searching: this approach means that we need to create a new IndexSearcher every time a file
is accessed through our web service. We know that it is rather expensive to open a new
IndexSearcher, and are thinking of using some kind of pooling mechanism. Our questions are:

1. Is this one-index-per-file approach a viable solution? What do you think about pooling
IndexSearcher instances?

2. If we have many IndexSearchers open at the same time, would the memory usage go through
the roof? I couldn't find any documentation on how Lucene allocates memory.
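
To illustrate the kind of pooling we have in mind, here is a minimal sketch of a bounded pool that keeps at most a fixed number of per-index resources open and closes the least recently used one when the limit is exceeded. It is deliberately written against java.io.Closeable rather than any Lucene class; SearcherPool, acquire, opener and maxOpen are hypothetical names, and in practice the pooled object would be whatever wraps the open index. It is a sketch under these assumptions, not a tested design.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU pool: S stands for whatever resource is held open per index.
final class SearcherPool<S extends Closeable> {
    private final Function<String, S> opener;
    private final LinkedHashMap<String, S> open;

    SearcherPool(final int maxOpen, Function<String, S> opener) {
        this.opener = opener;
        // accessOrder = true makes iteration order least recently used first.
        this.open = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    // Caveat: this does not check whether another thread is still
                    // using the evicted resource; a real pool needs reference counting.
                    eldest.getValue().close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    // Returns the resource for indexPath, opening it only if it is not pooled yet.
    synchronized S acquire(String indexPath) {
        S resource = open.get(indexPath);
        if (resource == null) {
            resource = opener.apply(indexPath);
            open.put(indexPath, resource);
        }
        return resource;
    }

    // Number of resources currently held open.
    synchronized int size() {
        return open.size();
    }
}
```

With such a pool, the limit on open resources bounds memory use by the number of simultaneously open indices rather than by the total number of files, which is the trade-off behind question 2.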

Thank you very much for your help. 

Many thanks,
Rui Wang

