lucene-java-user mailing list archives

From Rui Wang <>
Subject Re: Use multiple lucene indices
Date Tue, 06 Dec 2011 09:03:23 GMT
Hi Guys,

Thank you very much for your answers. 

I will do some profiling on memory usage, but is there any documentation on how Lucene uses and allocates
memory?

Best wishes,
Rui Wang

On 6 Dec 2011, at 06:11, KARTHIK SHIVAKUMAR wrote:

> hi
>>> would the memory usage go through the roof?
> Yup ....
> My past experience got me into pickles there...
> with regards
> karthik
> On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang <> wrote:
>> Hi All,
>> We are planning to use Lucene in our project, but are not entirely sure about
>> some of the design decisions we have made. The details are below; any
>> comments/suggestions are more than welcome.
>> The requirements of the project are below:
>> 1. We have tens of thousands of files, ranging in size from 500 MB to a
>> few terabytes, and the majority of the contents in these files will not be
>> accessed frequently.
>> 2. We are planning to keep less frequently accessed contents outside of our
>> database and store them on the file system.
>> 3. We also have code to get the binary position of these contents in the
>> files. Using these binary positions, we can quickly retrieve the contents
>> and convert them into our domain objects.
>> We think Lucene provides a scalable solution for storing and indexing
>> these binary positions, so the idea is that each piece of content in
>> the files will be a document, and each document will have at least an ID field to
>> identify the content and a binary position field containing the start and
>> stop positions of the content. Having done some performance testing, it
>> seems to us that Lucene is well capable of doing this.
>> At the moment, we are planning to create one Lucene index per file, so if
>> we have new files to be added to the system, we can simply generate a new
>> index. The problem is to do with searching: this approach means that we need
>> to create a new IndexSearcher every time a file is accessed through our
>> web service. We know that it is rather expensive to open a new
>> IndexSearcher, and are thinking of using some kind of pooling mechanism.
>> Our questions are:
>> 1. Is this one-index-per-file approach a viable solution? What do you
>> think about pooling IndexSearchers?
>> 2. If we have many IndexSearchers open at the same time, would the
>> memory usage go through the roof? I couldn't find any documentation on how
>> Lucene uses and allocates memory.
>> Thank you very much for your help.
>> Many thanks,
>> Rui Wang
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> -- 
> 560094*
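
The pooling idea raised in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not something proposed in the thread: `SearcherPool`, the `opener` function, and the `fileId` keys are hypothetical names, and the type parameter `S` stands in for whatever is expensive to open per file (for example an IndexSearcher together with its reader). The sketch uses only the Java standard library, so it makes no Lucene calls. It keeps at most `maxOpen` searchers open and closes the least recently used one when a new file needs a slot, which also puts an upper bound on how many indices hold memory at once.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of a bounded, least-recently-used pool of per-file searchers.
 * S is a placeholder for the expensive-to-open resource; nothing here
 * is Lucene API.
 */
final class SearcherPool<S extends Closeable> {
    private final int maxOpen;
    // Hypothetical factory: opens the searcher for the index of one file.
    private final Function<String, S> opener;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, S> open =
            new LinkedHashMap<>(16, 0.75f, true);

    SearcherPool(int maxOpen, Function<String, S> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns the pooled searcher for fileId, opening it on first use. */
    synchronized S get(String fileId) {
        S searcher = open.get(fileId); // also marks the entry as recently used
        if (searcher == null) {
            searcher = opener.apply(fileId);
            open.put(fileId, searcher);
            evictIfNeeded();
        }
        return searcher;
    }

    /** Number of searchers currently held open. */
    synchronized int size() {
        return open.size();
    }

    // Closes least recently used entries until the pool fits maxOpen.
    // Caveat: a caller may still be using an evicted searcher; a real
    // pool would need reference counting before closing.
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, S>> it = open.entrySet().iterator();
        while (open.size() > maxOpen && it.hasNext()) {
            S eldest = it.next().getValue();
            it.remove();
            try {
                eldest.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

With `maxOpen` set to 2, requesting files a, b, a, c in that order would open three searchers in total, reuse the one for a on its second request, and close the one for b when c is opened. How large `maxOpen` can safely be depends on the per-index memory cost, which is what the profiling mentioned at the top of this message would have to establish.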
