lucene-java-user mailing list archives

From KARTHIK SHIVAKUMAR <nskarthi...@gmail.com>
Subject Re: Use multiple lucene indices
Date Tue, 06 Dec 2011 06:11:19 GMT
Hi,

>> would the memory usage go through the roof?

Yup.

In my past experience, that got me into a pickle.



with regards
karthik

On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang <rwang@ebi.ac.uk> wrote:

> Hi All,
>
> We are planning to use Lucene in our project, but are not entirely sure about
> some of the design decisions we have made. Below are the details; any
> comments/suggestions are more than welcome.
>
> The requirements of the project are below:
>
> 1. We have tens of thousands of files, their sizes ranging from 500M to a
> few terabytes, and the majority of the contents of these files will not be
> accessed frequently.
>
> 2. We are planning to keep less frequently accessed contents outside of our
> database and store them on the file system.
>
> 3. We also have code to get the binary position of these contents in the
> files. Using these binary positions, we can quickly retrieve the contents
> and convert them into our domain objects.
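
Requirement 3 above amounts to a seek-and-read over a stored byte range. A minimal sketch, using only the JDK and assuming the stored positions form a half-open range (start inclusive, stop exclusive; the original message does not say which). `ByteRangeReader` and `read` are hypothetical names, not the poster's code, and converting the bytes into domain objects is left out.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene or of the original poster's code.
final class ByteRangeReader {

    // Reads the bytes in [start, stop) from the given file.
    // Assumption: stop is exclusive; adjust if the stored stop position is inclusive.
    static byte[] read(Path file, long start, long stop) throws IOException {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + stop);
        }
        long length = stop - start;
        if (length > Integer.MAX_VALUE - 8) {
            throw new IllegalArgumentException("range too large for a single array");
        }
        byte[] buffer = new byte[(int) length];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);        // jump straight to the stored offset
            raf.readFully(buffer);  // throws EOFException if the file is too short
        }
        return buffer;
    }

    // Small demo: write a temp file, then read back only the middle section.
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("range-demo", ".bin");
        try {
            Files.write(tmp, "HEADERpayloadFOOTER".getBytes(StandardCharsets.US_ASCII));
            byte[] slice = read(tmp, 6, 13);
            System.out.println(new String(slice, StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Only the requested range is read into memory, so the cost depends on the size of the content piece rather than on the size of the file.
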
>
> We think Lucene provides a scalable solution for storing and indexing
> these binary positions, so the idea is that each piece of content in
> the files will be a document. Each document will have at least an ID field to
> identify the content and a binary position field containing the start and
> stop positions of the content. Having done some performance testing, it
> seems to us that Lucene is well capable of doing this.
>
> At the moment, we are planning to create one Lucene index per file, so if
> new files are added to the system, we can simply generate a new
> index. The problem is with searching: this approach means that we need
> to create a new IndexSearcher every time a file is accessed through our
> web service. We know that it is rather expensive to open a new
> IndexSearcher, and are thinking of using some kind of pooling mechanism.
> Our questions are:
>
> 1. Is this one-index-per-file approach a viable solution? What do you
> think about pooling IndexSearchers?
>
> 2. If we have many IndexSearchers open at the same time, would the
> memory usage go through the roof? I couldn't find any documentation on how
> Lucene allocates memory.
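
One way to keep both the open cost and the memory use in check is to cap how many searchers stay open and evict the least recently used one. A minimal sketch of that pooling idea, using only the JDK: `SearcherPool`, `Opener` and `maxOpen` are hypothetical names, and `V` stands in for whatever wraps an open IndexSearcher for one file's index. It is an assumption-laden illustration, not Lucene API, and it deliberately omits reference counting, so an evicted entry could be closed while another thread is still using it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded pool; V stands in for a wrapper around an open IndexSearcher.
final class SearcherPool<V extends Closeable> {

    // Opens the resource for a key, e.g. the searcher for one file's index.
    interface Opener<V> {
        V open(String key) throws IOException;
    }

    private final Opener<V> opener;
    private final Map<String, V> cache;

    SearcherPool(final int maxOpen, Opener<V> opener) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.opener = opener;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                closeEvicted(eldest.getValue()); // release the evicted entry
                return true;
            }
        };
    }

    // Returns the cached entry for the key, opening it on first use.
    synchronized V get(String key) throws IOException {
        V value = cache.get(key);
        if (value == null) {
            value = opener.open(key);
            cache.put(key, value); // may evict and close the least recently used entry
        }
        return value;
    }

    synchronized int openCount() {
        return cache.size();
    }

    private static void closeEvicted(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `maxOpen` set to, say, a few hundred, the number of simultaneously open searchers stays bounded no matter how many per-file indexes exist; the right value would have to come from measuring memory use per open index.
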
>
> Thank you very much for your help.
>
> Many thanks,
> Rui Wang


-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*
