Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <161FD7D0-E01F-42F2-A02A-A4E4B182CA0D@ebi.ac.uk>
References: <161FD7D0-E01F-42F2-A02A-A4E4B182CA0D@ebi.ac.uk>
Date: Tue, 6 Dec 2011 11:41:19 +0530
Message-ID: 
 <CAFVhWXieRFqstbGPi+wM1zhZLL0SMr0uz8+7CUhsHPYdUWQpQA@mail.gmail.com>
Subject: Re: Use multiple lucene indices
From: KARTHIK SHIVAKUMAR <nskarthik.k@gmail.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=20cf30563915c121e604b3664dff

--20cf30563915c121e604b3664dff
Content-Type: text/plain; charset=ISO-8859-1

hi

>> would the memory usage go through the roof?

Yup.

In my past experience, that got me into a pickle...
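
One way to keep the memory bounded is to cap how many searchers are open at
once and close the least recently used one when the cap is hit. Below is a
minimal sketch of that idea in plain Java, not anything Lucene ships. The
names SearcherPool, Opener and maxOpen are made up for illustration, and V
stands for whatever Closeable wrapper you put around an open IndexSearcher
and its reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Bounded least-recently-used pool of open resources, one per index path.
 * V is a hypothetical Closeable wrapper around an open IndexSearcher.
 */
class SearcherPool<V extends Closeable> {

    /** Hypothetical factory: opens the resource for one index path. */
    interface Opener<V> {
        V open(String indexPath) throws IOException;
    }

    private final Opener<V> opener;
    private final LinkedHashMap<String, V> open;

    SearcherPool(final int maxOpen, Opener<V> opener) {
        this.opener = opener;
        // accessOrder = true: iteration runs from least to most recently used.
        this.open = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // free the evicted entry's resources
                } catch (IOException e) {
                    // Sketch only: real code would log this.
                }
                return true;
            }
        };
    }

    /** Returns the pooled resource for indexPath, opening it on first use. */
    synchronized V get(String indexPath) throws IOException {
        V v = open.get(indexPath);
        if (v == null) {
            v = opener.open(indexPath);
            open.put(indexPath, v);
        }
        return v;
    }

    /** Number of resources currently held open. */
    synchronized int size() {
        return open.size();
    }
}
```

Note that this sketch closes an evicted entry even if another request is
still searching with it. Real code would need some form of reference
counting before closing (IndexReader has incRef() and decRef() for that).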


with regards
karthik

On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang <rwang@ebi.ac.uk> wrote:

> Hi All,
>
> We are planning to use Lucene in our project, but are not entirely sure
> about some of the design decisions we have made. The details are below;
> any comments/suggestions are more than welcome.
>
> The requirements of the project are below:
>
> 1. We have tens of thousands of files, ranging in size from 500M to a
> few terabytes, and the majority of the contents of these files will not be
> accessed frequently.
>
> 2. We are planning to keep less frequently accessed contents outside of
> our database and store them on the file system.
>
> 3. We also have code to get the binary position of these contents in the
> files. Using these binary positions, we can quickly retrieve the contents
> and convert them into our domain objects.
>
> We think Lucene provides a scalable solution for storing and indexing
> these binary positions, so the idea is that each piece of content in
> the files will be a document. Each document will have at least an ID field
> to identify the content and a binary position field containing the start
> and stop positions of the content. Having done some performance testing,
> it seems to us that Lucene is well capable of doing this.
>
> At the moment, we are planning to create one Lucene index per file, so if
> we have new files to be added to the system, we can simply generate a new
> index. The problem is to do with searching: this approach means that we
> need to create a new IndexSearcher every time a file is accessed through
> our web service. We know that it is rather expensive to open a new
> IndexSearcher, and are thinking of using some kind of pooling mechanism.
> Our questions are:
>
> 1. Is this one-index-per-file approach a viable solution? What do you
> think about pooling IndexSearchers?
>
> 2. If we have many IndexSearchers open at the same time, would the
> memory usage go through the roof? I couldn't find any documentation on
> how Lucene allocates memory.
>
> Thank you very much for your help.
>
> Many thanks,
> Rui Wang


-- 
*N.S.KARTHIK
R.M.S.COLONY
BEHIND BANK OF INDIA
R.M.V 2ND STAGE
BANGALORE
560094*

--20cf30563915c121e604b3664dff--