From: KARTHIK SHIVAKUMAR
To: java-user@lucene.apache.org
Subject: Re: Use multiple lucene indices
Date: Tue, 6 Dec 2011 11:41:19 +0530
In-Reply-To: <161FD7D0-E01F-42F2-A02A-A4E4B182CA0D@ebi.ac.uk>

Hi,

>> would the memory usage go through the roof?

Yup. In my past experience, this got me into a pickle.

With regards,
Karthik

On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang wrote:

> Hi All,
>
> We are planning to use Lucene in our project, but are not entirely sure
> about some of the design decisions we have made. Below are the details;
> any comments/suggestions are more than welcome.
>
> The requirements of the project are below:
>
> 1. We have tens of thousands of files, ranging in size from 500M to a
> few terabytes, and the majority of the contents in these files will not
> be accessed frequently.
>
> 2. We are planning to keep less accessed contents outside of our
> database and store them on the file system.
>
> 3. We also have code to get the binary position of these contents in the
> files. Using these binary positions, we can quickly retrieve the contents
> and convert them into our domain objects.
>
> We think Lucene provides a scalable solution for storing and indexing
> these binary positions, so the idea is that each piece of content in
> the files will be a document, and each document will have at least an ID
> field to identify the content and a binary position field containing the
> start and stop positions of the content. Having done some performance
> testing, it seems to us that Lucene is well capable of doing this.
>
> At the moment, we are planning to create one Lucene index per file, so if
> we have new files to be added to the system, we can simply generate a new
> index. The problem is to do with searching: this approach means that we
> need to create a new IndexSearcher every time a file is accessed through
> our web service. We know that it is rather expensive to open a new
> IndexSearcher, and are thinking of using some kind of pooling mechanism.
> Our questions are:
>
> 1. Is this one-index-per-file approach a viable solution? What do you
> think about pooling IndexSearchers?
>
> 2. If we have many IndexSearchers open at the same time, would the
> memory usage go through the roof? I couldn't find any documentation on
> how Lucene allocates memory.
>
> Thank you very much for your help.
>
> Many thanks,
> Rui Wang
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
N.S.KARTHIK R.M.S.COLONY BEHIND BANK OF INDIA R.M.V 2ND STAGE BANGALORE 560094
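
On the pooling question in the quoted message, one possible shape for such a pool is a small least-recently-used cache that caps how many searchers stay open at once. The following is only a sketch under stated assumptions, not code from either poster: the class name SearcherPool, the maxOpen limit, and the opener callback are all hypothetical, and the type parameter V stands in for whatever object wraps an open IndexSearcher and its underlying reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical LRU pool of open searchers, keyed by data-file name.
 * V is a placeholder for a wrapper around an open searcher and its reader.
 */
final class SearcherPool<V extends Closeable> {
    private final Function<String, V> opener;
    private final LinkedHashMap<String, V> open;

    SearcherPool(final int maxOpen, Function<String, V> opener) {
        this.opener = opener;
        // accessOrder = true keeps the least recently used entry first.
        this.open = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                // Over the cap: close the least recently used searcher.
                closeQuietly(eldest.getValue());
                return true;
            }
        };
    }

    /** Returns the open searcher for a file, opening it on first use. */
    synchronized V get(String fileName) {
        V searcher = open.get(fileName); // a hit refreshes the LRU position
        if (searcher == null) {
            searcher = opener.apply(fileName);
            open.put(fileName, searcher); // may evict and close the eldest
        }
        return searcher;
    }

    /** Number of searchers currently held open by the pool. */
    synchronized int openCount() {
        return open.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            // Sketch only: a real pool would log this.
        }
    }
}
```

With a cap like this, memory use is bounded by maxOpen rather than by the number of files. The sketch closes an evicted searcher immediately, so a real pool would also need some form of reference counting to avoid closing a searcher that an in-flight request is still using.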