From: Ian Lea
Date: Tue, 4 May 2010 14:31:49 +0100
Subject: Re: Using IndexReader in the web environment
To: java-user@lucene.apache.org

For best performance you should aim to keep a shared index searcher,
or the underlying index reader, open as long as possible. You may of
course need to reopen it if/when the index changes. As to scope, you
can store it wherever it makes sense for your application.

--
Ian.

On Tue, May 4, 2010 at 10:13 AM, Vijay Veeraraghavan wrote:
> Hi,
> Thanks for the reply. So I will have a dedicated servlet to search the
> index, but does it mean that the IndexSearcher does not close the
> index and keeps it open? Is it not possible to keep it in the application
> scope?
>
> Vijay
>
> On 5/3/10, Vijay Veeraraghavan wrote:
>> Hi all,
>>
>> In a clustered environment I search the index from the web
>> application. In the web application I am creating an IndexReader on each
>> request. Is it expensive to do it like this? I read somewhere on the web
>> that one should try to use the same reader as much as possible. Can I keep the
>> initially created IndexReader in the session/application scopes and
>> use the same for each request? Any other idea?
>>
>> Vijay
>>
>> On 5/3/10, Vijay Veeraraghavan wrote:
>>> Dear all,
>>>
>>> As replied below, does searching again for the document in the index
>>> and, if found, skipping the indexing, else indexing it, not amount to
>>> indexing all pdf documents once again? Is this not overhead? I am
>>> not going to index the details of the pdf (so if an indexed pdf was
>>> recreated I need not reindex it) but just the paths of the documents.
>>>
>>> Vijay
>>>
>>>>> Hey there,
>>>>>
>>>>> you might have to implement some kind of unique identifier using an
>>>>> indexed lucene field. When you are indexing you should fire a query
>>>>> with the uuid of your document (maybe the path to your pdf document)
>>>>> and check if the document is in the index already. You could also do
>>>>> a boolean query combining UUID, timestamp and / or a hash value to
>>>>> see if the document has been changed. If so you can simply update the
>>>>> document by its UUID (something like
>>>>> indexwriter.updateDocument(new Term("uuid", value), document);)
>>>>>
>>>>> Unfortunately you have to implement this yourself but it should not be
>>>>> that much of a deal.
>>>>>
>>>>> simon
>>>>>
>>>>> On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan <
>>>>> vijay.raghavan08@gmail.com> wrote:
>>>>>
>>>>>> Dear all,
>>>>>> I am using lucene 3.0 to index the pdf reports that I generate
>>>>>> dynamically. I index the pdf file name (without extension), file path
>>>>>> and its absolute path as fields. I search with the file name without
>>>>>> extension; it retrieves a list, as usually 2 or more files are present
>>>>>> with the same name in different sub directories. When I create the index
>>>>>> for the first time it updates, assuming 100 pdf files in different
>>>>>> directories, the files' meta info. If I index again, after my
>>>>>> report generator scheduler has produced 500 more pdf files,
>>>>>> totaling 600 files in different directories, I wish to add only
>>>>>> the new files to the index. But presently it's doing the whole thing
>>>>>> again (600 files). How to implement this functionality? Think of the
>>>>>> thousands of pdf files created on each run.
>>>>>>
>>>>>> P.S: I cannot keep the meta-info of generated pdf files in the java
>>>>>> memory, as it exceeds thousands in a single run, and update the index
>>>>>> looping this list.
>>>>>>
>>>>>> new IndexWriter(FSDirectory.open(this.indexDir), new StandardAnalyzer(
>>>>>>         Version.LUCENE_CURRENT), true,
>>>>>>         IndexWriter.MaxFieldLength.LIMITED);
>>>>>>
>>>>>> Is the boolean parameter for this purpose? Please guide me.
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards
>>>>>> Vijay Veeraraghavan
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
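
The advice at the top of this message (keep one shared searcher or
reader open, and reopen it only when the index changes) can be sketched
roughly as below. This is a minimal illustration using only the Java
standard library, not Lucene code: SharedHolder, maybeRefresh, refresher
and closer are hypothetical names introduced here. In a Lucene 3.0
application T would be the IndexReader, and the refresher would wrap
IndexReader.reopen(), which returns the same instance when the index has
not changed.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/**
 * Hypothetical application-scoped holder: one shared instance serves every
 * request and is swapped only when the refresher reports a change.
 */
final class SharedHolder<T> {
    private final UnaryOperator<T> refresher; // must return the same instance if nothing changed
    private final Consumer<T> closer;         // releases an instance that was replaced
    private volatile T current;

    SharedHolder(T initial, UnaryOperator<T> refresher, Consumer<T> closer) {
        this.current = initial;
        this.refresher = refresher;
        this.closer = closer;
    }

    /** Called on every request: returns the shared instance, opens nothing. */
    T get() {
        return current;
    }

    /**
     * Called after the index is updated (or on a timer). Swaps in a fresh
     * instance only if the refresher returned a different object.
     * Assumption: no request is still using the old instance when it is
     * closed; real code needs reference counting or a delayed close.
     */
    synchronized boolean maybeRefresh() {
        T old = current;
        T fresh = refresher.apply(old);
        if (fresh == old) {
            return false; // unchanged: keep sharing the same instance
        }
        current = fresh;
        closer.accept(old);
        return true;
    }
}
```

In a servlet application such a holder could be created once at startup
and kept in application scope, with each request calling get() instead
of opening its own reader. With Lucene the refresher and closer would
also have to handle IOException, which this sketch leaves out.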