From: Ian Lea
Date: Wed, 5 May 2010 09:50:08 +0100
Subject: Re: Using IndexReader in the web environment
To: java-user@lucene.apache.org

You could tell the searching part of your app that the index has changed,
via some notification or messaging call. Or call IndexReader.isCurrent()
from time to time, or even on every search, and reopen() if necessary.
See the javadocs and don't forget to close the old reader when you do
call reopen().


--
Ian.


On Wed, May 5, 2010 at 5:17 AM, Vijay Veeraraghavan wrote:
> hey Ian,
> thanks for the reply. I find it very useful. My report generating
> scheduler will run periodically, once done it will invoke the indexer
> and exit. In this case I do not know if the index has changed or not.
> How do I keep track of the changes in the index? As the two entities,
> scheduler/indexer and the web application, are totally different.
>
> Vijay
>
> On 5/4/10, Ian Lea wrote:
>> For best performance you should aim to keep a shared index searcher,
>> or the underlying index reader, open as long as possible. You may of
>> course need to reopen it if/when the index changes. As to scope, you
>> can store it wherever it makes sense for your application.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, May 4, 2010 at 10:13 AM, Vijay Veeraraghavan wrote:
>>> Hi,
>>> Thanks for the reply.
>>> So I will have a dedicated servlet to search the
>>> index, but does it mean that the indexsearcher does not close the
>>> index, keep it open? Is it not possible to keep it in the application
>>> scope?
>>>
>>> Vijay
>>>
>>> On 5/3/10, Vijay Veeraraghavan wrote:
>>>> Hi all,
>>>>
>>>> In a clustered environment I search the index from the web
>>>> application. In the web application I am creating IndexReader on each
>>>> request. Is it expensive to do like this? I read somewhere on the web
>>>> that one should try using the same reader as much as possible. Can I
>>>> keep the initially created IndexReader in the session/application
>>>> scopes and use the same for each request? Any other idea?
>>>>
>>>> Vijay
>>>>
>>>> On 5/3/10, Vijay Veeraraghavan wrote:
>>>>> dear all,
>>>>>
>>>>> as replied below, does searching again for the document in the index
>>>>> and if found skip the indexing else index it, is this not similar to
>>>>> indexing all pdf documents once again, is not this overhead? As I am
>>>>> not going to index the details of the pdf (so if an indexed pdf was
>>>>> recreated I need not reindex it) but just the paths of the documents.
>>>>>
>>>>> Vijay
>>>>>
>>>>>>> Hey there,
>>>>>>>
>>>>>>> you might have to implement some kind of unique identifier using an
>>>>>>> indexed lucene field. When you are indexing you should fire a query
>>>>>>> with the uuid of your document (maybe the path to your pdf document)
>>>>>>> and check if the document is in the index already. You could also do
>>>>>>> a boolean query combining UUID, timestamp and / or a hash value to
>>>>>>> see if the document has been changed. If so you can simply update the
>>>>>>> document by its UUID (something like
>>>>>>> indexwriter.updateDocument(new Term("uuid", value),document);)
>>>>>>>
>>>>>>> Unfortunately you have to implement this yourself but it should not
>>>>>>> be that much of a deal.
>>>>>>>
>>>>>>> simon
>>>>>>>
>>>>>>> On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan <vijay.raghavan08@gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear all,
>>>>>>>> I am using lucene 3.0 to index the pdf reports that I generate
>>>>>>>> dynamically. I index the pdf file name (without extension), file path
>>>>>>>> and its absolute path as fields. I search with the file name without
>>>>>>>> extension; it retrieves a list, as usually 2 or more files are
>>>>>>>> present in the same name in different sub directories. As I create
>>>>>>>> the index for the first time it updates, assuming 100 pdf files in
>>>>>>>> different directories, the files meta info. If again I do indexing,
>>>>>>>> while my report generator scheduler has produced 500 more pdf files
>>>>>>>> totaling to 600 files in different directories, I wish to index only
>>>>>>>> the new files to the index. But presently it's doing the whole thing
>>>>>>>> again (600 files). How to implement this functionality? Think of the
>>>>>>>> thousands of pdf files created on each run.
>>>>>>>>
>>>>>>>> P.S: I cannot keep the meta-info of generated pdf files in the java
>>>>>>>> memory, as it exceeds thousands in a single run, and update the index
>>>>>>>> looping this list.
>>>>>>>>
>>>>>>>> new IndexWriter(FSDirectory.open(this.indexDir),
>>>>>>>>     new StandardAnalyzer(Version.LUCENE_CURRENT), true,
>>>>>>>>     IndexWriter.MaxFieldLength.LIMITED);
>>>>>>>>
>>>>>>>> Is the boolean parameter for this purpose? Please guide me.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks & Regards
>>>>>>>> Vijay Veeraraghavan
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
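
The check-and-reopen approach from the top of this message can be sketched
roughly as follows. This is a minimal sketch under stated assumptions, not
Lucene code: RefreshableReader is a hypothetical stand-in for the
IndexReader.isCurrent() and reopen() calls mentioned above, and
SharedReaderHolder and FakeReader are made-up names used only for
illustration.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for the two IndexReader calls discussed in this thread. */
interface RefreshableReader extends Closeable {
    /** Returns false once the index has changed since this reader was opened. */
    boolean isCurrent() throws IOException;

    /** Returns this same reader if nothing changed, otherwise a new reader. */
    RefreshableReader reopen() throws IOException;
}

/** Holds one shared reader, e.g. stored in the web application's scope. */
final class SharedReaderHolder {
    private RefreshableReader reader;

    SharedReaderHolder(RefreshableReader initial) {
        this.reader = initial;
    }

    /** Call before a search (or from a timer): swaps in a fresh reader if needed. */
    synchronized RefreshableReader current() throws IOException {
        if (!reader.isCurrent()) {
            RefreshableReader fresh = reader.reopen();
            if (fresh != reader) {
                reader.close(); // close the old reader after a successful reopen
                reader = fresh;
            }
        }
        return reader;
    }
}

/** In-memory fake used only to exercise the holder; not part of Lucene. */
final class FakeReader implements RefreshableReader {
    static int indexVersion = 0;          // bumped by the pretend indexer
    final int openedAt = indexVersion;    // version seen when this reader was opened
    boolean closed = false;

    public boolean isCurrent() {
        return openedAt == indexVersion;
    }

    public RefreshableReader reopen() {
        return isCurrent() ? this : new FakeReader();
    }

    public void close() {
        closed = true;
    }
}
```

In a web application, one SharedReaderHolder could live in application scope
and each search request would call current(). Note that this sketch closes
the old reader immediately, which is only safe when no other request is still
searching with it; a real servlet would have to wait for in-flight searches
to finish, for example by reference counting.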