From: Ian Lea <ian.lea@gmail.com>
Date: Tue, 4 May 2010 14:31:49 +0100
Message-ID: <m2m8c4e68611005040631qbd7d0693v2787f50bc915d0c3@mail.gmail.com>
Subject: Re: Using IndexReader in the web environment
To: java-user@lucene.apache.org
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

For best performance you should aim to keep a shared index searcher,
or the underlying index reader, open as long as possible.  You may of
course need to reopen it if/when the index changes.  As to scope, you
can store it wherever it makes sense for your application.
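
To illustrate the idea, here is a minimal sketch in plain Java. It is not
Lucene API: SharedSearcherHolder, opener and indexVersion are hypothetical
names, and the type T stands in for whatever you share (an IndexSearcher or
the underlying IndexReader). It assumes you have some cheap way to tell that
the index has changed, and it leaves out closing the old instance, which
needs care while other requests may still be using it.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical application-scoped holder: opens the resource once, hands the
 * same instance to every caller, and opens a fresh one only after the
 * reported index version changes. T stands in for a searcher or reader.
 */
final class SharedSearcherHolder<T> {
    private final Supplier<T> opener;         // assumption: opens a new searcher/reader
    private final LongSupplier indexVersion;  // assumption: value changes when the index changes
    private T current;                        // shared instance, null until first use
    private long openedAtVersion;             // version seen when 'current' was opened

    SharedSearcherHolder(Supplier<T> opener, LongSupplier indexVersion) {
        this.opener = opener;
        this.indexVersion = indexVersion;
    }

    /** Returns the shared instance, reopening only if the index has changed. */
    synchronized T get() {
        long version = indexVersion.getAsLong();
        if (current == null || version != openedAtVersion) {
            // The previous instance is deliberately not closed here: in-flight
            // requests may still hold it. Real code has to handle that.
            current = opener.get();
            openedAtVersion = version;
        }
        return current;
    }
}
```

A web application could create one such holder at startup, keep it in the
application scope, and call get() from each request instead of opening a
new reader every time.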


--
Ian.


On Tue, May 4, 2010 at 10:13 AM, Vijay Veeraraghavan
<vijay.raghavan08@gmail.com> wrote:
> Hi,
> Thanks for the reply. So I will have a dedicated servlet to search the
> index, but does that mean the IndexSearcher does not close the
> index and keeps it open? Is it not possible to keep it in the application
> scope?
>
> Vijay
>
> On 5/3/10, Vijay Veeraraghavan <vijay.raghavan08@gmail.com> wrote:
>> Hi all,
>>
>> In a clustered environment I search the index from the web
>> application. In the web application I am creating an IndexReader on each
>> request. Is it expensive to do it like this? I read somewhere on the web
>> that one should reuse the same reader as much as possible. Can I keep the
>> initially created IndexReader in the session/application scope and
>> use the same one for each request? Any other ideas?
>>
>> Vijay
>>
>> On 5/3/10, Vijay Veeraraghavan <vijay.raghavan08@gmail.com> wrote:
>>> Dear all,
>>>
>>> As replied below: if I search the index for each document and skip it
>>> when found, or index it otherwise, is that not similar to indexing all
>>> the pdf documents once again? Is this not overhead? I am not going to
>>> index the contents of the pdfs, just the paths of the documents (so if
>>> an indexed pdf is recreated I need not reindex it).
>>>
>>> Vijay
>>>
>>>>> Hey there,
>>>>>
>>>>> you might have to implement some kind of unique identifier using an
>>>>> indexed lucene field. When you are indexing you should fire a query
>>>>> with the uuid of your document (maybe the path to your pdf document)
>>>>> and check if the document is in the index already. You could also do a
>>>>> boolean query combining UUID, timestamp and / or a hash value to see
>>>>> if the document has been changed. If so you can simply update the
>>>>> document by its UUID (something like
>>>>> indexwriter.updateDocument(new Term("uuid", value), document);)
>>>>>
>>>>> Unfortunately you have to implement this yourself but it should not
>>>>> be that much of a deal.
>>>>>
>>>>> simon
>>>>>
>>>>> On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan <
>>>>> vijay.raghavan08@gmail.com> wrote:
>>>>>
>>>>>> Dear all,
>>>>>> I am using lucene 3.0 to index the pdf reports that I generate
>>>>>> dynamically. I index the pdf file name (without extension), file path
>>>>>> and its absolute path as fields. I search with the file name without
>>>>>> extension; it retrieves a list, as usually 2 or more files are present
>>>>>> with the same name in different sub directories. When I create the
>>>>>> index for the first time it stores, assuming 100 pdf files in different
>>>>>> directories, the files' meta info. If I index again, after my
>>>>>> report generator scheduler has produced 500 more pdf files,
>>>>>> totaling 600 files in different directories, I wish to add only
>>>>>> the new files to the index. But presently it's doing the whole thing
>>>>>> again (600 files). How to implement this functionality? Think of the
>>>>>> thousands of pdf files created on each run.
>>>>>>
>>>>>> P.S: I cannot keep the meta-info of the generated pdf files in java
>>>>>> memory, as it exceeds thousands in a single run, and update the index
>>>>>> by looping over this list.
>>>>>>
>>>>>> new IndexWriter(FSDirectory.open(this.indexDir),
>>>>>>         new StandardAnalyzer(Version.LUCENE_CURRENT), true,
>>>>>>         IndexWriter.MaxFieldLength.LIMITED);
>>>>>>
>>>>>> Is the boolean parameter for this purpose? Please guide me.
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Vijay Veeraraghavan
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards
>>>>>> Vijay Veeraraghavan
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
