lucene-java-user mailing list archives

From Vijay Veeraraghavan <vijay.raghava...@gmail.com>
Subject Re: Using IndexReader in the web environment
Date Wed, 05 May 2010 04:17:48 GMT
Hey Ian,
Thanks for the reply; I found it very useful. My report-generating
scheduler runs periodically; once it is done, it invokes the indexer
and exits. The web application therefore does not know whether the
index has changed. How do I keep track of changes in the index, given
that the two entities, the scheduler/indexer and the web application,
are completely separate?

Vijay
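
One way to handle this without any signal from the scheduler is to let
the web application check the index itself before each search. As far
as I know, Lucene 3.0 offers IndexReader.isCurrent() and
IndexReader.reopen() for this. The sketch below shows only the pattern,
in plain Java with no Lucene dependency: RefreshingHolder, the version
supplier, and the opener are hypothetical names standing in for "read
the current index version" and "open a new reader".

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: keeps one shared resource (for example a reader or
 * searcher) and reopens it only when a version stamp has changed.
 */
final class RefreshingHolder<T> {
    private final LongSupplier currentVersion; // stand-in: reads the index version
    private final Supplier<T> opener;          // stand-in: opens a fresh reader
    private long openedVersion;
    private T resource;

    RefreshingHolder(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        // Read the version before opening, so a change that happens in
        // between causes at worst one extra reopen later, never a stale reader.
        this.openedVersion = currentVersion.getAsLong();
        this.resource = opener.get();
    }

    /** Returns the shared resource, reopening it first if the version moved on. */
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (version != openedVersion) {
            resource = opener.get(); // closing the old resource is omitted here
            openedVersion = version;
        }
        return resource;
    }
}
```

In a servlet container, one such holder could live in application
scope. Closing the superseded reader once in-flight searches have
finished is left out of the sketch.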

On 5/4/10, Ian Lea <ian.lea@gmail.com> wrote:
> For best performance you should aim to keep a shared index searcher,
> or the underlying index reader, open as long as possible.  You may of
> course need to reopen it if/when the index changes.  As to scope, you
> can store it wherever it makes sense for your application.
>
>
> --
> Ian.
>
>
> On Tue, May 4, 2010 at 10:13 AM, Vijay Veeraraghavan
> <vijay.raghavan08@gmail.com> wrote:
>> Hi,
>> Thanks for the reply. So I will have a dedicated servlet to search the
>> index, but does that mean the IndexSearcher should keep the index open
>> rather than closing it? Is it not possible to keep it in the
>> application scope?
>>
>> Vijay
>>
>> On 5/3/10, Vijay Veeraraghavan <vijay.raghavan08@gmail.com> wrote:
>>> Hi all,
>>>
>>> In a clustered environment I search the index from the web
>>> application. In the web application I am creating an IndexReader on
>>> each request. Is it expensive to do it this way? I read somewhere on
>>> the web that one should reuse the same reader as much as possible.
>>> Can I keep the initially created IndexReader in the session or
>>> application scope and use it for each request? Any other ideas?
>>>
>>> Vijay
>>>
>>> On 5/3/10, Vijay Veeraraghavan <vijay.raghavan08@gmail.com> wrote:
>>>> Dear all,
>>>>
>>>> Regarding the reply below: isn't searching the index for each
>>>> document, and indexing it only if it is not found, similar to
>>>> indexing all the PDF documents once again? Isn't that overhead? I am
>>>> not going to index the contents of the PDFs, just the paths of the
>>>> documents, so if an already indexed PDF is recreated I need not
>>>> reindex it.
>>>>
>>>> Vijay
>>>>
>>>>>> Hey there,
>>>>>>
>>>>>> you might have to implement some kind of unique identifier using an
>>>>>> indexed Lucene field. When you are indexing, you should fire a query
>>>>>> with the UUID of your document (maybe the path to your PDF document)
>>>>>> and check whether the document is in the index already. You could
>>>>>> also do a boolean query combining UUID, timestamp and/or a hash
>>>>>> value to see if the document has been changed. If so, you can simply
>>>>>> update the document by its UUID (something like
>>>>>> indexwriter.updateDocument(new Term("uuid", value), document);)
>>>>>>
>>>>>> Unfortunately you have to implement this yourself, but it should
>>>>>> not be that much of a deal.
>>>>>>
>>>>>> simon
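
The check described above can be reduced to one small decision per
file: add it if the path is unknown, update it if the stored timestamp
differs, otherwise skip it. A minimal sketch of that decision in plain
Java, with no Lucene dependency; IncrementalPlan and IndexedStamps are
hypothetical names, and the lookup stands in for a query on the path
field of the index:

```java
/** Hypothetical sketch: decide per file whether it needs (re)indexing. */
final class IncrementalPlan {
    enum Action { ADD, UPDATE, SKIP }

    /**
     * Stand-in for an index lookup by path. Returns the last-modified
     * timestamp stored for that path, or null if the path is not indexed.
     */
    interface IndexedStamps {
        Long lastModifiedOf(String path);
    }

    static Action decide(IndexedStamps index, String path, long lastModified) {
        Long stored = index.lastModifiedOf(path);
        if (stored == null) {
            return Action.ADD;     // never seen: index it
        }
        if (stored.longValue() != lastModified) {
            return Action.UPDATE;  // seen, but the file changed since then
        }
        return Action.SKIP;        // seen and unchanged: nothing to do
    }
}
```

With Lucene the lookup would be one term query per file, which should
cost far less than re-adding every document. If only the paths are
indexed and a recreated PDF need not be reindexed, UPDATE can simply be
treated as SKIP.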
>>>>>>
>>>>>> On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan <
>>>>>> vijay.raghavan08@gmail.com> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>> I am using Lucene 3.0 to index the PDF reports that I generate
>>>>>>> dynamically. I index the PDF file name (without extension), the
>>>>>>> file path, and its absolute path as fields. I search by the file
>>>>>>> name without extension; this returns a list, as usually two or
>>>>>>> more files with the same name are present in different
>>>>>>> subdirectories. When I create the index for the first time, it
>>>>>>> stores the meta info of the files (say, 100 PDF files in
>>>>>>> different directories). If I index again after my report
>>>>>>> generator scheduler has produced 500 more PDF files, for a total
>>>>>>> of 600 files in different directories, I want to add only the new
>>>>>>> files to the index. At present it does the whole thing again (600
>>>>>>> files). How can I implement this? Bear in mind that thousands of
>>>>>>> PDF files are created on each run.
>>>>>>>
>>>>>>> P.S.: I cannot keep the meta info of the generated PDF files in
>>>>>>> Java memory and update the index by looping over that list, as a
>>>>>>> single run produces thousands of files.
>>>>>>>
>>>>>>> new IndexWriter(FSDirectory.open(this.indexDir),
>>>>>>>                 new StandardAnalyzer(Version.LUCENE_CURRENT),
>>>>>>>                 true,
>>>>>>>                 IndexWriter.MaxFieldLength.LIMITED);
>>>>>>>
>>>>>>> Is the boolean parameter meant for this purpose? Please guide me.
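
On the boolean: in this Lucene 3.0 constructor it is the create
argument. Passing true creates a new index or overwrites an existing
one, so the existing index is discarded on every run; passing false
appends to the existing index. A minimal sketch of choosing the flag,
assuming (this is an assumption, not Lucene behaviour) that a missing
or empty directory means no index exists yet; CreateFlag and
shouldCreate are hypothetical names:

```java
import java.io.File;

/** Hypothetical helper for picking the create argument of the constructor above. */
final class CreateFlag {
    /**
     * Returns true when indexDir is missing, is not a directory, or is
     * empty, i.e. when create=true would not throw away an existing index.
     */
    static boolean shouldCreate(File indexDir) {
        String[] names = indexDir.list(); // null if missing or not a directory
        return names == null || names.length == 0;
    }
}
```

The result would be passed in place of the literal true. Appending
alone does not avoid duplicates, so it still needs the per-document
check suggested in the reply above.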
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>> Vijay Veeraraghavan
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org


-- 
Thanks & Regards
Vijay Veeraraghavan


