lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Subhrajyoti Moitra" <subhrajyo...@contata.co.in>
Subject Re: differential indexing
Date Tue, 08 Apr 2003 04:08:02 GMT
thanks a lot Otis and Michael..

Otis i am doing what you had suggested below .. it seemes to solve my
problem..
i compare the dates at the application level rather than at the DB level..
since some pre-processing is being done.
now its working great.. thanks.. a lot..
subhro.

----- Original Message -----
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, April 07, 2003 10:16 PM
Subject: Re: differential indexing


> Why not just have 'last_indexed_date' that stores the last time you
> fetched from DB and indexed the PDF docs.
> Then when you want to re-index only those documents that have changed
> you use this last_indexed_date in your SELECT: SELECT * FROM your_table
> WHERE change_date > last_index_date.
> For each docId you get back you then know you have to do a delete & add
> in the Lucene index.
>
> Otis
>
>
>
>
>
> --- Subhrajyoti Moitra <subhrajyotim@contata.co.in> wrote:
> > Hi,
> > I am trying to append one index to another. How should i do it?
> >
> > Let me explain my problem, probably people can suggest some better
> > way..
> >
> > i have indexed a set of pdf documents. These documents are being
> > retrieved from the DB. I have a unique docId associated with each
> > document. This is one of the fields in the index entries. I am using
> > pdfbox to parse the contents of the pdf document and convert it into
> > text for indexing.
> >
> > Scenario-I
> > Now some one adds a new document to the DB. What i am presently doing
> > is that, i retrieve all the documents from the DB,
> >  including the new one, and create a fresh index out of these set of
> > documents. The problem here is time.
> > I have some 10,000 documents in my system, re-indexing every one of
> > them again is taking a hell-of-a long time.
> > What i want is to "APPEND" the new document index-data to the
> > existing index.
> >
> > Scenario-II
> > When an existing document in the DB is changed i want to remove that
> > document from the index (this is easy since i have the unique docId
> > with me) and add the new modified index-data to the existing index,
> > instead of again recreating the entire index.
> >
> > To sum up how do i do differential indexing. (hope i am using the
> > proper terminology)
> >
> > Some one please suggest some solutions to this.
> >
> > Thank you in advance.
> > Subhro.
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Tax Center - File online, calculators, forms, and more
> http://tax.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message