lucene-general mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: how to Index only newly added documents?
Date Wed, 04 Nov 2009 16:19:01 GMT
The common approach is to store a UUID field in each document and call
updateDocument with a delete term holding that document's UUID.
That way only the most recently added document for a given UUID ends up
in the index.
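
To make the delete-then-add semantics concrete, here is a minimal sketch in
plain Java. It is only an in-memory stand-in, not Lucene code: the class
UpsertSketch, its methods, and the choice to derive the ID from the file path
are all assumptions made for illustration. In a real index the update step
would be a call to IndexWriter.updateDocument with a Term on the UUID field.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory model of an index keyed by a unique ID field. */
final class UpsertSketch {
    // ID -> document body; stands in for the real index.
    private final Map<String, String> docsById = new LinkedHashMap<>();

    /** Derives a stable name-based UUID from the path (an assumption: any unique, repeatable key works). */
    static String idFor(String path) {
        return UUID.nameUUIDFromBytes(path.getBytes(StandardCharsets.UTF_8)).toString();
    }

    /** Deletes any entry with the same ID, then adds the new one. Returns true if the ID was not present before. */
    boolean update(String path, String body) {
        String id = idFor(path);
        boolean isNew = !docsById.containsKey(id);
        docsById.remove(id);    // corresponds to the delete term
        docsById.put(id, body); // corresponds to adding the document
        return isNew;
    }

    int size() {
        return docsById.size();
    }

    String get(String path) {
        return docsById.get(idFor(path));
    }

    public static void main(String[] args) {
        UpsertSketch index = new UpsertSketch();
        index.update("docs/a.pdf", "first version");
        index.update("docs/b.pdf", "other file");
        index.update("docs/a.pdf", "second version"); // replaces the earlier entry, no duplicate
        System.out.println(index.size());
        System.out.println(index.get("docs/a.pdf"));
    }
}
```

Because the key is derived from the path, re-running the indexer over the
whole directory never creates duplicates. It still re-processes unchanged
files, so skipping them entirely would need an extra check, for example
looking up the ID before parsing the file.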

simon

On Wed, Nov 4, 2009 at 6:41 AM, tarunsapra <t.sapra97@gmail.com> wrote:
>
> thanks for the reply!..
>
> But I need to filter out the already indexed documents, i.e. if the
> resources directory contains 2 documents that are already indexed, then when
> 2 more documents are added, the indexer should index only the newly added
> documents into the already existing index location.
> Thanks
>
> rodrigofurtado wrote:
>>
>> Look at the class:
>>
>> org.pdfbox.searchengine.lucene.IndexFiles
>>
>> This is an example class for creating and updating an index when you add
>> or delete documents in a directory.
>>
>> Basically, you choose the mode when you run this class:
>>
>> To create the index directory, try this:
>>
>> java -Xms256m -Xmx512m org.pdfbox.searchengine.lucene.IndexFiles -create
>> -index  <your_index_directory> <your_documents_directory>
>>
>>
>> To index only the changes to the directory (new or deleted files), try
>> this (note that the '-create' argument is not present):
>>
>>
>> java -Xms256m -Xmx512m org.pdfbox.searchengine.lucene.IndexFiles -index
>> <your_index_directory> <your_documents_directory>
>>
>>
>> Bye
>>
>>>
>>> Hi People,
>>>
>>> I am stuck with a problem. I have a resources directory containing a lot
>>> of documents, and my Java program picks up documents from this directory.
>>> Is there a way, using the Lucene APIs, to recognize documents that have
>>> already been indexed, filter them out, and index only the newly added
>>> documents?
>>>
>>> Thanks
>>> Tarun
>>> --
>>> View this message in context:
>>> http://old.nabble.com/how-to-Index-only-newly-added-documents--tp26160082p26160082.html
>>> Sent from the Lucene - General mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/how-to-Index-only-newly-added-documents--tp26160082p26191281.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
>
>
