lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dino Korah" <dcko...@gmail.com>
Subject RE: Transaction semantics in Document addition
Date Mon, 19 May 2008 18:32:09 GMT
In your scenario, it might work, but I wonder how you generate hits,
excluding the fullyindexed=false.

-----Original Message-----
From: N Hira [mailto:nhira@cognocys.com] 
Sent: 19 May 2008 18:31
To: java-user@lucene.apache.org
Subject: Re: Transaction semantics in Document addition

How about an attribute (fullyIndexed=true/false) to keep track of whether
the indexing was successful?

We used a similar attribute for a similar problem, but stored it in the
accompanying database instead.

-h




----- Original Message ----
From: Michael McCandless <lucene@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Monday, May 19, 2008 4:02:52 AM
Subject: Re: Transaction semantics in Document addition


Dino Korah wrote:
> Hi All,
>
> I am dealing with a situation where a document could possibly have 
> multiple attachments to it, and they are all added to the index under 
> a document-id (not lucene doc-id). Now if one of the attachments fail 
> to get indexed due to failure of any subsystem like the text 
> extraction module, I need to abort the complete set of documents under 
> the document id. I add individual attachments as a seperate documents 
> in lucene.
>
> Whats the best mechanism to accomplish this.
>
> One possible solution I have in mind is to create a RAMDirectory for 
> each document with attachems to it and add everything to the 
> RAMDirectory. Once we are satisfied with the indexing, merge it into 
> the main index.
>
> Would that work?

That will work but performance may not be great.

You could also open the IndexWriter with autoCommit=false, and then if you
hit a failure in the text extraction, call abort() to cancel all docs you
had added to this IndexWriter?  But that's bad because you lose all the
documents that had succeeded.

Or, on hitting a failure in one of a document's attachments, you could just
call deleteDocument on those attachments already added, to "undo" them?
This is in fact how IndexWriter itself deals with an exception while adding
a document (to avoid having only a part of the document in the index): if it
his an exception after processing some number of tokens, it just marks that
document as deleted.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message