lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Transactional Directories
Date Mon, 14 Feb 2005 22:05:01 GMT
[ Please ignore my previous message.  I somehow hit "Send" before typing 
anything! ]

Oscar Picasso wrote:
> However with a relatively high number of random insertions, the cost of the
> "new IndexWriter / index.close()" performed for each insertion is too high.

Did you measure that?  How much slower was it?  Did you perform any 
profiling?  Perhaps one could improve this by, e.g., disabling document 
index buffering, so that indexes are written directly to the final 
directory in this case, rather than first buffered in a RAMDirectory.

> Unfortunately this is a common case for some kinds of applications, and it is
> where a transactional directory would be the most useful.
> 
> In such a case you would like to do something like that:
> -- case B --
> <pseudo-code>
> new IndexWriter
>  ...
> +begin transaction-1
>  create/update/delete objects in the database
>  index.addDocument (related to the objects)
> + commit
> ...
> +begin transaction-2
>  create/update/delete objects in the database
>  index.addDocument (related to the objects)
> + commit
> ...
> indexWriter.close()
> </pseudo-code>
> 
> The benefit would be to protect individual insertions while avoiding the cost
> of creating a new IndexWriter each time.
> 
> It doesn't work however. Here is my understanding. 
> 
> Suppose that in case B, transaction-1 fails and transaction-2 succeeds.

So you've got multiple threads?  Or are you proceeding in the face of 
exceptions?  Otherwise I would expect that if transaction-1 fails then 
you'd avoid transaction-2, no?

Also, you'd want to add a flush() call after each addDocument(), since 
document additions are buffered.  But a flush() is just what 
IndexWriter.close() does, so then things would not be any faster than 
creating a new IndexWriter for each document.

The bottom line is that there are optimizations to be made when batching 
additions.  Lucene's API is designed to encourage batching, so that 
these optimizations may be used.  If you don't batch, things will be 
somewhat slower.
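
To make the batching point concrete, here is a toy model in Java.  It is 
not Lucene code: ToyWriter, its maxBuffered parameter, and the flush 
counter are all made up for illustration, and a "flush" here only stands 
in for writing buffered documents out to the final directory.  It simply 
counts flushes under the three usage patterns discussed above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, NOT Lucene code: counts how often buffered documents get written out. */
public class BatchingSketch {

    /** Hypothetical stand-in for a writer that buffers added documents in memory. */
    static class ToyWriter {
        private final List<String> buffer = new ArrayList<>();
        private final int maxBuffered;   // assumed buffer limit, for illustration only
        private int flushCount = 0;

        ToyWriter(int maxBuffered) {
            this.maxBuffered = maxBuffered;
        }

        void addDocument(String doc) {
            buffer.add(doc);
            if (buffer.size() >= maxBuffered) {
                flush();                 // buffer is full: write it out
            }
        }

        /** Writes out whatever is buffered; a no-op if nothing is pending. */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            buffer.clear();
            flushCount++;
        }

        /** In this model, closing just flushes any pending documents. */
        void close() {
            flush();
        }

        int flushCount() {
            return flushCount;
        }
    }

    /** One writer, many additions, one close at the end. */
    public static int flushesWhenBatching(int docs, int maxBuffered) {
        ToyWriter writer = new ToyWriter(maxBuffered);
        for (int i = 0; i < docs; i++) {
            writer.addDocument("doc-" + i);
        }
        writer.close();
        return writer.flushCount();
    }

    /** One writer, but an explicit flush after every addition. */
    public static int flushesWhenFlushingEachDocument(int docs, int maxBuffered) {
        ToyWriter writer = new ToyWriter(maxBuffered);
        for (int i = 0; i < docs; i++) {
            writer.addDocument("doc-" + i);
            writer.flush();
        }
        writer.close();
        return writer.flushCount();
    }

    /** A new writer opened and closed for every addition. */
    public static int flushesWithWriterPerDocument(int docs, int maxBuffered) {
        int total = 0;
        for (int i = 0; i < docs; i++) {
            ToyWriter writer = new ToyWriter(maxBuffered);
            writer.addDocument("doc-" + i);
            writer.close();
            total += writer.flushCount();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("batched:           " + flushesWhenBatching(100, 10));
        System.out.println("flush per doc:     " + flushesWhenFlushingEachDocument(100, 10));
        System.out.println("writer per doc:    " + flushesWithWriterPerDocument(100, 10));
    }
}
```

With 100 documents and a buffer of 10, the batched case flushes 10 times, 
while flushing after every document and opening a writer per document 
each flush 100 times.  The model says nothing about real timings; it only 
shows why a per-document flush would not be expected to beat a 
per-document writer.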

Doug
