lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Mellor <pmel...@hemscott.co.uk>
Subject RE: Concurrent searching & re-indexing
Date Thu, 17 Feb 2005 09:34:58 GMT
Otis,

Looking at your reply again, I have a couple of questions -

"IndexSearcher (IndexReader, really) does take a snapshot of the index state
when it is opened, so at that time the index segments listed in segments
should be in a complete state.  It also reads index files when searching, of
course."

1. If IndexReader takes a snapshot of the index state when opened and then
reads the files when searching, what would happen if the files it takes a
snapshot of are deleted before the search is performed (as would happen with
a reindexing in the period between opening an IndexSearcher and using it to
search)?

2. Does a similar potential problem exist when optimising an index, if this
combines all the segments into a single file?

Many thanks

Paul

-----Original Message-----
From: Paul Mellor [mailto:pmellor@hemscott.co.uk]
Sent: 16 February 2005 17:37
To: 'Lucene Users List'
Subject: RE: Concurrent searching & re-indexing


But all write access to the index is synchronized, so that although multiple
threads are creating an IndexWriter for the same directory and using it to
totally recreate that index, only one thread is doing this at once.

I was concerned about the safety of using an IndexSearcher to perform
queries on an index that is in the process of being recreated from scratch,
but I guess that if the IndexSearcher takes a snapshot of the index when it
is created (and in my code this creation is synchronized with the write
operations as well so that the threads wait for the write operations to
finish before instantiating an IndexSearcher, and vice versa) this can't be
a problem.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: 16 February 2005 17:30
To: Lucene Users List
Subject: Re: Concurrent searching & re-indexing


Hi Paul,

If I understand your setup correctly, it looks like you are running
multiple threads that create IndexWriter for the ame directory.  That's
a "no no".

This section (first hit) describes all various concurrency issues with
regards to adds, updates, optimization, and searches:
  http://www.lucenebook.com/search?query=concurrent

IndexSearcher (IndexReader, really) does take a snapshot of the index
state when it is opened, so at that time the index segments listed in
segments should be in a complete state.  It also reads index files when
searching, of course.

Otis


--- Paul Mellor <pmellor@hemscott.co.uk> wrote:

> Hi,
> 
> I've read from various sources on the Internet that it is perfectly
> safe to
> simultaneously search a Lucene index that is being updated from
> another
> Thread, as long as all write access to the index is synchronized. 
> But does
> this apply only to updating the index (i.e. deleting and adding
> documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
> 
> I have a class which encapsulates all access to my index, so that
> writes can
> be synchronized.  This class also exposes a method to obtain an
> IndexSearcher for the index.  I'm running unit tests to test this
> which
> create many threads - each thread does a complete re-indexing and
> then
> obtains an IndexSearcher and does a query.
> 
> I'm finding that with sufficiently high numbers of threads, I'm
> getting the
> occasional failure, with the following exception thrown when
> attempting to
> construct a new IndexWriter (during the reindexing) -
> 
> java.io.IOException: couldn't delete _a.f1
> at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
> at
>
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:135)
> at
>
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
> ...
> 
> The exception occurs quite infrequently (usually for somewhere
> between 1-5%
> of the Threads).
> 
> Does the IndexSearcher take a 'snapshot' of the index at creation? 
> Or does
> it access the filesystem whilst searching?  I am also synchronizing
> creation
> of the IndexSearcher with the write lock, so that the IndexSearcher
> is not
> created whilst the index is being recreated (and vice versa).  But do
> I need
> to ensure that the IndexSearcher cannot search whilst the index is
> being
> recreated as well?
> 
> Note that a similar unit test where the threads update the index
> (rather
> than recreate it from scratch) works fine, as expected.
> 
> This is running on Windows 2000.
> 
> Any help would be much appreciated!
> 
> Paul
> 
> This e-mail and any files transmitted with it are confidential and
> intended
> solely for the use of the individual or entity to whom they are
> addressed.
> If you are not the intended recipient, you should not copy,
> retransmit or
> use the e-mail and/or files transmitted with it  and should not
> disclose
> their contents. In such a case, please notify
> netwebmaster@hemscott.co.uk
> and delete the message from your own system. Any opinions expressed
> in this
> e-mail and/or files transmitted with it that do not relate to the
> official
> business of this company are those solely of the author and should
> not be
> interpreted as being endorsed by this company.
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


_____________________________________________________________________
This e-mail has been scanned for viruses by MCI's Internet Managed Scanning
Services - powered by MessageLabs. For further information visit
http://www.mci.com

This e-mail and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you are not the intended recipient, you should not copy, retransmit or
use the e-mail and/or files transmitted with it  and should not disclose
their contents. In such a case, please notify netwebmaster@hemscott.co.uk
and delete the message from your own system. Any opinions expressed in this
e-mail and/or files transmitted with it that do not relate to the official
business of this company are those solely of the author and should not be
interpreted as being endorsed by this company.


_____________________________________________________________________
This e-mail has been scanned for viruses by MCI's Internet Managed Scanning
Services - powered by MessageLabs. For further information visit
http://www.mci.com

_____________________________________________________________________
This e-mail has been scanned for viruses by MCI's Internet Managed Scanning
Services - powered by MessageLabs. For further information visit
http://www.mci.com

This e-mail and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you are not the intended recipient, you should not copy, retransmit or
use the e-mail and/or files transmitted with it  and should not disclose
their contents. In such a case, please notify netwebmaster@hemscott.co.uk
and delete the message from your own system. Any opinions expressed in this
e-mail and/or files transmitted with it that do not relate to the official
business of this company are those solely of the author and should not be
interpreted as being endorsed by this company.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message