lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Concurency in Lucene
Date Thu, 17 Oct 2002 18:25:33 GMT
Moved to lucene-dev.

Wow, this sounds good, but I'd love to see the code first and see how
this log file business works first.

Also, I don't think we ever got any tests that use multiple threads and
demonstrate the problems with symultaneous read/write/delete, etc.
It would be nice to have something like that, so that we can verify
that this contribution really fixes these problems.

Thanks,
Otis


--- kiril.zack@epiphany.com wrote:
> My company, Epiphany, has decided to integrate our products with
> Lucene.
> I'm leading this effort, and for this I have developed a solution
> around
> Lucene that allows concurrent processes to search, insert, update and
> delete
> documents. 
> This solution solves the following:
> 	- concurrent writing (insert, update, delete) to the Index (see
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12588 and
>
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01795.html
> 	- not-transactional nature of Lucene. Solution puts transaction
> around every insert, update and delete. All writes are guaranteed to
> be in
> the index eventually.
> 	- running out of file handles.
> 	- solution does all of the book-keeping, clients do not worry about
> when to open and close  IndexReader/Writer. Technically one can do
> this
> after every operation, but creating/deleting of .lock file slows
> things
> down.
> 
> 
> In summary, every write (update, delete, insert) is made to log file
> first.
> There is a worker thread that wakes up every so often, examines the
> logs,
> and makes a decision on whether to propagate changes or not (this is
> configurable). If decision is to propagate changes, thread creates
> new log
> files, locks current log files,  makes a copy of the new index,
> merges
> changes from logs to the index, and then hot-swaps the newly created
> index
> and deletes the old logs and index. At any given time, result from
> search
> will not contain deleted documents, but newly created/updated
> documents will
> not be in search result until merge is finished. Worker thread also
> keeps
> state of the logs/index in case of crash. 
> 
> Here is what were the driven factors to create this solution. 
> 	Need for concurrent non-blocking writes (insert/update/delete)
> 	Need for deleted documents not to show up in the query result (Hits)
> once deleted
> 	Lucene does not handle crashes well. The mentality is "if in doubt,
> redo index" which does not work in some cases. Rebuilding of the
> index is
> fast, but in our case a) it takes too many non-Lucene related
> recourses
> (documents can be stored in database), b) high availability of search
> is a
> requirement
> 		- Lucene can leave .lock files.
> 		- Lucene keeps state (documents) in memory
> 
> 
> I wanted to see how much interest is out there for such a solution
> and
> whether Lucene developers feel that this should be part of Lucene. If
> there
> is enough interest I would like to donate this code to Lucene.
> 
> Thanks,
> 
> Kiril Zack
> 
> --
> To unsubscribe, e-mail:  
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 


__________________________________________________
Do you Yahoo!?
Faith Hill - Exclusive Performances, Videos & More
http://faith.yahoo.com

--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>


Mime
View raw message