lucene-dev mailing list archives

From Peter Carlson <carl...@bookandhammer.com>
Subject Re: Concurency in Lucene
Date Fri, 18 Oct 2002 17:21:43 GMT
This looks like a great addition.

Is there a new interface to add document and fields?

How do you handle updates? Does the system require a unique ID per 
document?

Does this handle both RAM and FS Directories? That is, if a RAM 
Directory is used to search the index, does this solution use RAM for 
another copy, or does it create an FS copy, merge the changes into it, 
and then make another RAM Directory?

It would be great if you could contribute it. Since it's such a big 
change, we would probably first add it to the sandbox area and then, 
after reviewing and testing it, potentially move it into the core 
Lucene.

I am also cc'ing the dev board.

--Peter

On Thursday, October 17, 2002, at 06:47 AM, Scott Ganyo wrote:

> This sounds like an excellent start and would certainly be useful in a
> number of scenarios, but it is not quite as generally useful as it could
> be given its asynchronous nature.  Generally expected database behavior
> is that when a change is committed (and not before) it is immediately
> viewable in all new transactions (i.e. new readers).
>
> Would it be difficult to modify your design to act more like a
> traditional database?  If such changes were made, would it still
> efficiently and effectively solve the problems you mentioned below?
>
> Scott
>
>> -----Original Message-----
>> From: kiril.zack@epiphany.com [mailto:kiril.zack@epiphany.com]
>> Sent: Wednesday, October 16, 2002 5:45 PM
>> To: lucene-user@jakarta.apache.org
>> Subject: Concurency in Lucene
>>
>>
>> My company, Epiphany, has decided to integrate our products with
>> Lucene. I'm leading this effort, and for this I have developed a
>> solution around Lucene that allows concurrent processes to search,
>> insert, update and delete documents.
>> This solution addresses the following:
>> 	- concurrent writing (insert, update, delete) to the index (see
>> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12588 and
>> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01795.html).
>> 	- the non-transactional nature of Lucene. The solution puts a
>> transaction around every insert, update and delete. All writes are
>> guaranteed to be in the index eventually.
>> 	- running out of file handles.
>> 	- bookkeeping: the solution does all of it, so clients do not have
>> to worry about when to open and close IndexReader/Writer. Technically
>> one can do this after every operation, but creating and deleting the
>> .lock file slows things down.
>>
>>
>> In summary, every write (update, delete, insert) is made to a log file
>> first. A worker thread wakes up every so often, examines the logs, and
>> decides whether to propagate the changes (this is configurable). If it
>> decides to propagate them, the thread creates new log files, locks the
>> current log files, makes a copy of the index, merges the changes from
>> the logs into that copy, and then hot-swaps in the newly created index
>> and deletes the old logs and index. At any given time, search results
>> will not contain deleted documents, but newly created or updated
>> documents will not appear in search results until the merge is
>> finished. The worker thread also keeps the state of the logs/index in
>> case of a crash.
>>
>> Here are the driving factors behind this solution:
>> 	Need for concurrent non-blocking writes (insert/update/delete)
>> 	Need for deleted documents not to show up in the query result
>> (Hits) once deleted
>> 	Lucene does not handle crashes well. The mentality is "if in doubt,
>> redo the index", which does not work in some cases. Rebuilding the
>> index is fast, but in our case a) it takes too many non-Lucene
>> resources (documents can be stored in a database), and b) high
>> availability of search is a requirement
>> 		- Lucene can leave .lock files.
>> 		- Lucene keeps state (documents) in memory
>>
>>
>> I wanted to see how much interest is out there for such a solution and
>> whether Lucene developers feel that this should be part of Lucene. If
>> there is enough interest I would like to donate this code to Lucene.
>>
>> Thanks,
>>
>> Kiril Zack
>>
>> --
>> To unsubscribe, e-mail:
>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail:
>> <mailto:lucene-user-help@jakarta.apache.org>
>>
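
The log-then-merge scheme described in the quoted message can be illustrated with a minimal sketch. This is not the code offered in the post and it does not use any Lucene API. The class name `LoggedIndex`, its methods, the in-memory log (the post uses log files) and the substring "search" are all assumptions made for illustration. It shows only the visibility rule: deletes are hidden from searches at once, while inserts and updates appear after the next merge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not Lucene API and not the code from the post.
class LoggedIndex {
    // One logged write. A null text marks a delete.
    private static final class Op {
        final String id;
        final String text;
        Op(String id, String text) { this.id = id; this.text = text; }
    }

    private final Object mergeLock = new Object();   // allows one merge at a time
    private List<Op> log = new ArrayList<>();         // pending writes (a file in the post)
    private Map<String, String> live = Collections.emptyMap(); // index served to searches
    private final Set<String> hidden = new HashSet<>(); // ids deleted but not yet merged

    // Insert or update: only appended to the log, so invisible until merge().
    synchronized void put(String id, String text) {
        log.add(new Op(id, text));
    }

    // Delete: logged, and hidden from searches immediately.
    synchronized void delete(String id) {
        log.add(new Op(id, null));
        hidden.add(id);
    }

    // Returns the sorted ids of visible documents whose text contains the term.
    Set<String> search(String term) {
        Map<String, String> snapshot;
        Set<String> hiddenNow;
        synchronized (this) {            // take a consistent view, then scan unlocked
            snapshot = live;
            hiddenNow = new HashSet<>(hidden);
        }
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : snapshot.entrySet()) {
            if (!hiddenNow.contains(e.getKey()) && e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // What a worker thread would run: rotate the log, replay it onto a copy, swap.
    void merge() {
        synchronized (mergeLock) {
            List<Op> batch;
            Map<String, String> base;
            synchronized (this) {        // rotate: later writes go to a fresh log
                batch = log;
                log = new ArrayList<>();
                base = live;
            }
            Map<String, String> next = new HashMap<>(base); // copy of the index
            for (Op op : batch) {
                if (op.text == null) {
                    next.remove(op.id);
                } else {
                    next.put(op.id, op.text);
                }
            }
            synchronized (this) {        // hot-swap the merged copy in
                live = Collections.unmodifiableMap(next);
                // Keep hiding only ids deleted after the log was rotated.
                hidden.clear();
                for (Op op : log) {
                    if (op.text == null) {
                        hidden.add(op.id);
                    }
                }
            }
        }
    }
}
```

In this sketch a periodic worker could call `merge()` from, for example, a `ScheduledExecutorService` using `scheduleWithFixedDelay`. Crash recovery, on-disk logs and the configurable "propagate or not" decision from the post are left out.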



