lucene-dev mailing list archives

From "Simon Willnauer" <simon.willna...@googlemail.com>
Subject Re: GData Server - Lucene storage
Date Fri, 02 Jun 2006 16:17:35 GMT
On 6/2/06, Yonik Seeley <yseeley@gmail.com> wrote:
>
> On 6/2/06, Simon Willnauer <simon.willnauer@googlemail.com> wrote:
> > This is also true. The problem is still the server response: if I queue
> > some updates/inserts or index them into a RamDir, I still have the
> > problem of concurrent indexing. The client should wait for the writing
> > process to finish correctly; otherwise the response should be some
> > Error 500. If the client is not made to wait (held), there is a risk of
> > a lost update.
> > The same problem appears when indexing entries into the search index.
> > There won't be a lot of concurrent inserts and updates, so I can't wait
> > for other inserts to do batch indexing. I could index them into RamDirs
> > and search multiple indexes, but what happens if the server crashes
> > with a certain number of entries indexed into a RamDir?
> >
> > Are there any solutions for that in the Solr project?
>
> But the problem is twofold:
> 1) You can't freely mix adds and deletes in Lucene.
> 2) changes are not immediately visible... you need to close the
> current writer and open a new IndexSearcher, which are relatively
> heavyweight operations.
>
> Solr solved (1) by adding all documents immediately as they come in
> (using the same thread as the client request).  Deletes are replied to
> immediately, but are deferred.  When a "commit" happens, the writer is
> closed, a new reader is opened, and all the deletes are processed.
> Then a new IndexSearcher is opened, making all the adds and deletes
> visible.
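
To make that cycle concrete, here is a minimal sketch of the
add-now/delete-later pattern as I read it (not Solr's actual code),
written against the Lucene 1.9 API; DeferredDeleteIndex and everything
inside it is a made-up name:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

class DeferredDeleteIndex {
    private final Directory dir;
    private IndexWriter writer;
    private IndexSearcher searcher;          // serves queries between commits
    private final List<Term> pendingDeletes = new ArrayList<Term>();

    DeferredDeleteIndex(Directory dir) throws IOException {
        this.dir = dir;
        this.writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        this.searcher = new IndexSearcher(dir);
    }

    // Adds go straight to the writer, on the client request's thread.
    synchronized void add(Document doc) throws IOException {
        writer.addDocument(doc);
    }

    // Deletes are acknowledged immediately but only queued.
    synchronized void delete(String id) {
        pendingDeletes.add(new Term("id", id));
    }

    // "commit": close the writer, run the queued deletes through a
    // reader, then open a fresh searcher so adds and deletes become
    // visible at once.
    synchronized void commit() throws IOException {
        writer.close();
        IndexReader reader = IndexReader.open(dir);
        for (Term t : pendingDeletes) {
            reader.deleteDocuments(t);
        }
        pendingDeletes.clear();
        reader.close();
        searcher.close();
        searcher = new IndexSearcher(dir);   // the heavyweight reopen
        writer = new IndexWriter(dir, new StandardAnalyzer(), false);
    }
}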


The problem here is that there is no action comparable to commit. The entry
comes in and will be added to the storage. The delete will be queued, but
when should the delete operation start? Waiting for the writer to go idle?!
We could do it that way, but if a search request comes in, the old entries
will be found and can be retrieved from the storage. In that case I have to
hold the already added but not yet deleted entries in a storage cache, to
prevent the storage from returning outdated entries that have since been
updated, because the old and new versions share the same ID.
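
Something like the following overlay could serve as that cache; just a
rough sketch, with all names (StorageOverlay, Entry) invented for
illustration:

import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of a stored GData entry.
interface Entry {
    String getId();
}

class StorageOverlay {
    private final Map<String, Entry> updated = new HashMap<String, Entry>();

    // Called when an update is accepted but the old version has not
    // yet been deleted from the Lucene-based storage.
    synchronized void put(Entry e) {
        updated.put(e.getId(), e);
    }

    // Retrieval consults the overlay first, so a search hit on an
    // outdated document never surfaces the stale stored entry.
    synchronized Entry resolve(String id, Entry fromStorage) {
        Entry fresh = updated.get(id);
        return fresh != null ? fresh : fromStorage;
    }

    // Once the deferred deletes have run, the shadow copies can go.
    synchronized void clear() {
        updated.clear();
    }
}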

You use multiple IndexSearcher instances to serve searches, right? So when
all the deletes are done, you have to reopen all the IndexSearchers again,
right?! That would happen quite often due to updates and inserts.
Hmm, it looks more and more like a bad idea to use a Lucene index as
storage; better to go straight to a database.



> Solr doesn't do anything to solve (2).  Its main focus has been on
> providing high-throughput, low-latency queries, not on the
> "freshness" of updates.
>
> Decoupling the indexing from storage might help if new additions don't
> need to be searchable (but do need to be retrievable by id)... you
> could make storage synchronous, but batch the adds/deletes in some
> manner and open a new IndexSearcher less frequently.
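
That could look roughly like this; again only a sketch, with EntryStore,
BatchingIndexer, and BATCH_SIZE made up, reusing the DeferredDeleteIndex
sketch from above:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;

// Hypothetical pluggable storage backend; the write must be durable
// before the method returns.
interface EntryStore {
    void store(Document doc) throws IOException;
}

class BatchingIndexer {
    private static final int BATCH_SIZE = 100;   // tune to taste
    private final EntryStore storage;
    private final DeferredDeleteIndex index;     // from the sketch above
    private final List<Document> pendingAdds = new ArrayList<Document>();

    BatchingIndexer(EntryStore storage, DeferredDeleteIndex index) {
        this.storage = storage;
        this.index = index;
    }

    // The storage write is synchronous, so the entry survives a crash
    // even if it is not searchable yet; only the index work is batched.
    // (Deletes can go straight to index.delete(), which is already
    // deferred.)
    synchronized void addEntry(Document doc) throws IOException {
        storage.store(doc);
        pendingAdds.add(doc);
        if (pendingAdds.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // Runs once per batch: push everything into Lucene, then pay for the
    // heavyweight commit/searcher-reopen a single time.
    synchronized void flush() throws IOException {
        for (Document doc : pendingAdds) {
            index.add(doc);
        }
        pendingAdds.clear();
        index.commit();
    }
}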


The indexing will be decoupled from the storage anyway; otherwise I could
not provide a pluggable storage. But I know I confused you by using the
word "indexing" in previous mails.

> -Yonik
> http://incubator.apache.org/solr Solr, the open-source Lucene search
> server

simon
