lucene-solr-dev mailing list archives

From "Ryan McKinley" <ryan...@gmail.com>
Subject Re: loading many documents by ID
Date Fri, 02 Feb 2007 18:47:42 GMT
> >
> > How about: Iterable<SolrDocument>
>
> Maybe... but that might not be the easiest for request handlers to
> use... they would then need to spin up a different thread and use a
> pull model (provide a new doc on demand) rather than push (call
> addDocument()).
>

With Iterable, you don't need to start a thread to implement a
'streaming' parser.  You can use an anonymous inner class that waits
until next() is called before reading the next row/line/document, etc.
In effect, this lets the RequestHandler set up all the common
configuration and then lets the UpdateHandler ask for one document at
a time.

What I like about this is that the code that loops through each row of
my SQL updater does not need to know *anything* about the
UpdateHandler.  I would rather not call updater.addDoc( cmd ) within
the while( rs.next() )  loop.  This makes it much cleaner and easier
to test.
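Something like this is what I have in mind -- just a rough sketch,
where SolrDocument/addField() stand in for whatever the final API
ends up looking like, and the column names are placeholders:

  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.util.Iterator;
  import java.util.NoSuchElementException;

  Iterable<SolrDocument> docsFromResultSet( final ResultSet rs ) {
    return new Iterable<SolrDocument>() {
      public Iterator<SolrDocument> iterator() {
        return new Iterator<SolrDocument>() {
          private Boolean hasNextRow = null;  // cache the cursor advance

          public boolean hasNext() {
            if( hasNextRow == null ) {
              try { hasNextRow = rs.next(); }  // only touch the db when asked
              catch( SQLException e ) { throw new RuntimeException( e ); }
            }
            return hasNextRow;
          }

          public SolrDocument next() {
            if( !hasNext() ) throw new NoSuchElementException();
            hasNextRow = null;  // force a fresh rs.next() next time around
            SolrDocument doc = new SolrDocument();
            try {
              // placeholder columns -- whatever the SQL updater is configured for
              doc.addField( "id", rs.getString( "id" ) );
              doc.addField( "name", rs.getString( "name" ) );
            }
            catch( SQLException e ) { throw new RuntimeException( e ); }
            return doc;
          }

          public void remove() { throw new UnsupportedOperationException(); }
        };
      }
    };
  }

The UpdateHandler only ever sees an Iterable; it has no idea there is
a ResultSet (or a file, or an XML stream) behind it.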

If writing a 'streaming' Iterable is more trouble than someone wants
to go through, they can easily return a Collection<SolrDocument> or an
array with a single element.
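For the trivial case that can be as simple as (again assuming
SolrDocument ends up roughly as discussed in this thread):

  import java.util.Collections;

  SolrDocument doc = new SolrDocument();
  doc.addField( "id", "42" );   // placeholder field
  return Collections.singletonList( doc );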


> When I'm coding, the design tends to morph a lot.
>

mine too!


> I think we need to figure out what type of update semantics we want
> w.r.t. adding multiple documents, and all the other misc autocommit
> params.
>

Right now, what I am working with is an 'update' command that lets you
pass along a mode for each field.  If no modes are specified (or they
are all OVERWRITE), it behaves exactly as we have now (SQL REPLACE).
If any field uses something other than OVERWRITE, it behaves like an
SQL INSERT ... ON DUPLICATE KEY UPDATE.
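To make that concrete -- purely a hypothetical sketch, none of these
names are settled:

  import java.util.HashMap;
  import java.util.Map;

  // per-field modes supplied with the update; "APPEND" is just a
  // stand-in for "anything other than OVERWRITE"
  Map<String,String> fieldModes = new HashMap<String,String>();
  fieldModes.put( "popularity", "APPEND" );

  boolean merge = false;
  for( String mode : fieldModes.values() ) {
    if( !"OVERWRITE".equals( mode ) ) { merge = true; break; }
  }
  // merge == false: behaves exactly like today's add (SQL REPLACE)
  // merge == true:  fetch the existing doc and merge it field by field,
  //                 like INSERT ... ON DUPLICATE KEY UPDATE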
