lucene-solr-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: loading many documents by ID
Date Fri, 02 Feb 2007 18:22:21 GMT
On 2/1/07, Ryan McKinley <ryantxu@gmail.com> wrote:
> >
> > Not sure... depends on how update handlers will use it...
>
> by update handler, you mean UpdateRequestHandler(s)? or UpdateHandler?

Both.

> > One thing we might not want to get rid of though is streaming
> > (constructing and adding a document, then discarding it).  People are
> > starting to add a lot of documents in a single XML request, and this
> > will be much larger for CSV/SQL.
> >
>
> So you are uncomfortable with the Collection because you would have to
> load all the documents before indexing them.  If this was many, it
> could be a problem...
>
> If UpdateHandler is going to take care of stuff like autocommit and
> modifying documents, it seems best to have that apply to all the
> documents you are going to modify as a unit.  For example, say i have
> a SQL updater that will modify 100,000 documents incrementing field
> 'count_*' and replacing 'fl_*'.  If the DocumentCommand only applies
> to a single document, it would have to match each field as it went
> along rather than once when it starts.
>
> How about: Iterable<SolrDocument>

Maybe... but that might not be the easiest for request handlers to
use... they would then need to spin up a different thread and use a
pull model (provide a new doc on demand) rather than push (call
addDocument()).
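To make the push/pull distinction concrete, here is a minimal Java sketch. The names (SolrDocument's shape, PushUpdateHandler, PullUpdateSource) are hypothetical stand-ins for the interfaces being discussed, not actual Solr APIs: in the push model the request handler calls addDocument() for each document as it parses; in the pull model it hands over an Iterable and the indexer drains documents on demand, possibly from another thread.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the document class under discussion.
class SolrDocument {
    final String id;
    SolrDocument(String id) { this.id = id; }
}

// Push model: the request handler drives indexing, calling
// addDocument() once per document as it streams through the input.
interface PushUpdateHandler {
    void addDocument(SolrDocument doc);
}

// Pull model: the request handler exposes an Iterable and the
// indexer pulls documents on demand (possibly on a separate thread),
// so documents need not all be in memory at once.
class PullUpdateSource implements Iterable<SolrDocument> {
    private final List<SolrDocument> pending;
    PullUpdateSource(List<SolrDocument> pending) { this.pending = pending; }
    public Iterator<SolrDocument> iterator() { return pending.iterator(); }
}
```

The trade-off sketched above is the one raised in the thread: push keeps the handler's control flow simple, while pull keeps streaming possible without buffering the whole batch.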

I'm really just thinking out loud here... these are just first
impressions, so don't read too much into them.
When I'm coding, the design tends to morph a lot.

I think we need to figure out what type of update semantics we want
w.r.t. adding multiple documents, and all the other misc autocommit
params.
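One way to read the "apply to the batch as a unit" idea from the quoted SQL example is a command object that resolves its settings once per batch rather than per document. This is a hypothetical sketch, not a proposed Solr API: the class name, the naive glob-to-regex translation, and the particular autocommit field are all assumptions for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical batch-level update command: the wildcard field pattern
// (e.g. "count_*") is compiled once for the whole batch, instead of
// being re-matched for each of the 100,000 documents.
class BatchUpdateCommand {
    final Pattern fieldMatch;      // compiled once per batch
    final boolean allowDups;       // example per-batch update semantics
    final int autoCommitMaxDocs;   // example autocommit parameter

    BatchUpdateCommand(String fieldGlob, boolean allowDups, int autoCommitMaxDocs) {
        // Naive glob translation for illustration only: "*" -> ".*"
        this.fieldMatch = Pattern.compile(fieldGlob.replace("*", ".*"));
        this.allowDups = allowDups;
        this.autoCommitMaxDocs = autoCommitMaxDocs;
    }

    // Per-document (or per-field) check reuses the precompiled pattern.
    boolean applies(String fieldName) {
        return fieldMatch.matcher(fieldName).matches();
    }
}
```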

-Yonik
