lucene-solr-dev mailing list archives

From "Ryan McKinley" <ryan...@gmail.com>
Subject Re: loading many documents by ID
Date Thu, 01 Feb 2007 20:46:29 GMT
>
> > REPLACE_DOCUMENT
> > REPLACE_FIELDS
> > REPLACE_DISTINCT_FIELDS
> > ADD_FIELDS
> > ADD_DISTINCT_FIELDS
>
> What does "distinct" mean in this context?
>

I am (was?) using DISTINCT to mean "only add unique values".  As
implemented, it keeps a Collection<String> for each field name.  If
the 'mode' is 'DISTINCT', the collection is a Set<String>; otherwise it
is a List<String>.
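
To make that concrete, here is a minimal sketch of the per-field
collection idea.  FieldValues is a hypothetical helper written for this
message, not existing Solr code; I am assuming a LinkedHashSet for the
DISTINCT case so values keep their insertion order.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical helper (not part of Solr): collects string values per field name.
class FieldValues {
  private final boolean distinct;
  private final Map<String, Collection<String>> fields =
      new LinkedHashMap<String, Collection<String>>();

  FieldValues(boolean distinct) {
    this.distinct = distinct;
  }

  // Add a value to a field, creating the backing collection on first use.
  void add(String name, String value) {
    Collection<String> values = fields.get(name);
    if (values == null) {
      if (distinct) {
        values = new LinkedHashSet<String>(); // duplicates are silently dropped
      } else {
        values = new ArrayList<String>();     // duplicates are kept
      }
      fields.put(name, values);
    }
    values.add(value);
  }

  // Values collected for a field; empty if the field was never added.
  Collection<String> get(String name) {
    Collection<String> values = fields.get(name);
    return values == null ? new ArrayList<String>() : values;
  }
}
```

The only difference between the two modes is which collection type gets
created the first time a field name is seen.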


> There is a lot of processing going on inside Document Builder.
> Once you get to the UpdateCommand, you have already lost some
> information (copyFields have executed, some things have been converted
> to index form, etc).
>

I noticed that!  It made sense when I was implementing this in a
RequestHandler, but it gets a little wonky inside the UpdateHandler -
as you said, copyFields already executed.

I think the best thing is to make a new command that does not directly
take a Lucene document as its input.  Perhaps it could take something like:

http://svn.lapnap.net/solr/solrj/src/org/apache/solr/client/solrj/SolrDocument.java
http://svn.lapnap.net/solr/solrj/src/org/apache/solr/client/solrj/impl/SimpleSolrDoc.java

Then the UpdateHandler would open the DocumentBuilder and merge the
existing document with the passed-in document using whichever method
was specified.


> I would think one would also want to specify things per field.
>
> - append this value to this field
> - increment the value of this field
> - append this value to this field
> - overwrite this field
>

How would you feel about an interface like this:


public class IndexDocumentsCommand
{
  public enum MODE {
    APPEND,    // add the fields to existing fields
    OVERWRITE, // overwrite existing fields
    INCREMENT, // increment existing field
    DISTINCT   // same as APPEND, but make sure there are distinct values
  };

  // optional id in "internal" indexed form... if it is needed and not supplied,
  // it will be obtained from the doc.
  public String indexedId;

  public Collection<SolrDocument> docs;
  public boolean allowDups;
  public boolean overwrite;
  // What to do for each field.  We should support *
  public SimpleOrderedMap<MODE> modifyFieldMode;
  // make sure these documents are committed within this much time
  public int commitMaxTime = -1;
}
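
As a rough illustration of how the per-field modes might be applied
during a merge, here is a sketch under several assumptions: plain
java.util maps stand in for SolrDocument and SimpleOrderedMap, a "*"
entry acts as the default mode, a field with no mode at all falls back
to OVERWRITE, and INCREMENT treats values as integers and sums them.
FieldModeMerger and its methods are hypothetical names, and none of
these semantics are settled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merges an incoming document into an existing one, field by field.
class FieldModeMerger {
  enum Mode { APPEND, OVERWRITE, INCREMENT, DISTINCT }

  // Mode for a field: exact match first, then the "*" wildcard, then OVERWRITE (assumed default).
  static Mode modeFor(Map<String, Mode> modes, String field) {
    Mode mode = modes.get(field);
    if (mode == null) {
      mode = modes.get("*");
    }
    return mode == null ? Mode.OVERWRITE : mode;
  }

  static Map<String, List<String>> merge(Map<String, List<String>> existing,
                                         Map<String, List<String>> incoming,
                                         Map<String, Mode> modes) {
    // Start from a copy of the existing document so the inputs are not modified.
    Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
    for (Map.Entry<String, List<String>> e : existing.entrySet()) {
      result.put(e.getKey(), new ArrayList<String>(e.getValue()));
    }
    for (Map.Entry<String, List<String>> e : incoming.entrySet()) {
      String field = e.getKey();
      List<String> old = result.get(field);
      if (old == null) {
        old = new ArrayList<String>();
      }
      List<String> merged;
      switch (modeFor(modes, field)) {
        case APPEND: {
          merged = new ArrayList<String>(old);
          merged.addAll(e.getValue());
          break;
        }
        case DISTINCT: {
          // Same as APPEND, but each value is kept only once (first occurrence wins).
          LinkedHashSet<String> unique = new LinkedHashSet<String>(old);
          unique.addAll(e.getValue());
          merged = new ArrayList<String>(unique);
          break;
        }
        case INCREMENT: {
          // Assumption: values parse as integers; the result is a single summed value.
          long sum = 0;
          for (String v : old) {
            sum += Long.parseLong(v);
          }
          for (String v : e.getValue()) {
            sum += Long.parseLong(v);
          }
          merged = new ArrayList<String>();
          merged.add(String.valueOf(sum));
          break;
        }
        default: { // OVERWRITE
          merged = new ArrayList<String>(e.getValue());
          break;
        }
      }
      result.put(field, merged);
    }
    return result;
  }
}
```

Fields that appear only in the existing document are carried over
untouched; the mode only matters for fields present in the incoming
document.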


ryan
