lucene-solr-user mailing list archives

From blaise thomson <blaiset...@yahoo.com>
Subject UUID field changed when document is updated
Date Tue, 06 Dec 2011 13:24:42 GMT
Hi, 

I've been trying to use the UUIDField in Solr to maintain IDs for the pages I've crawled with
Nutch (as per http://wiki.apache.org/solr/UniqueKey). The use case is that I want the server
to be able to use these IDs in another database for gathering various statistics. So I want
the link URL to act like a primary key for determining whether a page already exists, and if
it doesn't exist, to generate a new UUID.

I've run into two problems with this:

    1. If I use the UUIDField class with default="NEW", then when a page is crawled again
and Solr is told to update the document, the UUID changes.

    2. Looking at the code for UUIDField (relevant bit pasted below), it seems that the
UUID is just generated randomly. There is no check whether the generated UUID has already
been used.
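Point 1 can be reproduced with a small in-memory model. This is a hypothetical sketch, not Solr code. It assumes the link URL is the uniqueKey, the uuid field uses default="NEW", and an update re-sends the whole document, which then replaces the stored one. Because the default is applied to the incoming document without consulting the stored one, every re-add picks up a fresh UUID.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of replace-on-update; not Solr's implementation.
// Assumptions: "url" is the uniqueKey, "uuid" behaves like a UUIDField with default="NEW".
class ReplaceOnUpdateDemo {
    // uniqueKey value -> stored fields of the current document
    static final Map<String, Map<String, String>> index = new HashMap<>();

    static void add(Map<String, String> incoming) {
        Map<String, String> doc = new HashMap<>(incoming);
        // The default is applied to the incoming document only;
        // the previously stored document is never looked at.
        doc.putIfAbsent("uuid", UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH));
        // Same uniqueKey: the old document is replaced wholesale.
        index.put(doc.get("url"), doc);
    }

    static String uuidOf(String url) {
        return index.get(url).get("uuid");
    }

    public static void main(String[] args) {
        Map<String, String> page = Map.of("url", "http://example.com/");
        add(page);
        String first = uuidOf("http://example.com/");
        add(page); // page crawled again and re-sent without a uuid value
        String second = uuidOf("http://example.com/");
        System.out.println(first + " -> " + second); // a fresh UUID replaced the old one
    }
}
```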

I can partly solve this problem by generating the UUID myself, as a hash of the link URL,
but that doesn't protect against the rare case where two different URLs happen to hash to
the same UUID.
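The hash-of-the-URL approach can be sketched with the JDK's name-based UUIDs. This is an illustrative sketch, not part of Solr: `UUID.nameUUIDFromBytes` produces a version 3 (MD5-based) UUID, so the same URL always maps to the same value. The collision estimate uses the standard birthday-bound approximation, assuming the 122 non-fixed bits behave like random bits (6 bits encode the version and variant); the helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch only: deterministic UUIDs derived from URLs, plus a rough collision estimate.
class UrlUuid {
    // Name-based (version 3, MD5) UUID: identical URLs always yield identical UUIDs.
    static String uuidForUrl(String url) {
        return UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8)).toString();
    }

    // Birthday bound: P(any collision among n distinct inputs) ~= n^2 / (2 * 2^122),
    // valid while the result is much smaller than 1.
    static double collisionProbability(double n) {
        return n * n / Math.pow(2, 123);
    }

    public static void main(String[] args) {
        System.out.println(uuidForUrl("http://example.com/"));
        // Roughly 1e-19 for a billion distinct URLs.
        System.out.println(collisionProbability(1e9));
    }
}
```

Under that assumption, a collision is far less likely than most other failure modes, though it is never strictly impossible.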

Does anyone know if there is a way for Solr to add a UUID only if the document doesn't
already exist?
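One client-side pattern for "generate only if absent" is a lookup keyed by URL before the document is sent. The sketch below is hypothetical and is not a Solr API: the `ConcurrentHashMap` stands in for querying the index (or the statistics database) by URL and reusing whatever UUID is already stored.

```java
import java.util.Locale;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry; the map stands in for "look up the URL before indexing".
class UuidRegistry {
    private final Map<String, String> urlToUuid = new ConcurrentHashMap<>();

    // Returns the existing UUID for this URL, or atomically generates and
    // records a new random one if the URL has never been seen before.
    String uuidFor(String url) {
        return urlToUuid.computeIfAbsent(url,
                u -> UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        UuidRegistry registry = new UuidRegistry();
        String first = registry.uuidFor("http://example.com/");
        String again = registry.uuidFor("http://example.com/"); // recrawl of the same page
        System.out.println(first.equals(again)); // true
    }
}
```

The document would then be re-sent with the returned value in the uuid field, so the field default is never triggered on updates.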

Thanks!
Blaise


------------------------------------------------------------
http://javasourcecode.org/html/open-source/solr/solr-3.3.0/org/apache/solr/schema/UUIDField.java.html

  /**
   * Generates a UUID if val is either null, empty or "NEW".
   * 
   * Otherwise it behaves much like a StrField but checks that the value given
   * is indeed a valid UUID.
   * 
   * @param val The value of the field
   * @see org.apache.solr.schema.FieldType#toInternal(java.lang.String)
   */
  @Override
  public String toInternal(String val) {
    if (val == null || 0==val.length() || NEW.equals(val)) {
      return UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH);
    } else {
      // we do some basic validation if 'val' looks like an UUID
      if (val.length() != 36 || val.charAt(8) != DASH || val.charAt(13) != DASH
          || val.charAt(18) != DASH || val.charAt(23) != DASH) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            "Invalid UUID String: '" + val + "'");
      }

      return val.toLowerCase(Locale.ENGLISH);
    }
  }
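The `else` branch quoted above only checks the string's shape. The standalone copy below is for experimentation only; `IllegalArgumentException` stands in for `SolrException`, and `DASH` is assumed to be the `'-'` character, to avoid a Solr dependency. It shows that neither hex digits nor uniqueness are verified.

```java
import java.util.Locale;

// Standalone copy of the quoted validation branch; not the Solr class itself.
class UuidCheck {
    private static final char DASH = '-';

    static String normalize(String val) {
        // Only the length and the four dash positions are checked.
        if (val.length() != 36 || val.charAt(8) != DASH || val.charAt(13) != DASH
                || val.charAt(18) != DASH || val.charAt(23) != DASH) {
            throw new IllegalArgumentException("Invalid UUID String: '" + val + "'");
        }
        return val.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // A real UUID passes and is lower-cased.
        System.out.println(normalize("123E4567-E89B-12D3-A456-426614174000"));
        // Non-hex text with the right shape also passes.
        System.out.println(normalize("zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz"));
    }
}
```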
