lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "UniqueKey" by Lance Norskog
Date Fri, 10 Jul 2009 00:50:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by Lance Norskog:
http://wiki.apache.org/solr/UniqueKey

------------------------------------------------------------------------------
  == Text field in the document ==
   In the blog RSS example above, this is the URL of each article. The field must be single-valued.
  == UUID techniques ==
-  UUID is short for Universal Unique IDentifier. The UUID standard [http://www.ietf.org/rfc/rfc4122.txt
RFC-4122] includes several types of UUID with different input formats. There is a UUID field
type in Solr 1.4 which implements version 4. Also, the ExtractingRequestHandler automatically
creates UUID version 4. You can also implement a UUID string from a cryptographic hash.
+  UUID is short for Universally Unique IDentifier. The UUID standard [http://www.ietf.org/rfc/rfc4122.txt
RFC-4122] includes several types of UUID with different input formats. There is a UUID field
type (called {{{UUIDField}}}) in Solr 1.4 that implements version 4. The field type is declared in
the schema.xml file with:
+  {{{
+  <fieldType name="uuid" class="solr.UUIDField" indexed="true" />}}}
+  and a field of that type is declared with
+  {{{
+  <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>}}}
+  Also, the ExtractingRequestHandler automatically creates UUID version 4. You can also build
a UUID-style string from a cryptographic hash.
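A client that generates its own keys, rather than relying on `default="NEW"` in the schema, can get a version 4 UUID straight from the JDK. The sketch below uses only `java.util.UUID`; the class name `RandomKeyExample` is hypothetical and no Solr classes are assumed.

```java
import java.util.UUID;

// Hypothetical helper: produces random (version 4) UUID strings for use as uniqueKey values.
class RandomKeyExample {
    // UUID.randomUUID() returns a version 4 UUID; toString() gives the
    // 36-character hyphenated text form, e.g. "550e8400-e29b-41d4-a716-446655440000".
    static String newKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String key = newKey();
        // Parse the string back to confirm it is a well-formed UUID and report its version.
        UUID parsed = UUID.fromString(key);
        System.out.println(key + " (version " + parsed.version() + ")");
    }
}
```

Because version 4 UUIDs are random, the same document indexed twice gets two different keys; use a key derived from the document itself when re-indexing must overwrite the earlier copy.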
  == Cryptographic hash ==
   A cryptographic hashing algorithm can be thought of as creating N very random bits from
the input data. The MD5 algorithm creates 128 bits. This means that two different inputs have
a chance of 1 in 2^128 of producing the same MD5. The standard text representation of an MD5 digest is
32 hexadecimal characters [http://www.ietf.org/rfc/rfc1321.txt RFC-1321]. Several MD5 libraries
for various languages do not follow this standard. The time-based UUID (version 1) includes
the time at which the UUID was created, which precludes some of the above use cases.
You can cheat and ignore the clock requirement, or use the name-based version 3, which RFC-4122 defines as an MD5 hash of a name. It is best to use the UUID text format: ''550e8400-e29b-41d4-a716-446655440000''
instead of ''550e8400e29b41d4a716446655440000''. (You will be reading many of these keys.)
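The hash approach can be sketched in Java with `java.security.MessageDigest`: hash the document's natural key (here a hypothetical article URL), hex-encode the 16-byte digest as 32 lowercase characters, then insert hyphens in the 8-4-4-4-12 UUID text layout. The class and method names are hypothetical, and UTF-8 encoding of the input is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a repeatable key from a document's natural key via MD5.
class Md5KeyExample {
    // Returns the MD5 digest of the UTF-8 bytes of input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide MD5, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Inserts hyphens into a 32-character hex string using the 8-4-4-4-12 UUID layout.
    static String toUuidText(String hex32) {
        return hex32.substring(0, 8) + "-" + hex32.substring(8, 12) + "-"
                + hex32.substring(12, 16) + "-" + hex32.substring(16, 20) + "-"
                + hex32.substring(20, 32);
    }

    public static void main(String[] args) {
        String url = "http://example.com/blog/some-article"; // hypothetical article URL
        System.out.println(toUuidText(md5Hex(url)));
    }
}
```

The hyphenated MD5 string only borrows the UUID layout; it does not set the version and variant bits, so it is not a conforming RFC-4122 UUID. If a conforming one is needed, `java.util.UUID.nameUUIDFromBytes(byte[])` produces a version 3 name-based UUID from an MD5 of the same bytes, which will differ from the raw digest in a few hex digits.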
  
