lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "UniqueKey" by ChrisHarris
Date Fri, 05 Mar 2010 20:00:39 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "UniqueKey" page has been changed by ChrisHarris.


    Use cases change, and you may want to change the identity of the documents. For example,
an RSS feed for videos might change to give different entries for the same video in different
sizes. You may decide that the different entries are really the same document.
    There is a saying in database design: ''data sticks where it lands''. Once you store data
in some format and container, it is very hard to change this decision. By adding a layer of
indirection in the Solr schema's identity, you give yourself the ability to change the innate
identity of the document.
   * Multiple queries about the same document, with document id saved for future reference.
-  * Delete documents.
+  * Delete documents. (Though you can also delete documents matching a query, rather than
by unique key value.)
+  * If you use DistributedSearch, you need a unique key. As an added benefit, if the same
document (determined by unique key) ends up indexed in multiple shards, then only one of the
docs will get returned in the user's query results.
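The two ways of deleting mentioned above (by unique key value, or by query) can be sketched as follows. This is a minimal illustration that only builds the XML update messages Solr accepts for deletes; the helper names and the sample id and query values are hypothetical, and posting the message to an update handler is left out.

```python
from xml.sax.saxutils import escape


def delete_by_id(doc_id: str) -> str:
    """Build an XML delete message targeting one document by its unique key."""
    return f"<delete><id>{escape(doc_id)}</id></delete>"


def delete_by_query(query: str) -> str:
    """Build an XML delete message targeting every document matching a query."""
    return f"<delete><query>{escape(query)}</query></delete>"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(delete_by_id("video-42"))
    print(delete_by_query("source:rss AND title:obsolete"))
```

Deleting by id needs the unique key to be defined in the schema; deleting by query does not, but it removes every matching document rather than one known document.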
  == Use cases which require a unique key generated from data in the document ==
   * Allow different database systems to create identity keys that work in other systems.
    The documents may come from multiple sources, and be stored in multiple places. There
may not be one convenient place in the indexing path to create a unique id. The different
sources will need to separately implement the same algorithm. The key should be a short unique
string (see UUID below).
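One way several sources could "separately implement the same algorithm" is to derive a name-based UUID from fields that every source already has. The sketch below is only an assumption about how that might look: the namespace URL, the function name `make_unique_key`, and the choice of a source name plus a natural id as input are all hypothetical, and it is not necessarily the UUID approach the wiki page describes further down.

```python
import uuid

# Hypothetical namespace that every indexing source would have to agree on.
KEY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/solr-unique-key")


def make_unique_key(source: str, natural_id: str) -> str:
    """Derive a deterministic key from data already present in the document.

    The same (source, natural_id) pair always yields the same key, so
    independent systems can compute it without coordinating at index time.
    """
    # Normalize first, so trivial differences in case or spacing
    # do not produce different keys for the same document.
    canonical = f"{source.strip().lower()}:{natural_id.strip()}"
    return str(uuid.uuid5(KEY_NAMESPACE, canonical))


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(make_unique_key("rss", "video-42"))
```

Because the key is a pure function of the document's data, re-indexing the same document from a different source overwrites the earlier copy instead of creating a duplicate.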
