lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "Per Steffensen/Update semantics" by Per Steffensen
Date Thu, 19 Apr 2012 10:47:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "Per Steffensen/Update semantics" page has been changed by Per Steffensen:
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics?action=diff&rev1=18&rev2=19

Comment:
Ready for review

  
  Please note that the features described here have not yet been committed.
  
+ <<TableOfContents>>
+ 
  == Motivation ==
  
  Solr lacks advanced features needed when using it as a NoSQL database and not just as a search
index. By using it as a NoSQL database instead of just as a search index, I primarily mean
cases where (potentially many) threads concurrently insert, update and delete documents.
Using Solr as a search index is more about first indexing your entire world into Solr using
one thread (or many threads, but without the possibility that they mess with data indexed by
another thread), and afterwards using it solely for searching.
  
  Some of the features missing are:
-  * Insert semantics as we know it from RDBMSs: Do not insert a document if it already exists
in Solr. A document is defined to exist in Solr, if a document with the same value in uniqueKey-field
already exists. Very much like the following SQL does NOT insert (instead it fails with a
UniqueKeyConstraint error) if there is a unique key constraint on column "id" and a row with
id=1234 already exists
+  * Insert semantics as we know them from RDBMSs: Do not insert a document if it already exists
in Solr. A document is defined to exist in Solr if a document with the same value in the uniqueKey-field
already exists. This is very much like the following SQL, which does NOT insert (instead it fails
with a unique-key-constraint error) if there is a unique key constraint on column "id" and a row
with id=1234 already exists:
  {{{
  INSERT INTO docs (id, column2, column3,...) VALUES (1234, value2, value3,...)
  }}}
-  * Update semantics with version control (for optimistic locking) as we know it from RDBMSs:
Do not add document if the document does not already exist and do not update if it has been
changed since it was loaded for update by the client doing the update. Very much like the
following SQL does NOT update a row with id=1234 if the version of the document in Solr at
the time of update is not (any longer) 5678. This feature is used by popular O/R-mappers (like
Hibernate) to provide a VersionConflict error if the object (row/document) you loaded for
update has changed since you loaded it when you try to store your updated version.
+  * Update semantics with version control (for optimistic locking) as we know them from RDBMSs:
Do not add a document if it does not already exist, and do not update it if it has been
changed since it was loaded for update by the client doing the update. This is very much like
the following SQL, which does NOT update the row with id=1234 if the version of the row in the
RDBMS at the time of update is no longer 5678. This feature is used by popular O/R-mappers
(like Hibernate) to raise a version-conflict error when you try to store your updated version
of an object (row/document) that has changed since you loaded it for update:
  {{{
  UPDATE docs SET column2=value2, column3=value3, ... WHERE id=1234 AND version=5678
  }}}
@@ -28, +30 @@

  
  === Description ===
  
- The above features could have been implemented by providing you with different ways of "updating"
documents in Solr, than by using the "update-add-docs" operation. But instead the "update-add-docs"
operation is still the only operation you have for inserting/updating documents in Solr, but
now you have a way of controlling the exact semantics you want Solr to do behind the scenes.
First of all you need to decide wich sematics-mode you want to use - you have the following
options
+ The above features could have been implemented by providing different ways of "updating"
documents in Solr, other than the "update-add-docs" operation. Instead, the "update-add-docs"
operation is still the only operation for inserting/updating documents in Solr, but you
now have a way of controlling the exact semantics Solr applies behind the scenes.
First of all you need to decide which semantics-mode you want to use. You have the following
options:
   * '''classic''': Solr uses the same update semantics as it always has, without failing
on "unique key conflicts" during create/insert and without failing on "version conflicts" during
update. This is the default, so out of the box Solr works as always.
   * '''consistent''': You are forced to (indirectly) state whether your intent is to insert or
update. If your intent is to insert, the "update-add-docs" operation will fail if a document
with the same uniqueKey-value already exists. If your intent is to update, the "update-add-docs"
operation will fail if the document (a document with the same uniqueKey-value) does not already
exist, or if the value of the _version_-field does not match the value in the already existing
document. You state your intent by setting the value of the _version_ field:
    * _version_ <= 0 (or not set): Intent is to insert
@@ -37, +39 @@

  
  === Errors ===
  
- When using '''consistency''' or the consistency-features of '''classic-consistency-hybrid'''
the update for single documents can fail in a non-fatal way. Errors are sent back to the Solr
client in the response, and it is up to the client to react in a resonable way. Errors consist
of
+ When using '''consistency''' or the consistency-features of '''classic-consistency-hybrid'''
(setting _version_ to something different from 0), the update of single documents can fail
in a non-fatal way, while the update of other documents in the same request succeeds. Errors
are sent back to the Solr client in the response, and it is up to the client to react in a
reasonable way. The information in a single error consists of:
   * '''A code''': Corresponding to an HTTP response status code
   * '''A type''': The type of error that occurred. It consists of:
    * '''A namespace''': The context of the type
    * '''A name''': The name of the type, uniquely identifying the error (at least within
the namespace)
   * '''A message''': Some additional text describing details about the error
   * '''A part reference''': A reference to the document in the update-request to which this
error relates. It is called "part reference" instead of "document reference" because the error
propagation method used is designed to be usable for reporting all kinds of partial errors
during the handling of requests. The "part reference" is only present in the error if the
request contained multiple parts (multiple documents).
+ A request can result in zero, one or many errors.
  
  As you know by now, in order to link errors in responses properly with the documents in
the request, you also need to add a "part reference" to all of your documents in the request.
If you don't explicitly provide a "part reference" for a document in a multi-document request,
and the handling of this particular document results in an error, the "part reference" in
the response will just be a random UUID, and you will not be able to match errors in the response
with documents in the request.
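To make the linking concrete, here is a minimal client-side sketch, assuming the errors have already been parsed from the response into a hypothetical PartialError record holding the code, namespace, name, message and part reference listed above. Neither PartialError nor matchErrors is part of Solr or SolrJ, and the documents are stand-in strings keyed by the part reference the client attached. An error carrying a part reference the client never supplied (such as a generated UUID) ends up in the unmatched list, which is exactly the situation described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialErrorMatching {

    // Hypothetical client-side model of one partial error (not a Solr class).
    record PartialError(int code, String namespace, String name, String message, String partRef) {
        // Namespace and name together identify the error type.
        String qualifiedType() {
            return namespace + "." + name;
        }
    }

    // Pairs each error with the request document it refers to, looked up by
    // part reference. Errors whose part reference matches no document (e.g. a
    // random UUID because the client supplied none) are collected separately.
    static Map<String, PartialError> matchErrors(Map<String, String> docsByPartRef,
                                                 List<PartialError> errors,
                                                 List<PartialError> unmatched) {
        Map<String, PartialError> matched = new LinkedHashMap<>();
        for (PartialError err : errors) {
            if (err.partRef() != null && docsByPartRef.containsKey(err.partRef())) {
                matched.put(err.partRef(), err);
            } else {
                unmatched.add(err);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Stand-in documents keyed by the part reference the client attached.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("refA", "doc with id=1234");
        docs.put("refN", "doc with id=5678");

        List<PartialError> errors = List.of(
            new PartialError(400, "org.apache.solr.common.partialerrors.update",
                             "VersionConflict", "version changed", "refA"),
            new PartialError(400, "org.apache.solr.common.partialerrors.update",
                             "DocumentDoesNotExist", "no such document",
                             "3f2c9a1e-0000-0000-0000-000000000000"));

        List<PartialError> unmatched = new ArrayList<>();
        Map<String, PartialError> matched = matchErrors(docs, errors, unmatched);
        System.out.println(matched.keySet());   // [refA]
        System.out.println(unmatched.size());   // 1
    }
}
```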
  
- In the context of document-update-add requests, 400 is always used as error-code. The following
error-types are relevant
+ In the context of "update-add-docs" requests, 400 is always used as the error-code. The following
error-types are relevant:
-  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''DocumentDoesNotExist''':
Indicating that the document you tried to consistency-update does not exist (anymore)
+  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''~DocumentDoesNotExist''':
Indicating that the document you tried to consistency-update does not exist (anymore)
-  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''DocumentAlreadyExists''':
Indicating that the document you tried to consistency-create already exists (or at least a
document with the same uniqueKey value)
+  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''~DocumentAlreadyExists''':
Indicating that the document you tried to consistency-create already exists (or at least a
document with the same uniqueKey value)
-  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''VersionConflict''':
Indicating that the document you tried to consistency-update has changed since you fetched
it for update (version number has changed)
+  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''~VersionConflict''':
Indicating that the document you tried to consistency-update has changed since you fetched
it for update (version number has changed)
-  * Error-namespace='''org.apache.solr.common.partialerrors''', error-name='''WrongUsage''':
Indicating that you are using the features in a wrong way - e.g. if you try to do a consistency-insert/update
but there is no uniqueKey defined in your Solr schema or no value for the uniqueKey-field
of the document sent in the request. 
+  * Error-namespace='''org.apache.solr.common.partialerrors''', error-name='''~WrongUsage''':
Indicating that you are using the features in a wrong way - e.g. if you try to do a consistency-insert/update
but there is no uniqueKey defined in your Solr schema or no value for the uniqueKey-field
of the document is specified in the request. 
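The rules that choose between these errors in "consistent" mode can be sketched as a small decision function. This is a hypothetical illustration, not the actual update-handler code: the in-memory map stands in for the index (uniqueKey value to stored _version_), the enum reuses the error-names above, and both the WrongUsage case and the assignment of a new _version_ on success are left out.

```java
import java.util.HashMap;
import java.util.Map;

public class ConsistentModeCheck {

    // Hypothetical outcome type; the non-OK names mirror the error-names above.
    enum Outcome { OK, DocumentAlreadyExists, DocumentDoesNotExist, VersionConflict }

    // _version_ <= 0 (or not set) means "insert"; _version_ > 0 means
    // "update, but only if the stored version still matches".
    // 'stored' maps uniqueKey value -> _version_ of the existing document.
    static Outcome check(Map<String, Long> stored, String id, Long requestedVersion) {
        Long existing = stored.get(id);
        boolean insertIntent = requestedVersion == null || requestedVersion <= 0;
        if (insertIntent) {
            return existing != null ? Outcome.DocumentAlreadyExists : Outcome.OK;
        }
        if (existing == null) {
            return Outcome.DocumentDoesNotExist;
        }
        return existing.equals(requestedVersion) ? Outcome.OK : Outcome.VersionConflict;
    }

    public static void main(String[] args) {
        Map<String, Long> stored = new HashMap<>();
        stored.put("1234", 5678L);

        // Insert of an id that already exists
        System.out.println(check(stored, "1234", null));   // DocumentAlreadyExists
        // Insert of a new id
        System.out.println(check(stored, "9999", 0L));     // OK
        // Update of an id that does not exist
        System.out.println(check(stored, "9999", 42L));    // DocumentDoesNotExist
        // Update with a stale version
        System.out.println(check(stored, "1234", 5677L));  // VersionConflict
        // Update with the current version
        System.out.println(check(stored, "1234", 5678L));  // OK
    }
}
```

Keeping the insert/update decision purely a function of the sent _version_ is what lets a single "update-add-docs" operation carry both intents.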
  
  == Using it ==
  
- This section describes how to use the features as a Solr client.
+ This section describes how to use the features as a Solr user.
  
  === Configuration of Solr server ===
  
- You control the semantics-mode by adding a semanticsMode tag inside your DirectUpdateHandler2-based
updateHandler. In solrconfig.xml:
+ You control the semantics-mode by adding a semanticsMode tag inside your ~DirectUpdateHandler2-based
updateHandler. In solrconfig.xml:
  {{{#!xml
    <updateHandler class="solr.DirectUpdateHandler2">
      ...
@@ -76, +79 @@

   <field name="id" type="string" indexed="true" stored="true" required="true"/>
   <uniqueKey>id</uniqueKey>
  }}}
-  * If you want do consistency updates including version control (sending values for _version_
bigger than 0), you need to have a "_version_"-field in your schema. In schema.xml:
+  * If you want to do consistency-updates including version control (sending values for _version_
bigger than 0), you need to have a "_version_"-field in your schema. In schema.xml:
  {{{#!xml
   <field name="_version_" type="long" indexed="true" stored="true" />
  }}}
-  * You need to enable updateLog in your DirectUpdateHandler2-based updateHandler. In solrconfig.xml:
+  * You need to enable updateLog in your ~DirectUpdateHandler2-based updateHandler. In solrconfig.xml:
  {{{#!xml
   <updateHandler class="solr.DirectUpdateHandler2">
      ...
@@ -180, +183 @@

  
  ======= Body =======
  
- In this subsection lets pretend we sent a multi-document request like the one shown (in
different formats) above, and that all documents succeeded except the two with "part reference"
refA and refN. The body of the HTTP response will look like show in the following subsections
+ In this subsection, let's pretend we sent a multi-document request like the one shown (several
times in different formats) above, and that all documents succeeded except the two with "part
reference" refA and refN. The body of the HTTP response will look as shown in the following
subsections:
  
  ======== XML ========
  
@@ -248, +251 @@

  docs.add(docN);
  }}}
  
- Note that it is not normal (as in the example above) to create a new SolrInputDocuments
with _version_ (SolrInputDocument.VERSION_FIELD) field set to a number above 0 (indicating
update and not insert). A SolrInputDocument with _version_ above 0 will usually be one that
has been fetched from Solr for modification and restorage (update) in Solr.
+ Note that it is not normal (as in the example above) to create a new ~SolrInputDocument
with the _version_ (~SolrInputDocument.VERSION_FIELD) field set to a hardcoded number above 0
(indicating update and not insert). A ~SolrInputDocument with _version_ above 0 will usually
have been populated from a document fetched from Solr (by search or realtime-get) for modification
and re-storage (update).
- No need to deal explicitly with "part references" - SolrInputDocument will handle it automatically
for you.
+ There is no need to deal explicitly with "part references" - ~SolrInputDocument will handle them
automatically for you.
  
  ==== Sending requests ====
  
@@ -278, +281 @@

  }
  }}}
  
- The possible classes (subclasses of DocumentUpdatePartialError) of '''err''' are java-Exception
classes correspondig to the error-types mentioned above, where the package of the class corresponds
to the error-type-namespace and where the name of the class correspond to the error-type-name.
+ The possible classes (subclasses of ~DocumentUpdatePartialError) of '''err''' are Java exception
classes corresponding to the error-types mentioned above, where the package of the class corresponds
to the error-type-namespace and the name of the class corresponds to the error-type-name.
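As a small illustration of that naming convention, the fully qualified exception class name is the namespace joined to the name with a dot, and it can be split back at the last dot. The helper class and method names below are hypothetical; only the two namespaces and the error-names come from this page.

```java
public class PartialErrorClassNames {

    static final String UPDATE_NS = "org.apache.solr.common.partialerrors.update";
    static final String GENERAL_NS = "org.apache.solr.common.partialerrors";

    // package = error-type-namespace, simple class name = error-type-name
    static String exceptionClassName(String namespace, String name) {
        return namespace + "." + name;
    }

    // Reverse direction: split a fully qualified class name into
    // {namespace, name} at the last '.' (assumes the name contains a dot).
    static String[] errorType(String className) {
        int lastDot = className.lastIndexOf('.');
        return new String[] { className.substring(0, lastDot), className.substring(lastDot + 1) };
    }

    public static void main(String[] args) {
        // prints org.apache.solr.common.partialerrors.update.VersionConflict
        System.out.println(exceptionClassName(UPDATE_NS, "VersionConflict"));
        // prints org.apache.solr.common.partialerrors.WrongUsage
        System.out.println(exceptionClassName(GENERAL_NS, "WrongUsage"));
    }
}
```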
  
  If you only send one document in your request, you can catch the Java exception corresponding
to the error-type directly:
  
  {{{#!java
  try {
-     UpdateResponse response = server.add(... one doc ..., ... your SolrParams ...).get();
+     UpdateResponse response = server.add(docA, ... your SolrParams ...).get();
  } catch (org.apache.solr.common.partialerrors.update.DocumentDoesNotExist e) {
      //... do something ...
  } catch (org.apache.solr.common.partialerrors.update.DocumentAlreadyExists e) {
