lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "Per Steffensen/Update semantics" by Per Steffensen
Date Wed, 18 Apr 2012 21:35:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "Per Steffensen/Update semantics" page has been changed by Per Steffensen:
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics?action=diff&rev1=12&rev2=13

Comment:
Almost done

  
  == Solution ==
  
+ === Description ===
+ 
  The above features could have been implemented by giving you new operations for "updating"
documents in Solr, alongside the "update-add-docs" operation. Instead, "update-add-docs" is
still the only operation you have for inserting/updating documents in Solr, but you now have
a way of controlling the exact semantics Solr applies behind the scenes. First of all you need
to decide which semantics-mode you want to use. You have the following options:
   * '''classic''': Solr uses the same update semantics as it has always done, without failing
on "unique key conflict" during create/insert and without failing on "version conflicts" during
update. This is default, so out of the box Solr works as always.
   * '''consistent''': You are forced to (indirectly) state if your intent is to insert or
update. If your intent is to insert, the "update-add-docs" operation will fail if a document
with the same uniqueKey-value already exists. If your intent is to update, the "update-add-docs"
operation will fail if the document (a document with the same uniqueKey-value) does not already
exist, or if the value of the _version_-field does not match the value in the already existing
document. You state your intent by setting the value of the _version_ field
-  ** _version_ <= 0 (or not set): Intent is to insert
+   * _version_ <= 0 (or not set): Intent is to insert
-  ** _version_ > 0: Intent is to update
+   * _version_ > 0: Intent is to update
   * '''classic-consistent-hybrid''': A hybrid between '''classic''' and '''consistent'''.
The only difference from '''consistent''' is that you get '''classic''' semantics if you set _version_
to 0 (or don't set it)
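
The mode rules listed above can be sketched as a small decision function. This is only an illustration of the described semantics, not the code in Solr: the class, enum and method names are hypothetical, and "no stored document" is modelled as a null existing version.

```java
// Illustrative sketch only: all names here are hypothetical, not Solr APIs.
class UpdateSemanticsSketch {
    enum Mode { CLASSIC, CONSISTENT, CLASSIC_CONSISTENT_HYBRID }

    enum Outcome { ACCEPTED, DOCUMENT_ALREADY_EXISTS, DOCUMENT_DOES_NOT_EXIST, VERSION_CONFLICT }

    /**
     * @param requestVersion  _version_ sent by the client, or null if not set
     * @param existingVersion _version_ of the stored document with the same
     *                        uniqueKey, or null if no such document exists
     */
    static Outcome decide(Mode mode, Long requestVersion, Long existingVersion) {
        // classic: never fails on unique key conflicts or version conflicts
        if (mode == Mode.CLASSIC) {
            return Outcome.ACCEPTED;
        }
        boolean exists = existingVersion != null;
        if (requestVersion != null && requestVersion > 0) {
            // _version_ > 0: intent is to update
            if (!exists) {
                return Outcome.DOCUMENT_DOES_NOT_EXIST;
            }
            return requestVersion.equals(existingVersion)
                    ? Outcome.ACCEPTED
                    : Outcome.VERSION_CONFLICT;
        }
        // hybrid: _version_ of 0 (or not set) falls back to classic semantics
        if (mode == Mode.CLASSIC_CONSISTENT_HYBRID
                && (requestVersion == null || requestVersion == 0)) {
            return Outcome.ACCEPTED;
        }
        // _version_ <= 0 (or not set): intent is to insert
        return exists ? Outcome.DOCUMENT_ALREADY_EXISTS : Outcome.ACCEPTED;
    }

    public static void main(String[] args) {
        System.out.println(decide(Mode.CONSISTENT, -1L, 1234567890L));
        System.out.println(decide(Mode.CLASSIC_CONSISTENT_HYBRID, 0L, 1234567890L));
    }
}
```

Note that in the hybrid mode a negative _version_ still expresses insert intent in this sketch; only 0 or an unset value selects classic behaviour, following the wording of the list above.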
  
+ === Errors ===
+ 
+ When using '''consistent''' or the consistency features of '''classic-consistent-hybrid''',
the update of a single document can fail in a non-fatal way. Errors are sent back to the Solr
client in the response, and it is up to the client to react in a reasonable way. An error consists
of
+  * '''A code''': Corresponding to an HTTP response status code
+  * '''A type''': The type of error that occurred. It consists of
+   * '''A namespace''': The context of the type
+   * '''A name''': The name of the type, uniquely identifying the error (at least within
the namespace)
+  * '''A message''': Some additional text describing details about the error
+  * '''A part reference''': A reference to the document in the update-request to which this
error relates. It is called "part reference" instead of "document reference" because the error
propagation method used is designed to be usable for reporting all kinds of partial errors
during the handling of requests. The "part reference" is only present in the error if the
request contained multiple parts (multiple documents)
+ 
+ To link errors in the response with the documents in the request, you also need to add
a "part reference" to each of your documents in the request.
If you don't explicitly provide a "part reference" for a document in a multi-document request,
and the handling of that document results in an error, the "part reference" in
the response will just be a random UUID and you will not be able to match errors in the response
with documents in the request.
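
The matching described above can be sketched on the client side as follows. This is a minimal, self-contained illustration: the class and method names and the "ref0", "ref1", ... naming scheme are assumptions, and PartialError is only a stand-in for one entry of the "partialerrors" list in a response.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, not Solr APIs.
class PartRefMatchingSketch {
    /** Stand-in for one partial error returned in a response. */
    static final class PartialError {
        final String partRef;
        final String errorType;

        PartialError(String partRef, String errorType) {
            this.partRef = partRef;
            this.errorType = errorType;
        }
    }

    /** Assigns "ref0", "ref1", ... to the documents, keeping request order. */
    static <D> Map<String, D> assignPartRefs(List<D> docs) {
        Map<String, D> byRef = new LinkedHashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            byRef.put("ref" + i, docs.get(i));
        }
        return byRef;
    }

    /**
     * Maps each failed document to its error type. Errors whose part reference
     * is unknown (e.g. a server-generated UUID) cannot be matched and are skipped.
     */
    static <D> Map<D, String> failedDocs(Map<String, D> byRef, List<PartialError> errors) {
        Map<D, String> failed = new LinkedHashMap<>();
        for (PartialError error : errors) {
            D doc = byRef.get(error.partRef);
            if (doc != null) {
                failed.put(doc, error.errorType);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, String> byRef = assignPartRefs(Arrays.asList("docA", "docB"));
        List<PartialError> errors = Arrays.asList(
                new PartialError("ref1", "org.apache.solr.common.partialerrors.update.VersionConflict"));
        System.out.println(failedDocs(byRef, errors));
    }
}
```

Documents that do not appear in the returned map had no corresponding partial error and can be treated as successfully updated.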
+ 
+ In the context of document-update-add requests, 400 is always used as error-code. The following
error-types are relevant
+  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''DocumentDoesNotExist''':
Indicating that the document you tried to consistency-update does not exist (anymore)
+  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''DocumentAlreadyExists''':
Indicating that the document you tried to consistency-create already exists (or at least a
document with the same uniqueKey value)
+  * Error-namespace='''org.apache.solr.common.partialerrors.update''', error-name='''VersionConflict''':
Indicating that the document you tried to consistency-update has changed since you fetched
it for update (version number has changed)
+  * Error-namespace='''org.apache.solr.common.partialerrors''', error-name='''WrongUsage''':
Indicating that you are using the features in a wrong way - e.g. if you try to do a consistency-insert/update
but there is no uniqueKey defined in your Solr schema or no value for the uniqueKey-field
of the document sent in the request. 
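
Since it is up to the client to react to these error-types, a client might split the error-type string into namespace and name and map the name to a reaction. The sketch below is only one possible policy: the class name, the Reaction values and the choice of reaction per error are assumptions, not something prescribed by Solr.

```java
// Illustrative sketch only: all names and the reaction policy are hypothetical.
class ErrorTypeSketch {
    enum Reaction { REPORT_DUPLICATE, REFETCH_AND_RETRY, TREAT_AS_DELETED, FIX_CLIENT_USAGE, UNKNOWN }

    /** Everything before the last dot, e.g. "org.apache.solr.common.partialerrors.update". */
    static String namespaceOf(String errorType) {
        int dot = errorType.lastIndexOf('.');
        return dot < 0 ? "" : errorType.substring(0, dot);
    }

    /** Everything after the last dot, e.g. "VersionConflict". */
    static String nameOf(String errorType) {
        return errorType.substring(errorType.lastIndexOf('.') + 1);
    }

    /** One possible client policy per error-name. */
    static Reaction suggestedReaction(String errorType) {
        switch (nameOf(errorType)) {
            case "DocumentAlreadyExists":
                return Reaction.REPORT_DUPLICATE;   // uniqueKey value is already taken
            case "VersionConflict":
                return Reaction.REFETCH_AND_RETRY;  // document changed since it was fetched
            case "DocumentDoesNotExist":
                return Reaction.TREAT_AS_DELETED;   // document is gone (or never existed)
            case "WrongUsage":
                return Reaction.FIX_CLIENT_USAGE;   // e.g. missing uniqueKey value
            default:
                return Reaction.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        String type = "org.apache.solr.common.partialerrors.update.VersionConflict";
        System.out.println(namespaceOf(type) + " / " + nameOf(type) + " -> " + suggestedReaction(type));
    }
}
```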
+ 
  == Using it ==
  
  This section describes how to use the features as a Solr client.
  
- === Requirements ===
+ === Configuration of Solr server ===
  
  You control the semantics-mode by adding a semanticsMode tag inside your DirectUpdateHandler2-based
updateHandler. In solrconfig.xml:
  {{{#!xml
@@ -65, +85 @@

   </updateHandler>
  }}}
  
- === Homemade requests ===
+ === Raw HTTP requests and responses ===
  
- ==== Construct requests - JSON ====
+ This subsection describes the relevant parts of the content and structure of the raw HTTP requests
and responses. This is especially relevant if you are not using a Java-based client.
  
- Send values for the _version_ field in your JavaScript documents (see more [[UpdateJSON|here]])
to control if you get insert-, update- og classic-semantics (depending on your semantics-mode
configuration) like this:
- {{{#!json
- [
-  {... set doc fields ..., "_version_" : -1}
-  ... add other docs ...
-  {... set doc fields ..., "_version_" : 1234567890}
- ]
- }}}
- 
- ==== Construct requests - XML ====
+ ==== Constructing requests ====
  
- Send values for the _version_ field in your JavaScript documents (see more [[UpdateXmlMessages|here]])
to control if you get insert-, update- og classic-semantics (depending on your semantics-mode
configuration) like this:
+ ===== XML =====
+ 
+ Provide "part references" and _version_-field-values in your XML documents (see more [[UpdateXmlMessages|here]])
like this:
+ 
  {{{#!xml
  <add>
-   <doc>
+   <doc partref="refA">
      ... set doc fields ...
      <field name="_version_">-1</field>
    </doc>
    ... add other docs ...
-   <doc>
+   <doc partref="refN">
      ... set doc fields ...
      <field name="_version_">1234567890</field>
    </doc>
  </add>
  }}}
  
- ==== Catching errors ====
+ ===== JSON =====
  
- TODO
+ Provide "part references" and _version_-field-values in your JSON documents (see more [[UpdateJSON|here]])
like this:
  
+ {{{#!json
+ [
+  { "partref" : "refA", ... set doc fields ..., "_version_" : -1}
+  ... add other docs ...
+  { "partref" : "refN", ... set doc fields ..., "_version_" : 1234567890}
+ ]
+ }}}
+ 
+ Note that it is hard to see the difference between the "part reference" and a field called
"partref". If the first "field" in the document has the name "partref", it is not treated
as a field but as the "part reference". This means that if you have a real field called "partref",
you cannot send it as the first field inside a document.
+ 
+ ===== CSV =====
+ 
+ Provide "part references" and _version_-field-values in your CSV documents (see more [[UpdateCSV|here]])
like this:
+  * _version_ is just a field like any other - use fieldname "_version_"
+  * Also send the "part reference" as a field like any other - use fieldname "nonfield.partref"
+ 
+ E.g.
+  * fieldnames=nonfield.partref,... doc field names ...,_version_
+  * CSV lines:
+ {{{
+ refA,... set doc field values ...,-1
+ ... add other docs ...
+ refN,... set doc field values ...,1234567890
+ }}}
+ 
+ ==== Checking responses for partial errors ====
+ 
+ ===== Status line =====
+ 
+ ====== General ======
+ 
+ Of course, if the update of all documents in the request succeeds, the HTTP response status
line will look like this
+ 
+ {{{
+ HTTP/1.1 200 OK
+ }}}
+ 
+ If errors occur the HTTP response status line will in general look like this
+ 
+ {{{
+ HTTP/1.1 400 <error-message>, error-type=<error-namespace>.<error-name>
+ }}}
+ 
+ ====== Single document updates ======
+ 
+ If you only sent one document for update, the error that occurred while handling that single document
will be encoded in the HTTP response status line. Example:
+ 
+ {{{
+ HTTP/1.1 400 Attempt to update (_version_ > 0 specified explicitly in document) document
failed. Document does not exist, error-type=org.apache.solr.common.partialerrors.update.DocumentDoesNotExist
+ }}}
+ 
+ ====== Multi document updates ======
+ 
+ If you sent multiple documents for update and the handling of some (possibly all) of them
resulted in errors, the HTTP response status line will look like this
+ 
+ {{{
+ HTTP/1.1 400 Some parts of the request resulted in errors. Need to check response for partial
errors. Documents sent for update with no corresponding partial error succeeded., error-type=org.apache.solr.common.partialerrors.PartialErrors
+ }}}
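
A client that does not use SolrJ could pull the error message and error-type out of such a status line roughly as follows. This is a sketch under the assumption that the status line arrives as a single line in exactly the format shown above; the class and method names are hypothetical.

```java
// Illustrative sketch only: assumes the "<message>, error-type=<type>" format shown above.
class StatusLineSketch {
    static final String MARKER = ", error-type=";

    /**
     * Splits a status line into {statusCode, message, errorType}.
     * errorType is null when the reason phrase carries no error-type marker.
     */
    static String[] parse(String statusLine) {
        // HTTP version, status code, then the reason phrase (which may contain spaces)
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("Not a full status line: " + statusLine);
        }
        String reason = parts[2];
        // Use the last marker so that commas inside the message do not matter
        int idx = reason.lastIndexOf(MARKER);
        String message = idx < 0 ? reason : reason.substring(0, idx);
        String errorType = idx < 0 ? null : reason.substring(idx + MARKER.length());
        return new String[] { parts[1], message, errorType };
    }

    public static void main(String[] args) {
        String line = "HTTP/1.1 400 Attempt to update (_version_ > 0 specified explicitly in document) "
                + "document failed. Document does not exist, "
                + "error-type=org.apache.solr.common.partialerrors.update.DocumentDoesNotExist";
        String[] parsed = parse(line);
        System.out.println(parsed[0] + " | " + parsed[1] + " | " + parsed[2]);
    }
}
```

If the extracted error-type ends in PartialErrors, the status line only signals that the response body needs to be checked for the per-document errors.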
+ 
+ ======= Body =======
+ 
+ ======== XML ========
+ 
+ {{{#!xml
+ <?xml version="1.0" encoding="UTF-8"?>
+ <response>
+   ... responseHeader etc. ...
+   <arr name="partialerrors">
+     <lst>
+       <int name="error-code">400</int>
+       <str name="error-type">org.apache.solr.common.partialerrors.update.DocumentAlreadyExists</str>
+       <str name="error-msg">Attempt to insert (_version_ &lt;= 0 specified explicitly
in document) document failed. Document already exists</str>
+       <str name="partRef">f570a67b-a65b-4782-ad83-84c6ded34d24</str>
+     </lst>
+     <lst>
+       <int name="error-code">400</int>
+       <str name="error-type">org.apache.solr.common.partialerrors.update.DocumentAlreadyExists</str>
+       <str name="error-msg">Attempt to insert (_version_ &lt;= 0 specified explicitly
in document) document failed. Document already exists</str>
+       <str name="partRef">329eefa9-4fc6-43b5-b767-19ae9ac0d304</str>
+     </lst>
+   </arr>
+ </response>
+ }}}
+ 
+ ======== JSON ========
+ 
+ {{{#!json
+ {
+   ... responseHeader etc. ...,
+   "partialerrors":[
+     {"error-code":400,
+      "error-type":"org.apache.solr.common.partialerrors.update.DocumentAlreadyExists",
+      "error-msg":"Attempt to insert (_version_ <= 0 specified explicitly in document)
document failed. Document already exists",
+      "partRef":"997586a1-db59-4c06-88aa-ad2744c60887"
+     },{
+      "error-code":400,
+      "error-type":"org.apache.solr.common.partialerrors.update.DocumentAlreadyExists",
+      "error-msg":"Attempt to insert (_version_ <= 0 specified explicitly in document)
document failed. Document already exists",
+      "partRef":"04376658-c51f-4ae5-b237-576db6567ebe"
+     }
+   ]
+ }
+ }}}
+ 
+ ======== Other types ========
+ 
+ By now you should have gotten the picture and be able to figure out how partial errors will
be encoded in the response body if you requested the response as Ruby, PHP, Python or ...
+ 
- === SolrJ requests ===
+ === SolrJ requests and responses ===
  
+ If you are using a Java-based client, you do not need to know much about the details
of the raw HTTP communication described above. The SolrJ client framework is there to help
you.
+ 
- ==== Construct requests ====
+ ==== Constructing requests ====
  
  {{{#!java
  List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
@@ -124, +248 @@

  UpdateResponse response = server.add(docs, ... your SolrParams ...).get();
  }}}
  
- ==== Catching errors ====
+ ==== Checking responses for partial errors ====
  
  If you send many documents in your request, it is possible that the insert/update operation
will fail for some documents (due to "unique key constraints", "version checking" etc.) while
succeeding for others. Therefore you need to deal with partial errors.
  
@@ -132, +256 @@

  UpdateResponse response;
  try {
      response = server.add(docs, ... your SolrParams ...).get();
- } catch (PartialErrors e) {
+ } catch (org.apache.solr.common.partialerrors.PartialErrors e) {
      response = (UpdateResponse)e.getSpecializedResponse();
      DocumentUpdatePartialError err;
      err = response.getPartialError(docA);
@@ -143, +267 @@

  }
  }}}
  
+ The possible exception types (subclasses of DocumentUpdatePartialError) of '''err''' are
Java exception classes corresponding to the error-types mentioned above, where the package
of each class corresponds to the error-type-namespace and the name of the class corresponds
to the error-type-name.
- The possible exception types (subclasses of DocumentUpdatePartialError) of '''err''' are
-  * org.apache.solr.common.partialerrors.DocumentDoesNotExist: Indicating that the document
you tried to consistency update does not exist (anymore)
-  * org.apache.solr.common.partialerrors.DocumentAlreadyExists: Indicating that the document
you tried to consistency create already exists (or at least a document with the same uniqueKey
value)
-  * org.apache.solr.common.partialerrors.VersionConflict: Indicating that the document you
tried to consistency update has changed since you fetched it for update (version number has
changed)
-  * org.apache.solr.common.partialerrors.WrongUsage: Indicating that you are using the features
in a wrong way - e.g. if you try to do a consistency insert/update but there is no uniqueKey
defined in your Solr schema or no value for the uniqueKey-field of the document sent in the
request.
  
- If you only send one document in your request there is no reason to deal with PartialErrors,
so for convenience catching per-document errors is possible like this
+ If you only send one document in your request, you can catch the Java exception corresponding
to the error-type directly:
+ 
  {{{#!java
  try {
      UpdateResponse response = server.add(... one doc ..., ... your SolrParams ...).get();
- } catch (DocumentDoesNotExist e) {
+ } catch (org.apache.solr.common.partialerrors.update.DocumentDoesNotExist e) {
      ... do something ...
- } catch (DocumentAlreadyExists e) {
+ } catch (org.apache.solr.common.partialerrors.update.DocumentAlreadyExists e) {
      ... do something ...
- } catch (VersionConflict e) {
+ } catch (org.apache.solr.common.partialerrors.update.VersionConflict e) {
      ... do something ...
- } catch (WrongUsage e) {
+ } catch (org.apache.solr.common.partialerrors.WrongUsage e) {
      ... do something ...
  }
  }}}
  
- === Realistic example ===
- 
- TODO Show example of the server side of a Wiki application, using consistency-inserts to
prevent two user from creating a page with the same name (unique key), and using consistency-updates
to prevent users overwriting each others changes to the text of a particular Wiki page.
- 
