lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "Per Steffensen/Update semantics" by Per Steffensen
Date Fri, 09 Mar 2012 09:54:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "Per Steffensen/Update semantics" page has been changed by Per Steffensen:
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics?action=diff&rev1=5&rev2=6

  
  <!> [[Solr4.0]] (I hope)
  
- Please note that the features describes here has not been committed yet.
+ Please note that the features described here have not yet been committed.
  
  == Motivation ==
  
- Solr is missing advanved features when using it as a NoSQL database and not just a search
index. When talking about using it as a NoSQL database instead of just a search index, it
is primarily thinking about using Solr in a way where you have many threads inserting, updating
and deleting documents through the entire lifetime of the core/shard/lucene-index - also while
the index is used for searching. Using Solr as a search index, is more about first indexing
your entire world into Solr, and afterwards solely using it for quering.
+ Solr is missing advanced features when it is used as a NoSQL database and not just as a search
index. By using it as a NoSQL database instead of just as a search index,
I primarily mean cases where (potentially many) threads
concurrently insert, update and delete documents. Using Solr as a search index is
more about first indexing your entire world into Solr using one thread (or many threads, but
without the possibility that they mess with data indexed by another thread), and afterwards
solely using it for querying.
  
  Some of the features missing are:
-  * Insert semantics as we know it from RDBMSs: Do not insert a document if it already exists.
It "exists" if a document with the same value in uniqueKey-field already exists. Very much
like the following SQL does NOT insert (instead it faild with a UniqueKeyConstraint error)
if there is a unique key constraint on column "id" and a row with id=1234 already exists:
"INSERT INTO docs (id, column2, column3,...) VALUES (1234, value2, value3,...)"
+  * Insert semantics as we know it from RDBMSs: Do not insert a document if it already exists
in Solr. A document is defined to exist in Solr, if a document with the same value in uniqueKey-field
already exists in Solr. Very much like the following SQL does NOT insert (instead it fails
with a UniqueKeyConstraint error) if there is a unique key constraint on column "id" and a
row with id=1234 already exists: "INSERT INTO docs (id, column2, column3,...) VALUES (1234,
value2, value3,...)"
-  * Update semantics as we know if from RDBMSs: Do not update if the document does not already
exist (it might have existed, but have been deleted). Very much like the following SQL does
NOT insert if a row with id=1234 does not already exist: "UPDATE docs SET column2=value2,
column3=value3, ... WHERE id=1234"
+  * Update semantics as we know it from RDBMSs: Do not update if the document does not already
exist (it might have existed, but have been deleted). Very much like the following SQL does
NOT update anything if a row with id=1234 does not already exist: "UPDATE docs SET column2=value2,
column3=value3, ... WHERE id=1234"
-  * Update semantics with version control (for optimistic locking) as we know it from RDBMSs:
Do not update if the document does not already exist or if it has been changed since it was
loaded for update by the client doing the update. Very much like the following SQL does NOT
update a row if with id=1234 if version=5678 is not true: "UPDATE docs SET column2=value2,
column3=value3, ... WHERE id=1234 AND version=5678". This fact is used by popular O/R-mappers
(like Hibernate) to provide a VersionConflict error if the object (row/document) you loaded
for update, has changed since you loaded it when you try to store your updated version.
+  * Update semantics with version control (for optimistic locking) as we know it from RDBMSs:
Do not update if the document does not already exist or if it has been changed since it was
loaded for update by the client doing the update. Very much like the following SQL does NOT
update a row with id=1234 if the version of the document in Solr at the time of update is
not (any longer) 5678: "UPDATE docs SET column2=value2, column3=value3, ... WHERE id=1234
AND version=5678". This feature is used by popular O/R-mappers (like Hibernate) to provide
a VersionConflict error if the object (row/document) you loaded for update has changed since
you loaded it when you try to store your updated version.
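
The three RDBMS behaviours listed above can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout and the column values are assumptions made for illustration, and only the shape of the SQL statements comes from the list above.

```python
import sqlite3

# In-memory database with a unique key on "id" and a "version" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, column2 TEXT, version INTEGER)"
)
conn.execute("INSERT INTO docs (id, column2, version) VALUES (1234, 'a', 5678)")

# Insert semantics: a second row with id=1234 is rejected by the database.
try:
    conn.execute("INSERT INTO docs (id, column2, version) VALUES (1234, 'b', 1)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Update semantics: no row has id=9999, so no row is updated.
missing_rows = conn.execute(
    "UPDATE docs SET column2 = 'c' WHERE id = 9999"
).rowcount

# Update with version control: a stale version matches no row ...
stale_rows = conn.execute(
    "UPDATE docs SET column2 = 'd', version = version + 1 "
    "WHERE id = 1234 AND version = 1111"
).rowcount

# ... while the version that was loaded matches exactly one row.
fresh_rows = conn.execute(
    "UPDATE docs SET column2 = 'd', version = version + 1 "
    "WHERE id = 1234 AND version = 5678"
).rowcount
```

An O/R-mapper would typically turn the "no row matched" case of the versioned update into a version conflict error for the caller.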
  
  == Implementation ==
  
- The above features could have been implemented by providing you with different ways of "updating"
documents in Solr, than by using the "update" operation. But instead the "update" operation
is still the only operation you have for inserting/updating documents in Solr, but now you
have a way of controlling the exact semantics you want Solr to behind the scenes when you
make an "update" request. You can control is by request parameters or by attributes on your
JSON/XML content - just as you can with e.g. the "overwrite" and "commitWithin" flags.
+ The above features could have been implemented by providing you with ways of "updating"
documents in Solr other than the "update-add" operation. Instead, the "update-add"
operation is still the only operation you have for inserting/updating documents in Solr, but
now you have a way of controlling the exact semantics you want Solr to apply behind the scenes
when you make an "update-add" request. You can control it by request parameters or by attributes
on your add content - just as you can with e.g. the "overwrite" and "commitWithin" flags.
  
- If you think about it, the "overwrite" flag, that has been around for a while, is actually
already a way for you to controll the inner semantics of Solr when processing you "update"
operation. So basically the implementation of the features mentioned on this page, replaces
the "overwrite" flag with a "semantics" flag. The "semantics" flag can take the following
values:
+ If you think about it, the "overwrite" flag, which has been around for a while, is actually
already a way for you to control the inner semantics of Solr when processing your "update-add"
operation. So basically the implementation of the features mentioned on this page replaces
the "overwrite" flag with a "semantics" flag. The "semantics" flag can take the following
values:
-  * classic-update: This is the default so if you dont want you never have to explicitly
set "semantics=classic-update". You can if you want, though. This makes Solr provide the same
semantics in "update" operations, as it has always done when "overwrite" was not set to "false"
(overwrite=true is default)
+  * classic-update: This is the default, so you never have to explicitly
set "semantics=classic-update" unless you want to. Setting "semantics=classic-update"
makes Solr provide the same semantics in "update-add" operations as it has always done when
"overwrite" was not set to "false" (overwrite=true is the default)
   * classic-update-dont-overwrite: The semantics you get with "semantics=classic-update-dont-overwrite"
are the same as what you have always gotten with "overwrite" set to false. Since the "overwrite"
flag is still supported for backward compatibility, you can get these semantics in
two ways - either by setting "semantics=classic-update-dont-overwrite" or by setting "overwrite=false"
(and not setting "semantics")
-  * db-insert: Setting "semantics=db-insert" provides a new semantics behind "update" operations.
The document sent in the "update" request is added to the shard/core/index if and only if
the document does not already exist. Remember that a document "already exists" if a document
with the same value in uniqueKey-field is already in the core at the time of the "update"
operation. If the document does exist this semantics makes the "update" operation result in
a "DocumentAlreadyExists" error. This feature is of course "thread-safe" in the way that,
if the core does not contain a document with uniqueKey-field-value "cool_features", and 10
client-threads "at the same time" tries to do an "update" operation with "semantics=db-insert"
and a document with uniqueKey-field-value "cool_features", only one thread will succeed -
9 threads will end up having a "DocumentAlreadyExists" error.
+  * db-insert: Setting "semantics=db-insert" provides a new type of semantics behind "update-add"
operations. The document sent in the "update-add" request is added to the shard/core/index
if and only if the document does not already exist. Remember that a document "already exists"
if a document with the same value in uniqueKey-field is already in the core at the time of
the "update-add" operation. If the document does exist this semantics makes the "update-add"
operation result in a "DocumentAlreadyExists" error. This feature is of course "thread-safe"
in the way that, if the core does not contain a document with uniqueKey-field-value "cool_features",
and 10 client-threads "at the same time" try to do an "update-add" operation with "semantics=db-insert"
and a document with uniqueKey-field-value "cool_features", only one thread will succeed -
9 threads will end up having a "DocumentAlreadyExists" error.
+  * db-update: Setting "semantics=db-update" provides another new type of semantics behind
"update-add" operations. The document sent in the "update-add" request is added (and the old corresponding
document deleted) if and only if the document already exists. If the document does not exist
(it might have existed, but have been deleted by the time of the "update-add" operation) you
will get a "DocumentDoesNotExist" error. If your schema contains a "_version_" field and
you put a value for the "_version_" field in the document you send for update, you get
version control (for optimistic locking) as well. The "update-add" operation will result
in a "VersionConflict" error if the value of the "_version_"-field in the document sent for update
does not match the value of the "_version_"-field of the document in Solr at the time of the
"update-add" operation. This feature is of course also "thread-safe" in the way that, if the
core contains a document with uniqueKey-field-value "versioning_rocks" and "_version_"-field-value
"5678", and 10 client-threads "at the same time" try to do an "update-add" operation with "semantics=db-update"
and a document with uniqueKey-field-value "versioning_rocks" and "_version_"-field-value "5678",
only one thread will succeed - 9 threads will end up having a "VersionConflict" error.
-  * db-update: Setting "semantics=db-update" provides another new semantics behind "update"
operation. The document sent in the "update" request is added (and the old corresponding document
deleted) if and only if the document already exists. If the document does not exist (it might
have existed, but have been deleted by the time of the "update" operation) you will get a
"DocumentDoesNotExist" error.
-  * db-update-version-control: Setting "semantics=db-update-version-control" provides yet
another semantics behind "update" operation. The document sent in the "update" request will
need to contain a value for the "_version_"-field, corresponding to the value retrieved when
the client, now doing the update, retrieved the document from Solr for update. Just as with
"semantics=db-update" you get a "DocumentDoesNotExist" error if the document does not already
exist. But now it is also possible to get a "VersionConflict" error, if the value in "_version_"-field
does not match the value of the "_version_"-field of the document in Solr at the time of the
"update" operation. This feature is of course also "thread-safe" in the way that, if the core
contains a document with uniqueKey-field-value "versioning_rocks" and "_version_"-field-value
"5678", and 10 client-threads "at the same time" tries to do an "update" operation with "semantics=db-update-version-control"
and a document with uniqueKey-field-value "versioning_rocks" and "_version_"-field-value "5678",
only one thread will succeed - 9 threads will end up having a "VersionConflict" error.
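
The rules above can be modelled with a small in-memory toy, sketched below under stated assumptions. It is not Solr code and not the proposed patch: the class ToyIndex, its methods and the way new versions are assigned (a simple counter) are hypothetical. Only the semantics names, the error names and the "one of 10 threads succeeds" behaviour come from the description above. "classic-update-dont-overwrite" is left out because a key-to-document map cannot represent it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DocumentAlreadyExists(Exception): pass
class DocumentDoesNotExist(Exception): pass
class VersionConflict(Exception): pass


class ToyIndex:
    """Hypothetical stand-in for a core; maps uniqueKey value to document."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # makes check-then-write atomic

    def get(self, key):
        with self._lock:
            doc = self._docs.get(key)
            return dict(doc) if doc is not None else None

    def add(self, doc, semantics="classic-update"):
        key = doc["id"]
        with self._lock:
            current = self._docs.get(key)
            if semantics == "db-insert":
                if current is not None:
                    raise DocumentAlreadyExists(key)
            elif semantics == "db-update":
                if current is None:
                    raise DocumentDoesNotExist(key)
                sent = doc.get("_version_")
                # Version check only applies when the client sent a version.
                if sent is not None and sent != current["_version_"]:
                    raise VersionConflict(key)
            elif semantics != "classic-update":
                raise ValueError("semantics not modelled in this toy: " + semantics)
            # Assumption: versions are a per-document counter.
            new_version = (current["_version_"] if current else 0) + 1
            self._docs[key] = {**doc, "_version_": new_version}
            return new_version


def try_add(index, doc, semantics):
    """Return "ok" or the name of the error the add ended with."""
    try:
        index.add(dict(doc), semantics)
        return "ok"
    except (DocumentAlreadyExists, DocumentDoesNotExist, VersionConflict) as e:
        return type(e).__name__


index = ToyIndex()

# 10 threads insert the same new document "at the same time".
with ThreadPoolExecutor(max_workers=10) as pool:
    insert_results = list(pool.map(
        lambda _: try_add(index, {"id": "cool_features"}, "db-insert"),
        range(10)))

# 10 threads update the same document, all holding the same loaded version.
index.add({"id": "versioning_rocks"}, "db-insert")
loaded = index.get("versioning_rocks")
with ThreadPoolExecutor(max_workers=10) as pool:
    update_results = list(pool.map(
        lambda _: try_add(index, loaded, "db-update"),
        range(10)))

# Updating a document that was never inserted.
missing_result = try_add(index, {"id": "never_inserted"}, "db-update")
```

The single lock around the existence/version check and the write is what makes exactly one thread win in this toy; how the real implementation achieves the same guarantee is not described on this page.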
  
  == Using it ==
  
@@ -36, +35 @@

  
  === Requirements ===
  
- To use "semantics=db-insert", "semantics=db-update" or "semantics=db-update-version-control"
there is a few requirements to you Solr schema and configuration.
+ To use "semantics=db-insert" or "semantics=db-update" there are a few requirements for your
Solr schema and configuration.
  
-  * You need to have a uniqueKey-field in your schema. E.g. in schema.xml:
+  * You need to have a uniqueKey-field in your schema. For example, in schema.xml:
  {{{#!xml
   <field name="id" type="string" indexed="true" stored="true" required="true"/>
   <uniqueKey>id</uniqueKey>
  }}}
-  * You need to have a "_version_"-field in your schema. In schema.xml:
+  * If you want version control, you need to have a "_version_"-field in your schema. In
schema.xml:
  {{{#!xml
   <field name="_version_" type="long" indexed="true" stored="true" />
  }}}
   * You need to use DirectUpdateHandler2 as your update-handler with updateLog enabled. In
solrconfig.xml:
  {{{#!xml
   <updateHandler class="solr.DirectUpdateHandler2">
-   <updateLog enable="${enable.update.log:false}">
+   <updateLog>
     <str name="dir">${solr.data.dir:}</str> 
    </updateLog>
   </updateHandler>
@@ -58, +57 @@

  
  === HTTP requests ===
  
- Add &semantics=XXXX, where XXXX is "semantics=db-insert", "semantics=db-update" or "semantics=db-update-version-control",
to you HTTP update requests.
+ Add &semantics=XXXX, where XXXX is "db-insert" or "db-update" (or
one of the classic semantics), to your HTTP update requests.
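
As a rough illustration, the request URL could be built as in the sketch below. The base URL is an assumption (the address of a default local Solr example setup) and the helper update_url is hypothetical; only the "semantics" parameter name and its four values come from this page. Since the feature has not been committed, a stock Solr would not act on the parameter.

```python
from urllib.parse import urlencode

# Assumed address of the update handler; adjust to your own installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# The values listed for the proposed "semantics" flag.
KNOWN_SEMANTICS = {
    "classic-update",
    "classic-update-dont-overwrite",
    "db-insert",
    "db-update",
}


def update_url(semantics, base=SOLR_UPDATE_URL):
    """Return the update URL with the semantics request parameter appended."""
    if semantics not in KNOWN_SEMANTICS:
        raise ValueError("unknown semantics: " + semantics)
    return base + "?" + urlencode({"semantics": semantics})


insert_url = update_url("db-insert")
update_only_url = update_url("db-update")
```

The document to add would then be sent in the request body as usual.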
  
  ==== JSON ====
  
