lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "Per Steffensen/Update semantics" by Per Steffensen
Date Mon, 05 Mar 2012 13:55:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "Per Steffensen/Update semantics" page has been changed by Per Steffensen:
http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics?action=diff&rev1=2&rev2=3

Comment:
Motivation and implementation

  #format wiki
  #language en
  = Update semantics =
+ 
+ Please note that the features described here have not been committed yet.
+ 
  == Motivation ==
- Important feature when using Solr as a NoSQL data store and not just a search index.
+ 
+ Solr lacks some advanced features that matter when it is used as a NoSQL database and not just
as a search index. Using Solr as a NoSQL database here primarily means having many threads inserting,
updating and deleting documents throughout the entire lifetime of the core/shard/Lucene index, also
while the index is being used for searching. Using Solr as a search index is more about first indexing
your entire world into Solr and afterwards using it solely for querying.
+ 
+ Some of the features missing are:
+  * Insert semantics as we know them from RDBMSs: Do not insert a document if it already exists.
It "exists" if a document with the same value in the uniqueKey field already exists. This is very
much like the following SQL, which does NOT insert (instead it fails with a UniqueKeyConstraint error)
if there is a unique key constraint on column "id" and a row with id=1234 already exists:
"INSERT INTO docs (id, column2, column3,...) VALUES (1234, value2, value3,...)"
+  * Update semantics as we know them from RDBMSs: Do not update if the document does not already
exist (it might have existed, but have been deleted). This is very much like the following SQL, which
does NOT update anything if a row with id=1234 does not already exist: "UPDATE docs SET column2=value2,
column3=value3, ... WHERE id=1234"
+  * Update semantics with version control (for optimistic locking) as we know them from RDBMSs:
Do not update if the document does not already exist or if it has been changed since it was
loaded for update by the client doing the update. This is very much like the following SQL, which
does NOT update the row with id=1234 unless version=5678 holds: "UPDATE docs SET column2=value2,
column3=value3, ... WHERE id=1234 AND version=5678". Popular O/R mappers (like Hibernate) rely on
this to raise a VersionConflict error when you try to store your updated version of an object
(row/document) that has changed since you loaded it for update.
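
The three RDBMS behaviours listed above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in RDBMS; the table layout and the values are made up for illustration and are not part of the proposal.

```python
import sqlite3

# In-memory database with a unique key on "id" (illustrative schema only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, column2 TEXT, version INTEGER)")
conn.execute("INSERT INTO docs (id, column2, version) VALUES (1234, 'a', 5678)")

# Insert semantics: a second insert with the same key is rejected.
try:
    conn.execute("INSERT INTO docs (id, column2, version) VALUES (1234, 'b', 1)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Update semantics: updating a row that does not exist changes nothing.
missing_rows = conn.execute(
    "UPDATE docs SET column2 = 'c' WHERE id = 9999"
).rowcount

# Versioned update: a stale version matches no row, the current one matches.
stale_rows = conn.execute(
    "UPDATE docs SET column2 = 'd', version = version + 1 "
    "WHERE id = 1234 AND version = 1"
).rowcount
fresh_rows = conn.execute(
    "UPDATE docs SET column2 = 'd', version = version + 1 "
    "WHERE id = 1234 AND version = 5678"
).rowcount
```

Note that in SQL the two update cases fail silently: the client has to inspect the affected-row count, which is what an O/R mapper does before raising its version-conflict error.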
+ 
+ == Implementation ==
+ 
+ The above features could have been implemented by providing new ways of "updating" documents
in Solr, separate from the "update" operation. Instead, the "update" operation remains the only
operation you have for inserting/updating documents in Solr, but you now have a way of controlling
the exact semantics you want Solr to apply behind the scenes when you make an "update" request.
You can control this through request parameters or through attributes on your JSON/XML content,
just as you can with e.g. the "overwrite" and "commitWithin" flags.
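
As a rough illustration of passing such a flag as a request parameter, the sketch below builds an update URL. The "semantics" parameter and its "db-insert" value are the ones described on this page; the host, port and "/solr/update" path, as well as the helper name build_update_url, are assumptions made for the example, and since the feature has not been committed the exact request shape may differ.

```python
from urllib.parse import urlencode


def build_update_url(base_url, semantics=None, commit_within=None):
    """Append the optional update flags to base_url as query parameters."""
    params = {}
    if semantics is not None:
        params["semantics"] = semantics
    if commit_within is not None:
        params["commitWithin"] = commit_within
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


# Hypothetical endpoint; only the parameter names come from the text.
url = build_update_url(
    "http://localhost:8983/solr/update",
    semantics="db-insert",
    commit_within=1000,
)
```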
+ 
+ If you think about it, the "overwrite" flag, which has been around for a while, is actually
already a way for you to control the inner semantics of Solr when it processes your "update"
operation. So basically the implementation of the features mentioned on this page replaces
the "overwrite" flag with a "semantics" flag. The "semantics" flag can take the following
values:
+  * classic-update: This is the default, so you never have to set "semantics=classic-update"
explicitly, though you can if you want. It makes Solr provide the same semantics in "update"
operations as it has always done when "overwrite" was not set to "false" (overwrite=true is
the default).
+  * classic-update-dont-overwrite: The semantics you get with "semantics=classic-update-dont-overwrite"
are the same as what you have always gotten with "overwrite" set to false. Since the "overwrite"
flag is still supported for backward compatibility, you can get these semantics in two ways:
either by setting "semantics=classic-update-dont-overwrite" or by setting "overwrite=false"
(and not setting "semantics").
+  * db-insert: Setting "semantics=db-insert" provides new semantics behind "update" operations.
The document sent in the "update" request is added to the shard/core/index if and only if
the document does not already exist. Remember that a document "already exists" if a document
with the same value in the uniqueKey field is already in the core at the time of the "update"
operation. If the document does exist, the "update" operation results in a "DocumentAlreadyExists"
error. This feature is of course "thread-safe" in the sense that, if the core does not contain
a document with uniqueKey field value "cool_features", and 10 client threads "at the same time"
try to do an "update" operation with "semantics=db-insert" and a document with uniqueKey field
value "cool_features", only one thread will succeed - the other 9 threads will end up with a
"DocumentAlreadyExists" error.
+  * db-update: Setting "semantics=db-update" provides another new semantics behind "update"
operations. The document sent in the "update" request is added (and the old corresponding document
deleted) if and only if the document already exists. If the document does not exist (it might
have existed, but have been deleted by the time of the "update" operation) you will get a
"DocumentDoesNotExist" error.
+  * db-update-version-control: Setting "semantics=db-update-version-control" provides yet
another semantics behind "update" operations. The document sent in the "update" request must
contain a value for the "_version_" field, corresponding to the value the client now doing the
update got when it retrieved the document from Solr for update. Just as with "semantics=db-update"
you get a "DocumentDoesNotExist" error if the document does not already exist. But now it is also
possible to get a "VersionConflict" error, if the value in the "_version_" field does not match
the value of the "_version_" field of the document in Solr at the time of the "update" operation.
This feature is of course also "thread-safe" in the sense that, if the core contains a document
with uniqueKey field value "versioning_rocks" and "_version_" field value "5678", and 10 client
threads "at the same time" try to do an "update" operation with "semantics=db-update-version-control"
and a document with uniqueKey field value "versioning_rocks" and "_version_" field value "5678",
only one thread will succeed - the other 9 threads will end up with a "VersionConflict" error.
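
To make the semantics above concrete, here is a toy in-memory model. It is not Solr code and says nothing about how the feature is implemented in Solr; the ToyCore class, its update method, the race helper and the counter-based version numbers are all hypothetical. It models only "classic-update" and the three "db-*" values, and leaves out "classic-update-dont-overwrite". The error names are the ones used on this page.

```python
import threading


class DocumentAlreadyExists(Exception):
    pass


class DocumentDoesNotExist(Exception):
    pass


class VersionConflict(Exception):
    pass


class ToyCore:
    """In-memory stand-in for a core; a model of the semantics only."""

    SUPPORTED = ("classic-update", "db-insert", "db-update",
                 "db-update-version-control")

    def __init__(self):
        self._docs = {}               # uniqueKey value -> (version, fields)
        self._lock = threading.Lock()
        self._next_version = 1        # hypothetical version scheme

    def update(self, key, fields, semantics="classic-update", version=None):
        if semantics not in self.SUPPORTED:
            raise ValueError(f"unsupported semantics: {semantics}")
        with self._lock:              # existence check and write are one atomic step
            current = self._docs.get(key)
            if semantics == "db-insert" and current is not None:
                raise DocumentAlreadyExists(key)
            if semantics.startswith("db-update") and current is None:
                raise DocumentDoesNotExist(key)
            if (semantics == "db-update-version-control"
                    and current[0] != version):
                raise VersionConflict(key)
            new_version = self._next_version
            self._next_version += 1
            self._docs[key] = (new_version, dict(fields))
            return new_version


def race(core, key, semantics, version=None, threads=10):
    """Run the same update from several threads and collect the outcomes."""
    outcomes = []

    def attempt():
        try:
            core.update(key, {"n": 1}, semantics=semantics, version=version)
            outcomes.append("ok")
        except (DocumentAlreadyExists, DocumentDoesNotExist,
                VersionConflict) as exc:
            outcomes.append(type(exc).__name__)

    workers = [threading.Thread(target=attempt) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return outcomes


core = ToyCore()
insert_outcomes = race(core, "cool_features", "db-insert")

loaded_version = core.update("versioning_rocks", {"n": 0})
version_outcomes = race(core, "versioning_rocks",
                        "db-update-version-control", version=loaded_version)
```

In this model the "thread-safety" comes from doing the existence/version check and the write under one lock, so exactly one of the ten competing threads can observe the precondition as true.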
  
  == Using it ==
+ 
  === Requirements ===
   * uniqueKey field
   * updateLog
