lucene-dev mailing list archives

From "Per Steffensen (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SOLR-3178) Versioning - optimistic locking
Date Fri, 27 Apr 2012 15:14:49 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263514#comment-13263514
] 

Per Steffensen edited comment on SOLR-3178 at 4/27/12 3:13 PM:
---------------------------------------------------------------

First of all, thanks a lot for getting back to me and for taking the time to try to understand
such a big patch covering several JIRA issues - I'm still sorry about that.

{quote} I think I agree it would be nice to have more of the functionality consolidated to
the update handler and not do quite so much "up the callstack"... of course the downside to
this would be the context that the update processors have (namely they know when a request,
which can contain multiple updates, starts and finishes). {quote}

I agree and understand about the different context, but failing/succeeding with PartialError
(VersionConflict, DocumentAlreadyExists, DocumentDoesNotExist or WrongUsage) is a per-document-to-be-updated
thing. Therefore I believe the context does not matter in this case - or actually, that might
be exactly the argument that it belongs in UpdateHandler. Certainly it can be done in UpdateHandler.
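To illustrate what I mean by "per-document-to-be-updated", here is a minimal standalone sketch of the intended semantics (the class and method names are hypothetical, not the actual Solr types, and WrongUsage is left out for brevity): each add is classified independently by the caller's intent and the version it provides.

```java
import java.util.HashMap;
import java.util.Map;

public class SemanticsSketch {
    enum Intent { INSERT, UPDATE }
    enum Result { OK, DOCUMENT_ALREADY_EXISTS, DOCUMENT_DOES_NOT_EXIST, VERSION_CONFLICT }

    // uniqueKey -> newest indexed version (hypothetical simplification)
    private final Map<String, Long> index = new HashMap<>();

    /** Classify a single add by the caller's intent and provided version. */
    public Result add(Intent intent, String uniqueKey, long providedVersion) {
        Long existing = index.get(uniqueKey);
        if (intent == Intent.INSERT) {
            if (existing != null) return Result.DOCUMENT_ALREADY_EXISTS;
            index.put(uniqueKey, 1L); // provided version is ignored for inserts
            return Result.OK;
        }
        // UPDATE: the document must exist and the provided version must be
        // exactly one higher than the newest indexed version.
        if (existing == null) return Result.DOCUMENT_DOES_NOT_EXIST;
        if (providedVersion != existing + 1) return Result.VERSION_CONFLICT;
        index.put(uniqueKey, providedVersion);
        return Result.OK;
    }
}
```

The point is that each outcome is decided per document, with no request-level context needed.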

{quote} Regardless - we are where we are right now... and doing the version check in the update
handler presents some problems. For cloud, we want to start sending to replicas at the same
time as indexing locally. {quote}

I'm not sure I understand. As I read the code, that is not what we do today. In DistributedUpdateProcessor.processAdd
you first do the local add (as the leader) in versionAdd AND THEN, if that succeeds, forward
to all replicas (in the if (nodes != null) block). I believe it is a good strategy to
first do the update on the leader and then, on success on the leader, do the update on the replicas
concurrently (and asynchronously). So stick with that!

So doing the version-check etc. (when leader) in DirectUpdateHandler2 does not present a problem.
Errors occurring as a result of the version-check etc. should simply (like any other error) result
in the update not being forwarded to the replicas.
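A minimal sketch of the leader-first ordering I am arguing for (all names here are hypothetical stand-ins, not the actual Solr classes): the forward only happens after the local add, so any leader-side error automatically stops propagation.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class LeaderFirstSketch {
    // Hypothetical stand-in for a replica node; not a real Solr type.
    interface Node { void forward(String doc); }

    // Stand-in for the leader-local add, including version checks etc.;
    // returns false when the add fails for any reason.
    static boolean indexLocally(String doc) {
        return !doc.isEmpty();
    }

    /** Leader-first ordering: forward to replicas only after local success. */
    static boolean processAdd(String doc, List<Node> replicas) {
        if (!indexLocally(doc)) {
            return false; // any leader-side error: nothing is forwarded
        }
        // Only after leader success: send to all replicas concurrently.
        for (Node n : replicas) {
            CompletableFuture.runAsync(() -> n.forward(doc));
        }
        return true;
    }
}
```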

One problem in my patch is that DirectUpdateHandler2 does not notify DistributedUpdateProcessor
(by throwing an error or making sure dropCmd is true) not to send to the replicas in case of a
PartialError (VersionConflict, DocumentAlreadyExists, DocumentDoesNotExist or WrongUsage).
It will in the next upcoming patch.

{quote} We also want to check the version before sending to replicas since it would be easy
to get into a scenario where a conditional update fails on some replicas and succeeds on others
(due to the fact that updates can easily be seen by different replicas in a different order).
{quote}

Agreed about checking the version on the leader before sending to the replicas. But that can
equally well be done in "DistributedUpdateProcessor.versionAdd" (inside the locks), as you do,
as it can be done in "DirectUpdateHandler2.addDoc", as I do. So in that respect it doesn't matter
whether you take your patch or mine - provided, of course, that I change my patch so that
version-check errors in DirectUpdateHandler2 actually ensure the update is not sent to the replicas.

The idea is that version-checks, uniqueKey-already-exists-checks etc., leading to PartialError,
are only performed on the leader, so replicas will not fail with this kind of error - and they
shouldn't. Please note "cmd.isLeaderLogic()" in DirectUpdateHandler2.addDoc in the line
{code}
boolean getAndCheckAgainstExisting = semanticsMode.needToGetAndCheckAgainstExistingDocument(cmd)
    && cmd.isLeaderLogic();
{code}
Replicas are supposed to be "dumb" in the sense that they just index (without any checking)
whatever was successfully indexed on the leader (and therefore forwarded to the replicas in the
first place), and they support receiving updates in different orders by not indexing an update
if it is "older" than an update they have already indexed.
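The replica-side reordering rule can be sketched like this (a standalone simplification with hypothetical names, not Solr's actual per-document version bookkeeping): the replica performs no semantic checks, it only drops updates that arrive with a version no newer than what it already indexed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicaReorderSketch {
    // Highest version indexed so far per uniqueKey
    // (hypothetical simplification of the real bookkeeping).
    private final Map<String, Long> indexedVersions = new ConcurrentHashMap<>();

    /** Index a forwarded update unless an equal or newer version has already
     *  been indexed; no other checks are performed on the replica. */
    public boolean apply(String uniqueKey, long version) {
        Long current = indexedVersions.get(uniqueKey);
        if (current != null && current >= version) {
            return false; // stale update that arrived out of order: drop it
        }
        indexedVersions.put(uniqueKey, version);
        return true;
    }
}
```

With this rule, two replicas receiving the same updates in different orders still converge on the newest version.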

{quote} So I think the best way forward is to commit the patch that does the version check
in the distributed update processor, and then build off of that (for instance your more nuanced
error messages and the set/get requestVersion on the update command). {quote}

I have to say that I do not agree. I believe I have argued why the problems you mention with
my patch/approach do not hold (once I make the small fix mentioned above). Sure, your patch is
a small step forward (progress, not perfection), but building on top of it will put me in a
situation where I have to change my patch in order to get all the extra benefits it provides
(much more progress, not perfection :-) ). Of course you can commit your patch, but I still
believe the code in that patch should be deleted when my patch is added, and I still hope that
will be the end result.

I will provide a patch soon with the following changes:
* A PartialError in DirectUpdateHandler2 on the leader will make DistributedUpdateProcessor not
forward to the replicas (as with any other error on the leader)
* Error propagation will work when a client contacts a server that is not the leader of the
document to be updated, which therefore forwards to the leader and so also has to propagate
errors from that leader back to the client. This has always worked for single errors, where you
just abort the entire update on the first error, leaving the client with no information about
what failed and what did not, but it does not work now that you can have partial errors where
some documents fail and others succeed (on different servers) and you want to report a detailed
picture back to the client.
* Cleanup of a few things in the patch that should never have been included (survivals from
early attempts rolled back on my side)
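The partial-error reporting in the second bullet can be sketched as follows (a standalone simplification with hypothetical names, not the actual Solr code): instead of aborting the batch on the first failure, every document gets its own outcome, and the whole report goes back to the client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialErrorSketch {
    enum Outcome { OK, VERSION_CONFLICT }

    // uniqueKey -> newest indexed version (hypothetical simplification)
    private final Map<String, Long> store = new LinkedHashMap<>();

    /** Apply each update (uniqueKey -> version the client claims to hold);
     *  failures are recorded per document instead of aborting the batch. */
    public Map<String, Outcome> updateAll(Map<String, Long> updates) {
        Map<String, Outcome> report = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : updates.entrySet()) {
            long current = store.getOrDefault(e.getKey(), 0L);
            if (e.getValue() != current) {
                report.put(e.getKey(), Outcome.VERSION_CONFLICT); // keep going
            } else {
                store.put(e.getKey(), current + 1);
                report.put(e.getKey(), Outcome.OK);
            }
        }
        return report;
    }
}
```

The point is that a conflict on one document does not hide the fate of the others.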
                
> Versioning - optimistic locking
> -------------------------------
>
>                 Key: SOLR-3178
>                 URL: https://issues.apache.org/jira/browse/SOLR-3178
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 3.5
>         Environment: All
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: RDBMS, insert, locking, nosql, optimistic, uniqueKey, update, versioning
>             Fix For: 4.0
>
>         Attachments: SOLR-3178.patch, SOLR_3173_3178_3382_plus.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to increase the ability of Solr to be used as a NoSQL database (lots of concurrent
inserts, updates, deletes and queries over the entire lifetime of the index) instead of just
a search index (first: everything indexed (in one thread); after: only queries), I would like
Solr to support versioning to be used for optimistic locking.
> When my intent (see SOLR-3173) is to update an existing document, I will need to provide
a version-number equal to "the version-number I got when I fetched the existing document for
update" plus one. If this provided version-number does not correspond to "the newest version-number
of that document at the time of update" plus one, I will get a VersionConflict error. If it
does correspond, the document will be updated with the new one, so that "the newest version-number
of that document" is now one higher than before the update. Correct but efficient concurrency
handling.
> When my intent (see SOLR-3173) is to insert a new document, the version-number provided
will not be used - instead a version-number 0 will be used. According to SOLR-3173, an insert
will only succeed if a document with the same value in the uniqueKey-field does not already exist.
> In general when talking about different versions of "the same document", of course we
need to be able to identify when a document "is the same" - that, by definition, is when
the values of the uniqueKey-fields are equal. 
> The functionality provided by this issue is only really meaningful when you run with
"updateLog" activated.
> This issue might be solved more or less at the same time as SOLR-3173, and only one single
SVN patch might be given to cover both issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
