lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Per Steffensen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3178) Versioning - optimistic locking
Date Sat, 21 Apr 2012 11:52:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258838#comment-13258838
] 

Per Steffensen commented on SOLR-3178:
--------------------------------------

bq. {quote}    it is hard to push too many small partly-done features. {quote}

bq. As committers it's the opposite - it's sometimes hard to consume patches that do a lot
of different things.

I could imagine. Im sorry that it is possibly going to be a little bit like that this time.
But I believe I will provide great stuff.

bq. {quote}    It works with multi-document update request, providing fully detailed feedback
to the client about which document-updates failed (and why) and which succeeded. {quote}

bq. That's great! But it's also a separate feature that we've needed for a while (and I think
there has been a JIRA issue open for it for a while).

Yes I agree. See SOLR-3382. The solution I will provide is generic in the way that is supports
giving feedback to the client about "parts" of a request that failed. The client will know
that the "parts" which are not reported to have failed did succeed. Usefull for any kind of
request where parts of it can fail and other parts can succeed. The client provide a "partRef"
for all parts, and partial errors from the server each contain a "partRef" in order to let
the client know which part(s) failed. I only use it for multi-document (or batch-updates as
I see you call it in come places) updates (here each document is a "part"), but I guess it
could also be used for other things like
- Reporting which operations of a multi-operation (e.g. one containing both adds, deletes,
commits, etc) update that failed.
- Reporting if a single update partly failed (e.g. if using replication and the update was
successfull on leader but not on one of more of the replicas)
- Etc.

bq. {quote}    Looking shortly at you patch, I belive, that if two threads in the server JVM
overlaps in a way that is unfortunate enough, then it is possible that data will not be stored
or will be overwritten without an exception being thrown to indicate that to the client. {quote}

bq. I'm confident in the synchronization/concurrency part - it's just reusing what was put
in place for SolrCloud to handle reordering of updates to replicas and is very well tested
(see TestRealTimeGet). But please let us know if you see a problem with it, as that would
mean a problem with SolrCloud today (even without this patch).

Looking closer at your code I would like to withdraw my concern about your solution. I didnt
see that you are doing both "get existing version and check against provided version" and
"store updated version" inside the same block governed by two sync-locks
- A readlock on a ReadWriteLock which is writelocked during deleteByQueries (your "vinfo.lockForUpdate"),
protecting against concurent deletes (by query)
- A lock on the uniqueKey field (or a hash of it in order to be efficient) of the document
(your "synchronized (bucket)"), protecting against concurrent updates and deletes (by id)

So basically I have repeated that locking-pattern in DirectUpdateHandler2 where I chose to
place the "unique key constaint"- and "versioning"- check. Guess my locking can be removed
if it is really necessary to have this kind of locking "this far up in the callstack" (in
DistributedUpdateProcessor.versionAdd). For the purpose of "unique key constraint"- and "versioning"-check
I believe it is only necessary to have this kind of locking around the actual updates in DirectUpdateHandler2
(when leader and not peer_sync) - and the smaller you can make synchronized blocks the better.
But maybe you have reasons (related to SolrCloud) where you need the locking "this far up
in the callstack".
But besides that fact that the locking I did in DirectUpdateHandler2 can be removed I really
believe that the actual checking of "unique key constraint"- and "version"-violation logically
belong in DirectUpdateHandler2, so I guess I will still try to convince you to take that part
of my patch in. 
Besides that, I believe your solution just implements what I call semantics-mode "classic-consistency-hybrid"
on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics. I believe the "classic"
mode and "consistency" mode are very nice to have in order to provide complete backward compabitility
(classic mode) and a very strict consistency mode where it is not possible to "break the checks"
by providing \_version\_ = 0 (consistency mode).
Finally I believe my solution is a little better with respect to "design" and "separation
of concerns", by isolatating the rules and rule-checks in a separate class instead of just
coding it directly inside DistributedUpdateProcessor which already deals with a lot of other
concerns, but that is another discussion, and should not count as the deciding argument.

bq. {quote}    Feedback to client is even possible if using ConcurrentSolrServer, because
it has been changed to provide a Future from "request" - a Future that will eventually provide
the calling client with the result from the request.{quote}

bq. This sounds like another great feature! It really deserves it's own issue so people can
more easily provide review/feedback and not have it coupled to this issue.

Created the isolated issue - see SOLR-3383

{quote} Some other additions that can be handled later that I see:

    SolrJ support for easier passing of version on a delete, constants for version, etc {quote}

Well basically I have moved VERSION_FIELD = "\_version\_" from VersionInfo to SolrInputDocument.

bq.    Use of "1" as a generic "exists" version (i.e. update document only if it already exists)

Well I started out by wanting that feature (update only if exists, but no exact version check),
but you couldnt see the good reason for that. Since then I realized that you are probably
right and have removed that exact possibility - of course in classic and classic-consistency-hybrid
modes you can still do updates without providing the exact version, but there is error if
the document does not exist.

bq.    If one document in a batch fails, don't automatically abort, and provide info back
about which docs succeeded and which failed (your first point).

That is part of my solution - basically the thing making it necessary to add the ability to
send many errors back to the client - SOLR-3382

bq. That last one in particular needs some good discussion and design work and really deserves
it's own issue.

Agree. Go discuss and comment on SOLR-3382

{quote} Having this current improvement committed in no way blocks any future improvements
you may come up with (including deleting the code quoted above and using whatever method you
have come up with for checking the versions), and it even uses the same API (or a subset of
it) via version and 409. Progress, not perfection! {quote}

Agree, so I guess you could go commit, but that would just make my merge-task a little bigger,
so I hope you will hold your horses a few days more.

bq. Are there any technical objections to this patch?

Except for the things I mention here and there in this response, no. And those are all things
that could be fixed later, going nicely hand in hand with your great philosophy of "progress,
not perfection".

bq. It really piggybacks on all of the SolrCloud work done to pass around and check versions,
so it's really small.

Yes it is small and beautiful

{quote} The API is as follows:

    a) If you provide a version > 0 with an add or a delete, the update will fail with
a 409 unless the provided version matches exactly the latest in the index for the ID.
    b) If you provide a version < 0 with an add or delete, the update will fail with a
409 if the ID exists in the index.
{quote}

You always provide the same error (except the versions in the message). You do not distinguish
between DocumentAlreadyExist, DocumentDoesNotExist and VersionConflict errors as I do. Of
course the client would know that it is a DocumentAlreadyExists if he sent a negative version
number and got 409 (at least as long as 409 is not used for other kind of conflicts that can
also happen when version < 0) and that it is DocumentDoesNotExist or VersionConflict if
he sent a positive version number and got 409 (at least ...). But he really doesnt know if
it is DocumentDoesNotExist or VersionConflict unless he parses the error message, and that
might be important to know because he might want to react differently - e.g.
- On DocumentDoesNotExist: Either do nothing or go do a create (version < 0) of the document
- On VersionConflict: Go refetch the document from Solr, merge in his changes (automatically
or by the help from a user) and try update again.

Of course you also need to parse error-type in my solution to know exactly what the problem
is, but you need no "rules" on client side (e.g. knowing that if you sent version > 0 and
you got version-already-in-store < 0 back, it is DocumentDoesNotExist), and in SolrJ my
solution automatically parses error-type and throw corresponding exceptions.

Ad b) Maybe you dont want an error if you try to delete a document which has already been
deleted, but probably ok

bq. That's the whole scope of this patch, and I believe is compatible with the larger scope
of what Per will provide in the future.

Agree!

                
> Versioning - optimistic locking
> -------------------------------
>
>                 Key: SOLR-3178
>                 URL: https://issues.apache.org/jira/browse/SOLR-3178
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 3.5
>         Environment: All
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: RDBMS, insert, locking, nosql, optimistic, uniqueKey, update, versioning
>             Fix For: 4.0
>
>         Attachments: SOLR-3178.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order increase the ability of Solr to be used as a NoSql database (lots of concurrent
inserts, updates, deletes and queries in the entire lifetime of the index) instead of just
a search index (first: everything indexed (in one thread), after: only queries), I would like
Solr to support versioning to be used for optimistic locking.
> When my intent (see SOLR-3173) is to update an existing document, I will need to provide
a version-number equal to "the version number I got when I fetched the existing document for
update" plus one. If this provided version-number does not correspond to "the newest version-number
of that document at the time of update" plus one, I will get a VersionConflict error. If it
does correspond the document will be updated with the new one, so that "the newest version-number
of that document" is NOW one higher than before the update. Correct but efficient concurrency
handling.
> When my intent (see SOLR-3173) is to insert a new document, the version number provided
will not be used - instead a version-number 0 will be used. According to SOLR-3173 insert
will only succeed if a document with the same value on uniqueKey-field does not already exist.
> In general when talking about different versions of "the same document", of course we
need to be able to identify when a document "is the same" - that, per definition, is when
the values of the uniqueKey-fields are equal. 
> The functionality provided by this issue is only really meaningfull when you run with
"updateLog" activated.
> This issue might be solved more or less at the same time as SOLR-3173, and only one single
SVN patch might be given to cover both issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message