ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Александр Меньшиков <sharple...@gmail.com>
Subject Re: IGNITE-1948 ClusterTopologyCheckedException can return null for retryReadyFuture()
Date Mon, 20 Feb 2017 13:42:38 GMT
Alexey,

I'm ready to make some conclusions. You can see result immediately here:
PR:          https://github.com/apache/ignite/pull/1537/files
CI Tests:
http://ci.ignite.apache.org/project.html?projectId=IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1537/head

I fixed only "ClusterTopologyCheckedException", and didn't touched
"ClusterTopologyException". "ClusterTopologyException" has a same problem
like "ClusterTopologyCheckedException", but even now changes is huge (80
files). So if you think fixing of "ClusterTopologyException" is necessary,
you could add different issue in JIRA (or I can do that). And I will fix it
in future. I can't implement your idea about using Runtime exception,
because right now a lot of logic tied to the fact that this is
"IgniteCheckedException". I real tried to mix it but I couldn't make it
work.

So before my changes we have 3 affected classes:
1. "ClusterTopologyCheckedException".
2. "ClusterTopologyServerNotFoundException".
3. "ClusterGroupEmptyCheckedException".

Now we there are 5 affected classes:

"ClusterTopologyCheckedException" split into 2 classes:
1. "ClusterTopologyCheckedException" (with future).
2. "ClusterTopologyLocalException" (without future, and it's parent for
"ClusterTopologyCheckedException").

"ClusterTopologyServerNotFoundException" rename to
3. "ClusterTopologyServerNotFoundLocalException" (For consistency of names,
I didn't find any cases where "ClusterTopologyServerNotFoundException" need
his future).

"ClusterGroupEmptyCheckedException" split into 2 classes:
4. "ClusterGroupEmptyCheckedException".
5. "ClusterGroupEmptyLocalException".

Also in code "ready future" is using for waiting, and don't using for get
real value (just look at code, all values just ignored). In fact "ready
future" has type "IgniteFuture<?>". It suggests that result doesn't matter.

2017-02-10 11:06 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:

> Alexander,
>
> I do not see why you would need to overwrite the cause field.
>
> What I meant in the previous email is that instead of setting a retry
> ready future in the checked exception, you would set a correct affinity
> topology version here. Then, during a construction of CacheException
> (unchecked, which is guaranteed to be thrown locally and will not be
> serialized), you would create the retry ready future based on the topology
> version set.
>
> Hope this helps,
> AG
>
> 2017-02-07 17:18 GMT+03:00 Александр Меньшиков <sharplermc@gmail.com>:
>
>> Alexey, I ran into some difficulties.
>>
>> Look at the method: GridNearTxFinishFuture.CheckBa
>> ckupMiniFuture#onDhtFinishResponse(GridDhtTxFinishResponse res)
>>
>>             *Throwable err* = res.checkCommittedError();
>>
>>             if (*err* != null) {
>>                 if (*err* instanceof IgniteCheckedException) {
>>                     *ClusterTopologyCheckedException cause* =
>>                         ((IgniteCheckedException)*err*).
>> *getCause(ClusterTopologyCheckedException.class)*;
>>
>>                     if (*cause* != null)
>>                         *cause.retryReadyFuture(*cctx.ne
>> xtAffinityReadyFuture(tx.topologyVersion()));
>> *                                   //^_____Here update the readyFut in
>> some first "cause". *
>>                 }
>>
>>                 onDone(*err*);
>>             }
>>
>>
>> So I can't rewrite "cause" field in exception without reflection. It
>> means if I try to use two exception: one with "readyFut" and second without
>> "readyFut", I see no way to do it in this place.
>>
>> Perhaps I misunderstood your original idea. Or maybe this code some kind
>> of a crutch and should be removed. What do you think?
>>
>> 2017-01-30 16:54 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:
>>
>>> In the example above there is no point of setting the retry future in the
>>> exception because it will be serialized and sent to a remote node.
>>>
>>> I see the only possible way to ensure this: make this be verified at
>>> compile time. This looks like a big change, but the draft may look like
>>> so:
>>> 1) Introduce new exception type, like CacheTopology*Checked*Exception
>>> which
>>> *must* take the minimum topology version to wait for
>>> 2) In all cases when a cache operation fails because of topology change,
>>> provide the appropriate exception
>>> 3) When CacheTopologyException is being constructed, create a
>>> corresponding
>>> local future based on wait version and throw the exception.
>>>
>>> Even though this still does not give us 100% guarantee that we did not
>>> miss
>>> anything, it should cover a lot more cases than now.
>>>
>>> 2017-01-30 14:20 GMT+03:00 Александр Меньшиков <sharplermc@gmail.com>:
>>>
>>> > Alexey,
>>> >
>>> > For GridCacheIoManager#sendNoRetry it's not necessary because exception
>>> > will be ignored (or will cast to String). Perhaps my message was
>>> unclear.
>>> > I try to say that three methods can throw
>>> "ClusterTopologyCheckedException"
>>> > without any problem.
>>> >
>>> > But you are right, in some methods I can't set "retryFuture". We must
>>> > ensure that exception doesn't escape away. The best option is to
>>> ignore it
>>> > (or cast to String).
>>> >
>>> > But any way, look here:
>>> >
>>> > IgniteTxHandler#sendReply(UUID nodeId, GridDhtTxFinishRequest req,
>>> boolean
>>> > committed, GridCacheVersion nearTxId)
>>> >
>>> > Inside you can found next lines:
>>> > __________________________
>>> > ClusterTopologyCheckedException *cause* =
>>> >     new ClusterTopologyCheckedException("Primary node left grid.");
>>> >
>>> > res.checkCommittedError(new IgniteTxRollbackCheckedException("Failed
>>> to
>>> > commit transaction " +
>>> >     "(transaction has been rolled back on backup node): " +
>>> req.version(),
>>> > *cause*));
>>> > __________________________
>>> >
>>> > How do you unsure that *"cause"* can't come to
>>> GridCacheUtils#retryTopologySafe(final
>>>
>>> > Callable<S> c) ?
>>> >
>>> >
>>> > Perhaps I don't know some rules which you use to maintain it now?
>>> >
>>> >
>>> > 2017-01-30 12:24 GMT+03:00 Alexey Goncharuk <
>>> alexey.goncharuk@gmail.com>:
>>> >
>>> >> Alexander,
>>> >>
>>> >> Do you have a use-case in mind which results in
>>> IgniteTopologyException
>>> >> thrown from public API with retryFuture unset?
>>> >>
>>> >> retryFuture makes sense only for certain cache operations and may be
>>> set
>>> >> only in certain places in the code where we already know what to wait
>>> for.
>>> >> I do not see how you can initialize this future, for example, in
>>> >>  GridCacheIoManager#sendNoRetry(ClusterNode node, GridCacheMessage
>>> msg,
>>> >> byte
>>> >> plc)
>>> >>
>>> >> --AG
>>> >>
>>> >> 2017-01-27 15:45 GMT+03:00 Александр Меньшиков <sharplermc@gmail.com
>>> >:
>>> >>
>>> >> > I found a lot of using "ClusterTopologyCheckedException"
>>> exception. In
>>> >> > most
>>> >> > use-case these exceptions were just ignored. It's easier to see
if
>>> >> > issue-4612 will be applied (I'm waiting for code review by my team
>>> >> leader)
>>> >> > where I renamed all unused exceptions in the "ignored".
>>> >> >
>>> >> > It means that in some case "retryReadyFuture" can left unset. But
>>> there
>>> >> are
>>> >> > code where exception is being getting from "get()" method in some
>>> >> "future"
>>> >> > class and don't ignored (exception is sending to method
>>> >> > "GridFutureAdapter#onDone()" for example).
>>> >> >
>>> >> > So I decided not to do a deep global analysis of code and make
some
>>> >> simple
>>> >> > rules.
>>> >> > 1. If some method "X" throws ClusterTopologyCheckedException and
>>> >> ignores
>>> >> > it
>>> >> > (or converts to String, there are some variants to lose exception
>>> like
>>> >> > object) in all methods that call the method "X", then we can don't
>>> set
>>> >> > "retryReadyFuture" for optimization. But this should be clarified
in
>>> >> > javadoc.
>>> >> > 2. But if exception escapes at least once like object: we must
set
>>> >> > "retryReadyFuture" in that method.
>>> >> >
>>> >> > A few times when call tree was simple, I went deeper.
>>> >> >
>>> >> > So I found only three methods which can throw
>>> >> > "ClusterTopologyCheckedException" without setting
>>> "retryReadyFuture":
>>> >> >
>>> >> > 1. IgfsFragmentizerManager#sendWithRetries(UUID nodeId,
>>> >> > IgfsCommunicationMessage msg)
>>> >> > 2. GridCacheIoManager#sendNoRetry(ClusterNode node,
>>> GridCacheMessage
>>> >> msg,
>>> >> > byte plc)
>>> >> > 3. GridCacheIoManager#sendOrderedMessage(ClusterNode node, Object
>>> >> topic,
>>> >> > GridCacheMessage msg, byte plc, long timeout)
>>> >> >
>>> >> > In all other methods "ClusterTopologyCheckedException" escapes
>>> away too
>>> >> > far.
>>> >> >
>>> >> > What do you think about this approach to make compromise between
>>> >> > optimization and correctness?
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message