cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write
Date Sat, 07 Nov 2015 00:37:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994779#comment-14994779 ]

Paulo Motta commented on CASSANDRA-10485:
-----------------------------------------

I implemented an alternative approach that is a bit cleaner and more deterministic. The basic
idea is to add a new method, {{TokenMetadata.isMemberOrPending()}}, and to submit hints only
to endpoints that are ring members or pending members. This avoids fetching null host
IDs for removed pending endpoints while the new pending ranges are being calculated.
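
As a rough illustration of the guard described above, here is a minimal sketch. It is not the patch itself: {{HintTargetFilterSketch}} and {{hintableEndpoints}} are hypothetical names, endpoints are plain strings instead of {{InetAddress}}, and the membership check is passed in as a predicate standing in for {{TokenMetadata.isMemberOrPending()}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; the real hint path lives in StorageProxy
// and differs between branches.
class HintTargetFilterSketch
{
    // Keeps only the endpoints that are still ring members or pending members.
    // The predicate is an assumed stand-in for TokenMetadata.isMemberOrPending().
    static List<String> hintableEndpoints(List<String> candidates,
                                          Predicate<String> isMemberOrPending)
    {
        List<String> result = new ArrayList<>();
        for (String endpoint : candidates)
        {
            // Endpoints that fail the check have left the ring, so no hint is
            // written for them and no host ID lookup is attempted.
            if (isMemberOrPending.test(endpoint))
                result.add(endpoint);
        }
        return result;
    }
}
```

Calling {{hintableEndpoints}} with a list of unreachable replicas and a membership predicate returns only the replicas for which a host ID lookup is still expected to succeed.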

To support {{TokenMetadata.isMemberOrPending()}}, {{TokenMetadata}} now maintains
a {{livePendingEndpoints}} set, which is populated every time new pending ranges
are set. When endpoints are removed from {{TokenMetadata}} via the {{removeEndpoint}} method,
they're also removed from the {{livePendingEndpoints}} set, so {{TokenMetadata.isMemberOrPending()}}
returns false once the endpoint is evicted from the ring. Since both {{removeEndpoint}} and
{{setPendingRanges}} update this set, they share a write lock. {{TokenMetadata.isMemberOrPending()}}
uses a read lock, like other methods such as {{isMember()}} and {{getHostId()}}.
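
The locking scheme described above could look roughly like the following sketch. This is a simplified stand-in, not the actual {{TokenMetadata}} code: the class name, the {{addMember}} helper, and the use of strings for endpoints are assumptions, and replacing the whole set in {{setPendingRanges}} is a simplification (the real method may update it differently).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified, hypothetical stand-in for TokenMetadata; not the actual Cassandra class.
class TokenMetadataSketch
{
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Ring members and their host IDs (endpoints are strings here for brevity).
    private final Map<String, UUID> endpointToHostId = new HashMap<>();
    // Endpoints that currently have pending ranges.
    private final Set<String> livePendingEndpoints = new HashSet<>();

    // Hypothetical helper so the sketch can be exercised.
    void addMember(String endpoint, UUID hostId)
    {
        lock.writeLock().lock();
        try { endpointToHostId.put(endpoint, hostId); }
        finally { lock.writeLock().unlock(); }
    }

    // Assumption: the set is simply replaced whenever pending ranges are set.
    void setPendingRanges(Set<String> endpointsWithPendingRanges)
    {
        lock.writeLock().lock();
        try
        {
            livePendingEndpoints.clear();
            livePendingEndpoints.addAll(endpointsWithPendingRanges);
        }
        finally { lock.writeLock().unlock(); }
    }

    // Removal clears the endpoint from both structures under the same write lock.
    void removeEndpoint(String endpoint)
    {
        lock.writeLock().lock();
        try
        {
            endpointToHostId.remove(endpoint);
            livePendingEndpoints.remove(endpoint);
        }
        finally { lock.writeLock().unlock(); }
    }

    // Readers only need the read lock.
    boolean isMemberOrPending(String endpoint)
    {
        lock.readLock().lock();
        try
        {
            return endpointToHostId.containsKey(endpoint)
                   || livePendingEndpoints.contains(endpoint);
        }
        finally { lock.readLock().unlock(); }
    }
}
```

Because {{removeEndpoint}} and {{setPendingRanges}} take the same write lock, a reader in this sketch never observes an endpoint that has been removed from the members map but is still listed as pending.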

Merging the solution from 2.1 to 2.2/3.0 was a bit tricky because, in those versions, the pending
ranges calculation was moved from the {{PendingRangeCalculatorService}} into {{TokenMetadata}},
where it runs within a read lock. I had to separate the calculation itself (within a read lock)
from the assignment of the {{pendingRanges}} via the {{setPendingRanges}} method, which uses a
write lock. On 3.0, the hints submission part is slightly different (even simpler) due to the new
hints implementation.
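
To make the read-lock/write-lock split concrete, here is a minimal sketch under stated assumptions. The class, the {{bootstrappingEndpoints}} input, and the placeholder range values are all hypothetical; only the two-step shape (compute under the read lock, publish via {{setPendingRanges}} under the write lock) comes from the description above. One general reason such a split is needed with {{ReentrantReadWriteLock}} is that a thread holding the read lock cannot upgrade to the write lock.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the two-step update; not the actual Cassandra code.
class PendingRangesSketch
{
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Assumed input state: endpoints that are joining the ring.
    private final Set<String> bootstrappingEndpoints = new HashSet<>();
    // Endpoint -> placeholder description of its pending range.
    private Map<String, String> pendingRanges = new HashMap<>();

    void addBootstrappingEndpoint(String endpoint)
    {
        lock.writeLock().lock();
        try { bootstrappingEndpoints.add(endpoint); }
        finally { lock.writeLock().unlock(); }
    }

    // Step 1: compute the new mapping under the read lock without mutating state.
    Map<String, String> calculatePendingRanges()
    {
        lock.readLock().lock();
        try
        {
            Map<String, String> result = new HashMap<>();
            for (String endpoint : bootstrappingEndpoints)
                result.put(endpoint, "pending-range-for-" + endpoint); // placeholder value
            return result;
        }
        finally { lock.readLock().unlock(); }
    }

    // Step 2: publish the result under the write lock.
    void setPendingRanges(Map<String, String> newPendingRanges)
    {
        lock.writeLock().lock();
        try { pendingRanges = new HashMap<>(newPendingRanges); }
        finally { lock.writeLock().unlock(); }
    }

    // The read lock is released before the write lock is requested,
    // so no lock upgrade is ever attempted.
    void updatePendingRanges()
    {
        Map<String, String> calculated = calculatePendingRanges();
        setPendingRanges(calculated);
    }

    boolean hasPendingRange(String endpoint)
    {
        lock.readLock().lock();
        try { return pendingRanges.containsKey(endpoint); }
        finally { lock.readLock().unlock(); }
    }
}
```

In this sketch other writers can run between the two steps, so the published mapping may reflect slightly older state than the one present when it is assigned.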

It's still not ideal, but I think it's better than the previous approach. I will add a link from
this ticket to CASSANDRA-6061 so we can take this ticket into account when refactoring
{{TokenMetadata}}.

Below are the new branches and test results:
||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-v3]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-v3]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-v3-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-v3-dtest/lastCompletedBuild/testReport/]|


> Missing host ID on hinted handoff write
> ---------------------------------------
>
>                 Key: CASSANDRA-10485
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10485
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> When I restart one of them, I receive the error "Missing host ID":
> {noformat}
> WARN  [SharedPool-Worker-1] 2015-10-08 13:15:33,882 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.AssertionError: Missing host ID for 63.251.156.141
>         at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>         at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.3.jar:2.1.3]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {noformat}
> If I run nodetool status, the problematic node has an ID:
> {noformat}
> UN  10.10.10.12  1.3 TB     1       ?       4d5c8fd2-a909-4f09-a23c-4cd6040f338a  rack3
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
