cassandra-commits mailing list archives

From Fernando Gonçalves (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10233) IndexOutOfBoundsException in HintedHandOffManager
Date Thu, 01 Oct 2015 15:12:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939944#comment-14939944 ]

Fernando Gonçalves edited comment on CASSANDRA-10233 at 10/1/15 3:11 PM:
-------------------------------------------------------------------------

Hi [~nutbunnies], I work with Eiti Kimura at Movile, and this issue is happening
in one of our Cassandra clusters.

I'll try to answer your questions:

- how many nodes?
Currently we are running with 15 nodes, in 2 racks, in the same datacenter. One rack has 7
nodes and the other has 8 nodes.

- assuming rolling upgrade
I didn't understand whether this is a question, but I can say that we upgraded to
version 2.1.9 yesterday, and the problem started when we added 7 new nodes to the cluster
a week ago. We added one node at a time, waiting for each node to join the cluster before
starting the next one.

- jdk change?
We have been using the same version for a long time, Java Hotspot 1.8.0_45-b14.

- roughly how long was each node unavailable
I'm sending the uptime of each node. The nodes were never unavailable, only very slow to
respond to requests at times.
pompeia1   14:52:37 up 126 days
pompeia2   14:52:37 up 126 days
pompeia3   14:52:37 up 126 days
pompeia4   14:52:37 up 126 days
pompeia5   14:52:37 up 126 days
pompeia6   14:52:37 up 126 days
pompeia7   14:52:37 up 82 days
pompeia8   14:52:37 up 82 days
pompeia9   14:52:37 up 7 days
pompeia10  14:52:37 up 7 days
pompeia11  14:52:37 up 7 days
pompeia12  14:52:37 up 7 days
pompeia13  14:52:37 up 7 days
pompeia14  14:52:37 up 7 days
pompeia15  14:52:37 up 7 days

- gc_grace value of table with broken hint; values of max_hint_window_in_ms,
max_hints_delivery_threads, hinted_handoff_enabled, hinted_handoff_throttle_in_kb in cassandra.yaml
We are not sure which table is problematic, but we think it is our largest (by record
count and number of columns) and most used table, so I'm reporting its values:
-- gc_grace_seconds = 864000
The values in cassandra.yaml:
-- max_hint_window_in_ms: 10800000
-- max_hints_delivery_threads: 2
-- hinted_handoff_enabled: true
-- hinted_handoff_throttle_in_kb: 1024
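
For readers comparing the two time-based settings above, here is a minimal sketch that converts them into more familiar units. The class and method names (`HintSettings`, `hintWindow`, `gcGrace`) are hypothetical and exist only for this illustration; they are not part of Cassandra.

```java
import java.time.Duration;

public class HintSettings {
    // max_hint_window_in_ms is expressed in milliseconds.
    static Duration hintWindow(long maxHintWindowInMs) {
        return Duration.ofMillis(maxHintWindowInMs);
    }

    // gc_grace_seconds is expressed in seconds.
    static Duration gcGrace(long gcGraceSeconds) {
        return Duration.ofSeconds(gcGraceSeconds);
    }

    public static void main(String[] args) {
        // Values taken from the comment above.
        Duration window = hintWindow(10_800_000L);
        Duration grace = gcGrace(864_000L);
        System.out.println("max_hint_window_in_ms = " + window.toHours() + " hours");
        System.out.println("gc_grace_seconds      = " + grace.toDays() + " days");
    }
}
```

With the reported values, the hint window works out to 3 hours and gc_grace to 10 days.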

- what type of mutation was the hint without a target_id?
I don't know how to get the type of mutation, only the mutation value, which is a blob in the
table. Can you help me here?

If you need any other information, I can send it to you!
Thank you!


> IndexOutOfBoundsException in HintedHandOffManager
> -------------------------------------------------
>
>                 Key: CASSANDRA-10233
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10233
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.2.0
>            Reporter: Omri Iluz
>            Assignee: Andrew Hust
>         Attachments: cassandra-2.1.8-10233-v2.txt, cassandra-2.1.8-10233.txt
>
>
> After upgrading our cluster to 2.2.0, the following error started showing exactly every
10 minutes on every server in the cluster:
> {noformat}
> INFO  [CompactionExecutor:1381] 2015-08-31 18:31:55,506 CompactionTask.java:142 - Compacting
(8e7e1520-500e-11e5-b1e3-e95897ba4d20) [/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/la-540-big-Data.db:level=0,
]
> INFO  [CompactionExecutor:1381] 2015-08-31 18:31:55,599 CompactionTask.java:224 - Compacted
(8e7e1520-500e-11e5-b1e3-e95897ba4d20) 1 sstables to [/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/la-541-big,]
to level=0.  1,544,495 bytes to 1,544,495 (~100% of original) in 93ms = 15.838121MB/s.  0
total partitions merged to 4.  Partition merge counts were {1:4, }
> ERROR [HintedHandoff:1] 2015-08-31 18:31:55,600 CassandraDaemon.java:182 - Exception
in thread Thread[HintedHandoff:1,1,main]
> java.lang.IndexOutOfBoundsException: null
> 	at java.nio.Buffer.checkIndex(Buffer.java:538) ~[na:1.7.0_79]
> 	at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410) ~[na:1.7.0_79]
> 	at org.apache.cassandra.utils.UUIDGen.getUUID(UUIDGen.java:106) ~[apache-cassandra-2.2.0.jar:2.2.0]
> 	at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:515)
~[apache-cassandra-2.2.0.jar:2.2.0]
> 	at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:88)
~[apache-cassandra-2.2.0.jar:2.2.0]
> 	at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:168)
~[apache-cassandra-2.2.0.jar:2.2.0]
> 	at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
~[apache-cassandra-2.2.0.jar:2.2.0]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_79]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
[na:1.7.0_79]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[na:1.7.0_79]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> {noformat}
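
The top frames of the trace (`Buffer.checkIndex`, `HeapByteBuffer.getLong`, `UUIDGen.getUUID`) suggest that a UUID was being decoded from a buffer holding fewer than 16 bytes, which would be consistent with a hint row that has no target_id. The sketch below reproduces that failure mode with plain JDK classes only. The `decode` helper is hypothetical and assumes the UUID is read as two absolute `getLong` calls; it is not the Cassandra source.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class ShortUuidBuffer {
    // Hypothetical decoder: reads 16 bytes starting at the buffer's position
    // using absolute (index-based) reads, as the stack frames suggest.
    static UUID decode(ByteBuffer raw) {
        return new UUID(raw.getLong(raw.position()), raw.getLong(raw.position() + 8));
    }

    // Returns true when decoding fails with IndexOutOfBoundsException.
    static boolean failsToDecode(ByteBuffer raw) {
        try {
            decode(raw);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A well-formed 16-byte buffer round-trips (arbitrary sample UUID).
        UUID id = new UUID(1L, 2L);
        ByteBuffer full = ByteBuffer.allocate(16);
        full.putLong(0, id.getMostSignificantBits());
        full.putLong(8, id.getLeastSignificantBits());
        System.out.println("full buffer decodes: " + decode(full).equals(id));

        // An empty buffer, standing in for a missing target_id, cannot be decoded.
        System.out.println("empty buffer fails:  " + failsToDecode(ByteBuffer.allocate(0)));
    }
}
```

Checking the remaining byte count before decoding (for example `raw.remaining() >= 16`) is one way a caller could skip such a row instead of letting the exception kill the scheduled task. Whether that matches the attached patches is not stated in this message.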



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
