cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dominic Williams (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-3908) Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.
Date Tue, 14 Feb 2012 10:53:59 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Dominic Williams updated CASSANDRA-3908:
----------------------------------------

    Description: 
This problem looks like 2792

I am bootstrapping a new node into my cluster.

There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams
and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller
FMM_Studio keyspace.

Netstats on the bootstrapping node reports it is still streaming:

root:/var/lib/cassandra/data# nodetool -h localhost netstats
Mode: JOINING
Not sending any streams.
Streaming from: /192.168.1.9
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1
progress=0/160 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1
progress=0/4422 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642
- 0%
Streaming from: /192.168.1.4
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172
- 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877
- 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346
- 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014
- 0%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            478
Responses                       n/a         0         496302

However, running netstats on the source nodes reports they are not streaming:

root:~# nodetool -h localhost netstats
Mode: NORMAL
 Nothing streaming to /192.168.1.11
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       13291116
Responses                       n/a         0        8334754

Examination of the logs on the source nodes does NOT show an error for the specific sstables
that are stalled. The starting of streaming is duly logged:

pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now
UP
 INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer
to /192.168.1.11
 INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables
for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
 INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata
[/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 -
0%, /
var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 -
0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160
- 0%, /
var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422
- 0%], 6 sstables.
 INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to
/192.168.1.11
 INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer
to /192.168.1.11

There does however appear to have been an IllegalStateException for another sstable in this
keyspace (which occurs a second or so after streaming has begun). Perhaps this broke the streaming...

ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db
but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db
but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


  was:
This problem looks like 2792

I am bootstrapping a new node into my cluster.

There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams
and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller
FMM_Studio keyspace.

Netstats on the bootstrapping node reports it is still streaming:

root:/var/lib/cassandra/data# nodetool -h localhost netstats
Mode: JOINING
Not sending any streams.
Streaming from: /192.168.1.9
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1
progress=0/160 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1
progress=0/4422 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642
- 0%
Streaming from: /192.168.1.4
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172
- 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877
- 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346
- 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014
- 0%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            478
Responses                       n/a         0         496302

However, running netstats on the source nodes reports they are not streaming:

root:~# nodetool -h localhost netstats
Mode: NORMAL
 Nothing streaming to /192.168.1.11
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       13291116
Responses                       n/a         0        8334754

Examination of the logs on the source nodes show an IllegalStateException that has likely
interrupted/broken the streaming process.

ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db
but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db
but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


There are two problems:

(1) the source nodes should report to the bootstrapping node that there has been a problem
so that it may retry or giveup

(2) the IllegalStateException that breaks streaming



    
> Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The
source nodes do not. Caused by IllegalStateException on source nodes.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3908
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>              Labels: bootstrap, repair, streaming, streams
>             Fix For: 1.0.8
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> This problem looks like 2792
> I am bootstrapping a new node into my cluster.
> There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully
streams and the whole operation is probably at 99%+ when it stalls on some sstables in the
much smaller FMM_Studio keyspace.
> Netstats on the bootstrapping node reports it is still streaming:
> root:/var/lib/cassandra/data# nodetool -h localhost netstats
> Mode: JOINING
> Not sending any streams.
> Streaming from: /192.168.1.9
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1
progress=0/160 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1
progress=0/4422 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1
progress=0/6158642 - 0%
> Streaming from: /192.168.1.4
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1
progress=0/50172 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1
progress=0/5140877 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1
progress=0/147346 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014
- 0%
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0            478
> Responses                       n/a         0         496302
> However, running netstats on the source nodes reports they are not streaming:
> root:~# nodetool -h localhost netstats
> Mode: NORMAL
>  Nothing streaming to /192.168.1.11
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0       13291116
> Responses                       n/a         0        8334754
> Examination of the logs on the source nodes does NOT show an error for the specific sstables
that are stalled. The starting of streaming is duly logged:
> pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11
is now UP
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer
to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables
for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
> sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio',
ColumnFamily='AuthorClasses')]...
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context
metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670
- 0%, /
> var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642
- 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160
- 0%, /
> var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422
- 0%], 6 sstables.
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming
to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer
to /192.168.1.11
> There does however appear to have been an IllegalStateException for another sstable in
this keyspace (which occurs a second or so after streaming has begun). Perhaps this broke
the streaming...
> ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db
but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal
exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db
but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message