cassandra-commits mailing list archives

From "Nikolai Grigoriev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6716) nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
Date Wed, 19 Feb 2014 14:32:19 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905486#comment-13905486
] 

Nikolai Grigoriev commented on CASSANDRA-6716:
----------------------------------------------

OK, I am observing *massive* problems with the sstables after moving from 2.0.4 to 2.0.5.
I am rolling back now and scrubbing (I wish I had Mr. Net ;) ). Just scrubbing the OpsCenter
keyspaces, I see tons of messages like this:

{quote}
WARN [CompactionExecutor:110] 2014-02-19 14:25:13,811 OutputHandler.java (line 52) 1 out of
order rows found while scrubbing SSTableReader(path='/hadoop/disk2/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-jb-1901-Data.db');
Those have been written (in order) to a new sstable (SSTableReader(path='/hadoop/disk5/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-jb-15423-Data.db'))
{quote}
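To make the behaviour in the warning above concrete, here is a rough sketch (in Python, with made-up names; an illustration only, not Cassandra's actual scrub code) of what scrub does with out-of-order rows: any row whose key sorts below its predecessor is diverted and later written, in sorted order, to a new sstable:

```python
def split_out_of_order(rows):
    """Partition a row stream into in-order rows and diverted rows.

    Loosely mimics what scrub reports: rows whose key sorts below the
    last in-order key are diverted and would be written, sorted, to a
    new sstable. Names and structure are illustrative, not Cassandra's.
    """
    in_order, diverted = [], []
    last_key = None
    for key, value in rows:
        if last_key is not None and key < last_key:
            diverted.append((key, value))  # would go to the new sstable
        else:
            in_order.append((key, value))
            last_key = key
    return in_order, sorted(diverted)

# One row (key 2) arrives after a larger key (3), so it is diverted.
rows = [(1, "a"), (3, "b"), (2, "c"), (5, "d")]
kept, rewritten = split_out_of_order(rows)
```

With tens of thousands of such diversions, each scrub of an sstable produces a companion "-Data.db" file holding the re-sorted rows, which matches the pairs of paths in the WARN messages.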

I am not exaggerating: tens of thousands of them. To be fair, I am not 100% sure the problem
was already there with 2.0.4. But as of 2.0.5 I have noticed frequent exceptions about key
ordering, which is what caught my attention.

> nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that
does not exist)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6716
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7
>            Reporter: Nikolai Grigoriev
>         Attachments: system.log.gz
>
>
> It seems that recently I have started getting a number of "File not found" exceptions
on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds
on each node, for different keyspaces and CFs.
> I have tried restarting the nodes and scrubbing them. No luck so far. It seems that scrub
cannot complete on any of these nodes; at some point it fails because of a file that it
cannot find.
> On one of the nodes the "nodetool scrub" command currently fails instantly and consistently
with this exception:
> {code}
> # /opt/cassandra/bin/nodetool scrub 
> Exception in thread "main" java.lang.RuntimeException: Tried to hard link to file that
does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db
> 	at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
> 	at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215)
> 	at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826)
> 	at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122)
> 	at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
> 	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> 	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> 	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> 	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> 	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> 	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
> 	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
> 	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
> 	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
> 	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
> 	at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:177)
> 	at sun.rmi.transport.Transport$1.run(Transport.java:174)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
> 	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
> 	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:724)
> {code}
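> The stack trace shows the failure happening in the snapshot step that scrub performs first: snapshotWithoutFlush hard-links every live sstable, and a hard link requires the source file to still exist at link time. A minimal sketch of that failure mode (hypothetical file names; assuming the sstable disappeared, e.g. was compacted away, between being listed and being linked):

```python
import os
import tempfile

# Hard-linking requires the source to exist at link time. If an sstable
# is removed after it was selected for the snapshot but before the link
# is created, the link call fails, analogous to the RuntimeException
# from FileUtils.createHardLink above. Paths here are made up.
d = tempfile.mkdtemp()
src = os.path.join(d, "mykeyspace-test-jb-28049-Data.db")  # never created
dst = os.path.join(d, "snapshots", "pre-scrub-Data.db")
os.makedirs(os.path.dirname(dst))

try:
    os.link(src, dst)
    outcome = "linked"
except FileNotFoundError:
    outcome = "missing source"
```

> If that reading is right, the interesting question is why a file that compaction has already deleted is still referenced by the live sstable set.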
> I have also noticed that the missing files are often (or maybe always?) referred to in
the log as follows:
> {quote}
>  WARN 00:06:10,597 At level 3, SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db')
[DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300),
DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)]
overlaps SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db')
[DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100),
DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)].
 This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have
dropped sstables from another node into the data directory. Sending back to L0.  If you didn't
drop in sstables, and have not yet run scrub, you should do so since you may also have rows
out-of-order within an sstable
> {quote}
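> The warning above concerns the leveled compaction invariant that sstables within any level above L0 must cover disjoint key ranges. A simplified overlap check (illustrative only, not the actual LeveledManifest logic), using the two token ranges quoted from the log:

```python
def ranges_overlap(a, b):
    """True when two (first_token, last_token) ranges intersect.

    In leveled compaction, sstables within a level > 0 must be disjoint;
    the WARN fires (and the sstable is sent back to L0) when two ranges
    in the same level intersect. Simplified illustration.
    """
    (a_first, a_last), (b_first, b_last) = a, b
    return a_first <= b_last and b_first <= a_last

# Token ranges of the two level-3 sstables from the log
# (first and last DecoratedKey tokens of each file):
s26776 = (-9053060597280257896, -8855915848970248008)
s28022 = (-8964446543595889729, -8848132752710859808)
overlap = ranges_overlap(s26776, s28022)
```

> The token range of jb-26776 contains the start of jb-28022's range, so the check trips, which is exactly what the "At level 3 ... overlaps" message is reporting.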
> I have never had anything but Cassandra 2.0 on these systems. Also, I recreated my test
data from scratch with 2.0.4.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
