cassandra-user mailing list archives

From Rahul Singh <rahul.xavier.si...@gmail.com>
Subject RE: RE: Re: A node down every day in a 6 nodes cluster
Date Tue, 27 Mar 2018 15:56:09 GMT
It may be that the wide partition is bombarded more than other partitions. What’s your RF
on that keyspace? If it’s greater than 1, I’d expect other nodes to get the same type
of load.

--
Rahul Singh
rahul.singh@anant.us

Anant Corporation
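
One way to check for the wide partitions discussed in this thread is to scan `nodetool cfstats` output for the `Compacted partition maximum bytes` figure of each table. The Python below is a minimal sketch, not something posted by anyone in the thread. It assumes the Cassandra 3.x text layout (`Keyspace : name`, then `Table: name` or `Table (index): name`). The 100 MiB threshold is an arbitrary illustrative cutoff, and the sample text and its names (`example_ks`, `small_table`, `big_table.example_idx`) are hypothetical.

```python
import re

# Assumed line shapes in `nodetool cfstats` output on Cassandra 3.x.
KEYSPACE_RE = re.compile(r"^\s*Keyspace\s*:\s*(\S+)")
TABLE_RE = re.compile(r"^\s*Table(?: \(index\))?\s*:\s*(\S+)")
MAX_PARTITION_RE = re.compile(r"^\s*Compacted partition maximum bytes\s*:\s*(\d+)")


def find_wide_partitions(cfstats_text, threshold_bytes=100 * 1024 * 1024):
    """Return (keyspace, table, max_partition_bytes) for tables over the threshold."""
    keyspace = table = None
    wide = []
    for line in cfstats_text.splitlines():
        m = KEYSPACE_RE.match(line)
        if m:
            keyspace, table = m.group(1), None
            continue
        m = TABLE_RE.match(line)
        if m:
            table = m.group(1)
            continue
        m = MAX_PARTITION_RE.match(line)
        if m and table is not None:
            size = int(m.group(1))
            if size > threshold_bytes:
                wide.append((keyspace, table, size))
    # Largest partitions first.
    return sorted(wide, key=lambda row: row[2], reverse=True)


# Hypothetical sample text, made up for illustration only.
SAMPLE = """\
Keyspace : example_ks
\tTable: small_table
\t\tCompacted partition maximum bytes: 1916
\tTable (index): big_table.example_idx
\t\tCompacted partition maximum bytes: 943127763
"""

if __name__ == "__main__":
    for ks, tbl, size in find_wide_partitions(SAMPLE):
        print(f"{ks}.{tbl}: largest partition ~{size / 1024**2:.0f} MiB")
```

To use it on a real node, capture the command output to a file and pass the file contents to `find_wide_partitions`.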

On Mar 27, 2018, 5:56 AM -0700, Kenneth Brotman <kenbrotman@yahoo.com.invalid>, wrote:
> First, anything Jeff Jirsa says is likely very accurate, such as it being a really good
> idea to get off the version you’re on and onto a version that fixes some of its known
> problems.
>
> Replacing a running node:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html
>
> Kenneth Brotman
>
>
> From: Xiangfei Ni [mailto:xiangfei.ni@cm-dt.com]
> Sent: Tuesday, March 27, 2018 5:44 AM
> To: user@cassandra.apache.org
> Subject: Re:RE: Re: A node down every day in a 6 nodes cluster
>
> Thanks, Kenneth. This is a production database, and the node is one of three seed nodes.
> Do you have a doc for replacing a seed node?
>
>
>
> Sent from my Xiaomi phone
> On March 27, 2018, 7:45 PM, Kenneth Brotman <kenbrotman@yahoo.com.INVALID> wrote:
> David,
>
> Can you replace the misbehaving node to see if that resolves the problem?
>
> Kenneth Brotman
>
> From: Xiangfei Ni [mailto:xiangfei.ni@cm-dt.com]
> Sent: Tuesday, March 27, 2018 3:27 AM
> To: Jeff Jirsa
> Cc: user@cassandra.apache.org
> Subject: Re: Re: A node down every day in a 6 nodes cluster
>
> Thanks Jeff,
> So your suggestion is to first resolve the data model issue that causes the
> wide partitions, right?
>
> Best Regards,
>
> 倪项菲/ David Ni
> 中移德电网络科技有限公司
> Virtue Intelligent Network Ltd, co.
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>
> From: Jeff Jirsa <jjirsa@gmail.com>
> Sent: March 27, 2018 11:50
> To: Xiangfei Ni <xiangfei.ni@cm-dt.com>
> Cc: user@cassandra.apache.org
> Subject: Re: Re: A node down every day in a 6 nodes cluster
>
> Only one node having the problem is suspicious. It may be that your application is improperly
> pooling connections, or you have a hardware problem.
>
> I don't see anything in the nodetool output that explains it, though you certainly have a
> data model likely to cause problems over time (the cardinality of
> rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide partitions and it'll
> be difficult to read).
>
> On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei.ni@cm-dt.com> wrote:
> > Hi Jeff,
> >     I need to restart the node manually every time. Only one node has this problem.
> >     I have attached the nodetool output. Thanks.
> >
> > Best Regards,
> >
> > 倪项菲/ David Ni
> > 中移德电网络科技有限公司
> > Virtue Intelligent Network Ltd, co.
> > Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> > Mob: +86 13797007811|Tel: + 86 27 5024 2516
> >
> > From: Jeff Jirsa <jjirsa@gmail.com>
> > Sent: March 27, 2018 11:03
> > To: user@cassandra.apache.org
> > Subject: Re: A node down every day in a 6 nodes cluster
> >
> > That warning isn’t sufficient to understand why the node is going down
> >
> >
> > Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is likely
> > a good idea
> >
> > Are the nodes coming up on their own? Or are you restarting them?
> >
> > Paste the output of nodetool tpstats and nodetool cfstats
> >
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei.ni@cm-dt.com> wrote:
> > > Hi Cassandra experts,
> > >   I am facing an issue: a node goes down every day in a 6-node cluster. The cluster
> > > is in a single DC.
> > >   Every node has 4 cores and 16 GB of RAM, and the heap configuration is
> > > MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of data, and the
> > > RF for the business CF is 3. A node goes down once a day, and system.log shows the
> > > following:
> > > WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 CassandraAuthorizer.java:101
- CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.latest_rt_alarm>
> > > ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 QueryMessage.java:128
- Unexpected error during query
> > > com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only
0 responses.
> > >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
~[guava-18.0.jar:na]
> > >         at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
~[guava-18.0.jar:na]
> > >         at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
~[guava-18.0.jar:na]
> > >         at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
~[guava-18.0.jar:na]
> > >         at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.ClientState.authorize(ClientState.java:419)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
[apache-cassandra-3.9.jar:3.9]
> > >         at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
[netty-all-4.0.39.Final.jar:4.0.39.Final]
> > >         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
[netty-all-4.0.39.Final.jar:4.0.39.Final]
> > >         at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
[netty-all-4.0.39.Final.jar:4.0.39.Final]
> > >         at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357)
[netty-all-4.0.39.Final.jar:4.0.39.Final]
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_91]
> > >         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
[apache-cassandra-3.9.jar:3.9]
> > >         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> > > Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException:
Operation timed out - received only 0 responses.
> > >         at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:102)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.auth.PermissionsCache.lambda$new$0(PermissionsCache.java:37)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.auth.AuthCache$1.load(AuthCache.java:183)
~[apache-cassandra-3.9.jar:3.9]
> > >         at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
~[guava-18.0.jar:na]
> > >         at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
~[guava-18.0.jar:na]
> > >         at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
~[guava-18.0.jar:na]
> > >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
~[guava-18.0.jar:na]
> > >         ... 26 common frames omitted
> > > Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation
timed out - received only 0 responses.
> > >         at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1718)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1667)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1608)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1527)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:975)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:271)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:232)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.auth.CassandraAuthorizer.addPermissionsForRole(CassandraAuthorizer.java:227)
~[apache-cassandra-3.9.jar:3.9]
> > >         at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:93)
~[apache-cassandra-3.9.jar:3.9]
> > >         ... 32 common frames omitted
> > > WARN  [Native-Transport-Requests-23] 2018-03-26 18:53:17,131 CassandraAuthorizer.java:101
- CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.rt_alarm_unite>
> > > ERROR [Native-Transport-Requests-64] 2018-03-26 18:53:17,135 QueryMessage.java:128
- Unexpected error during query
> > > com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only
0 responses.
> > >         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
~[guava-18.0.jar:na]
> > >
> > > I have confirmed that nev_tsp_sa has all permissions on the nev_prod_tsp keyspace:
> > > cassandra@cqlsh:system_auth> select * from role_permissions where role = 'nev_tsp_sa';
> > >
> > >  role       | resource          | permissions
> > > ------------+-------------------+--------------------------------------------------------------
> > >  nev_tsp_sa | data/nev_prod_tsp | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
> > >
> > > The cache disk can be read and written normally.
> > >
> > > Any help would be highly appreciated. Thanks very much!
> > >
> > >
> > > Best Regards,
> > >
> > > 倪项菲/ David Ni
> > > 中移德电网络科技有限公司
> > > Virtue Intelligent Network Ltd, co.
> > > Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> > > Mob: +86 13797007811|Tel: + 86 27 5024 2516
> > >
>
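
The `nodetool tpstats` output requested earlier in the thread can be screened for backed-up thread pools and dropped messages. The Python below is an illustrative sketch only. It assumes the Cassandra 3.x column layout (`Pool Name  Active  Pending  Completed  Blocked  All time blocked`, followed by a `Message type  Dropped` section), and the sample numbers are hypothetical, not taken from the poster's attachment.

```python
def parse_tpstats(text):
    """Parse `nodetool tpstats` text into (pools, dropped) dictionaries."""
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if line.startswith("Pool Name"):
            section = "pools"
        elif line.startswith("Message type"):
            section = "dropped"
        elif (section == "pools" and len(parts) >= 6
              and all(p.isdigit() for p in parts[1:6])):
            active, pending, completed, blocked, all_time_blocked = map(int, parts[1:6])
            pools[parts[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
        elif section == "dropped" and len(parts) >= 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return pools, dropped


def flag_problems(pools, dropped):
    """List pools with pending or historically blocked tasks, and dropped message types."""
    notes = []
    for name, stats in pools.items():
        if stats["pending"] > 0 or stats["all_time_blocked"] > 0:
            notes.append(
                f"{name}: pending={stats['pending']}, "
                f"all_time_blocked={stats['all_time_blocked']}"
            )
    for message_type, count in dropped.items():
        if count > 0:
            notes.append(f"{message_type}: dropped={count}")
    return notes


# Hypothetical sample text, made up for illustration only.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0           1000         0                 0
Native-Transport-Requests         4        12           5000         0                37

Message type           Dropped
READ                        15
MUTATION                     0
"""

if __name__ == "__main__":
    pools, dropped = parse_tpstats(SAMPLE)
    for note in flag_problems(pools, dropped):
        print(note)
```

Comparing the flagged entries between the failing node and a healthy one may help show whether the problem node is receiving a different load.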
