cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Problem after upgrade to 1.0.1
Date Fri, 04 Nov 2011 16:28:55 GMT
One possibility: if you're overloading the cluster, replicas will drop
updates to avoid OOMing.  (This is logged at WARN level.)  Before 1.x,
Cassandra would just let that slide, but starting with 1.0 it records
hints for those dropped updates.
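
A minimal sketch for checking whether a node is reporting dropped updates, assuming the log marks its level with a standalone `WARN` token and that the relevant lines contain the word "dropped". Both patterns and the function name `find_dropped_warnings` are illustrative assumptions, not taken from the Cassandra source, so adjust them to match the actual log lines.

```python
import re

# Assumption: the level appears as a standalone token such as "WARN".
WARN_TOKEN = re.compile(r"\bWARN\b")
# Assumption: dropped-update reports contain the word "dropped".
DROPPED = re.compile(r"\bdropped\b", re.IGNORECASE)


def find_dropped_warnings(lines):
    """Return the WARN-level lines that mention dropped messages."""
    return [
        line.rstrip("\n")
        for line in lines
        if WARN_TOKEN.search(line) and DROPPED.search(line)
    ]
```

To use it, open the node's log file and pass the file object to `find_dropped_warnings`; a steady stream of matches while the hints column family grows would be consistent with the overload explanation above.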

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey <Bryce.Godfrey@azaleos.com> wrote:
> Thanks for the help so far.
>
> Is there any way to find out why my HintsColumnFamily is so large now? It wasn't this way before the upgrade, and it seems to just keep climbing.
>
> I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints(), thinking I have a bunch of stale hints from upgrade issues, but it just eventually times out.  Plus, the node it gets invoked against gets thrashed and stops responding, forcing me to restart Cassandra.
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Thursday, November 03, 2011 5:06 PM
> To: user@cassandra.apache.org
> Subject: Re: Problem after upgrade to 1.0.1
>
> I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451.  If you build with that patch and rerun scrub, the exception should go away.
>
> On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey <Bryce.Godfrey@azaleos.com> wrote:
>> A restart fixed the load numbers; they are back to where I expect them to be now, but disk utilization is double the load number.  I'm also still getting the cfstats exception from any node.
>>
>> -----Original Message-----
>> From: Jonathan Ellis [mailto:jbellis@gmail.com]
>> Sent: Thursday, November 03, 2011 11:52 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Problem after upgrade to 1.0.1
>>
>> Does restarting the node fix this?
>>
>> On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey <Bryce.Godfrey@azaleos.com> wrote:
>>> Disk utilization is actually about 80% higher than what is reported
>>> for nodetool ring across all my nodes on the data drive
>>>
>>>
>>>
>>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>>> 206.926.1978 | M: 206.849.2477
>>>
>>>
>>>
>>> From: Dan Hendry [mailto:dan.hendry.junk@gmail.com]
>>> Sent: Thursday, November 03, 2011 11:47 AM
>>> To: user@cassandra.apache.org
>>> Subject: RE: Problem after upgrade to 1.0.1
>>>
>>>
>>>
>>> Regarding load growth, presumably you are referring to the load as
>>> reported by JMX/nodetool. Have you actually looked at the disk
>>> utilization on the nodes themselves? Potential issue I have seen:
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html
>>>
>>>
>>>
>>> Dan
>>>
>>>
>>>
>>> From: Bryce Godfrey [mailto:Bryce.Godfrey@azaleos.com]
>>> Sent: November-03-11 14:40
>>> To: user@cassandra.apache.org
>>> Subject: Problem after upgrade to 1.0.1
>>>
>>>
>>>
>>> I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go
>>> just fine with the rolling upgrade.  But now I'm having extreme load
>>> growth on one of my nodes (and others are growing faster than usual
>>> also).  I attempted to run cfstats against the extremely large node
>>> that was seeing 2x the load of the others, and I got the error below.
>>> I also went into the o.a.c.db.HintedHandoffManager mbean and
>>> attempted to list pending hints to see if they were growing out of
>>> control for some reason, but that just times out eventually for any
>>> node.  I'm not sure what to do next with this issue.
>>>
>>>
>>>
>>>                 Column Family: HintsColumnFamily
>>>                 SSTable count: 3
>>>                 Space used (live): 12681676437
>>>                 Space used (total): 10233130272
>>>                 Number of Keys (estimate): 384
>>>                 Memtable Columns Count: 117704
>>>                 Memtable Data Size: 115107307
>>>                 Memtable Switch Count: 66
>>>                 Read Count: 0
>>>                 Read Latency: NaN ms.
>>>                 Write Count: 21203290
>>>                 Write Latency: 0.014 ms.
>>>                 Pending Tasks: 0
>>>                 Key cache capacity: 3
>>>                 Key cache size: 0
>>>                 Key cache hit rate: NaN
>>>                 Row cache: disabled
>>>                 Compacted row minimum size: 30130993
>>>                 Compacted row maximum size: 9223372036854775807
>>>
>>> Exception in thread "main" java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>>>         at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>>>         at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>>>         at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>         at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
>>>         at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
>>>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
>>>         at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
>>>         at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>         at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>>
>>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>>> 206.926.1978 | M: 206.849.2477
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
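
For readers puzzling over the cfstats exception quoted in this thread, here is a simplified model of a bucketed histogram with an overflow bucket. It is a sketch under stated assumptions, not Cassandra's EstimatedHistogram: the class name, bucket bounds, and method names are hypothetical. It shows why a reported maximum of 9223372036854775807 (the largest signed 64-bit value) and a failing mean can appear together: once a value lands past the last bucket, the histogram no longer knows how large that value was.

```python
class OverflowingHistogram:
    """Toy histogram: fixed bucket upper bounds plus one overflow bucket."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One counter per bucket, plus a final counter for overflow.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def add(self, value):
        # Count the value in the first bucket whose bound covers it.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.counts[i] += 1
                return
        self.counts[-1] += 1  # larger than every bound: overflow

    def overflowed(self):
        return self.counts[-1] > 0

    def max(self):
        # With overflow, the true maximum is unknown; report a sentinel.
        if self.overflowed():
            return 2**63 - 1
        for i in range(len(self.upper_bounds) - 1, -1, -1):
            if self.counts[i] > 0:
                return self.upper_bounds[i]
        return 0

    def mean(self):
        # A mean cannot be estimated once any value overflowed.
        if self.overflowed():
            raise RuntimeError("cannot compute mean: histogram overflowed")
        total = sum(self.counts[:-1])
        if total == 0:
            return 0
        weighted = sum(c * b for c, b in zip(self.counts, self.upper_bounds))
        return weighted / total
```

In this model, adding a single value beyond the largest bound is enough to make `max()` return the sentinel and `mean()` raise, which mirrors the pattern in the quoted cfstats output.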
