incubator-cassandra-user mailing list archives

From Bryce Godfrey <Bryce.Godf...@azaleos.com>
Subject RE: Problem after upgrade to 1.0.1
Date Wed, 09 Nov 2011 00:28:42 GMT
I have no errors in my system.log, just these types of warnings occasionally:
WARN [pool-1-thread-1] 2011-11-08 00:03:44,726 Memtable.java (line 167) setting live ratio
to minimum of 1.0 instead of 0.9511448007676252
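For reference, that warning is the memtable live-ratio estimator clamping its measurement to a sane floor. A minimal sketch of the clamping, assuming the 1.0-era bounds of 1.0 and 64.0 (the bounds and class are my reading of the Memtable code, not something stated in the log):

```python
# Sketch of the live-ratio clamping behind the WARN line above.
# The bounds (1.0 and 64.0) are assumed from the 1.0-era
# o.a.c.db.Memtable code, not confirmed by this thread.

MIN_SANE_LIVE_RATIO = 1.0   # assumed lower bound
MAX_SANE_LIVE_RATIO = 64.0  # assumed upper bound

def sane_live_ratio(measured: float) -> float:
    """Clamp the measured heap-bytes-per-serialized-byte ratio."""
    return max(MIN_SANE_LIVE_RATIO, min(MAX_SANE_LIVE_RATIO, measured))

# The value from the log line gets raised to the floor of 1.0:
print(sane_live_ratio(0.9511448007676252))  # -> 1.0
```

A ratio below 1.0 would mean the memtable supposedly uses less heap than the serialized data size, which the estimator treats as implausible, hence the clamp and the warning.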

I did find the cause of my data drive consumption being so large: I did not know that
running scrub after the upgrade would take a snapshot of the data.  Once I removed all the
snapshots, the data drive is back down to where I expect it to be.  Although the Load numbers
reported by ring are much larger than what is in the data drive.
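For anyone hitting the same thing, a quick way to measure how much of the data drive the scrub snapshots are holding is to walk the snapshot directories. This assumes the pre-1.1 on-disk layout where snapshots live under <data_dir>/<keyspace>/snapshots/ (the path and layout are assumptions; adjust for your install):

```python
# Rough sketch: sum the bytes held by snapshot directories, assuming
# the pre-1.1 layout <data_dir>/<keyspace>/snapshots/<tag>/.
import os

def snapshot_bytes(data_dir: str) -> int:
    """Total size in bytes of all snapshot files under data_dir."""
    total = 0
    for keyspace in os.listdir(data_dir):
        snap_root = os.path.join(data_dir, keyspace, "snapshots")
        if not os.path.isdir(snap_root):
            continue  # keyspace has no snapshots
        for dirpath, _dirnames, filenames in os.walk(snap_root):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

# e.g. print(snapshot_bytes("/var/lib/cassandra/data"))
```

Running nodetool clearsnapshot removes them in one step, which is what brought the drive back down here.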

I've also upgraded to 1.0.2 and re-ran scrub, and now I can run cfstats again, so thanks for
that.  Although I'm still confused about why the hints CF has become so large on a few of the
nodes:

                Column Family: HintsColumnFamily
                SSTable count: 11
                Space used (live): 127490858389
                Space used (total): 72123363085
                Number of Keys (estimate): 1408
                Memtable Columns Count: 43174
                Memtable Data Size: 44376138
                Memtable Switch Count: 103
                Read Count: 494
                Read Latency: NaN ms.
                Write Count: 30970531
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 14
                Key cache size: 10
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 88149
                Compacted row maximum size: 53142810146
                Compacted row mean size: 6065512727
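Those compacted row statistics come from a histogram of row sizes, and the "Unable to compute ceiling for max when histogram overflowed" exception quoted further down the thread fires when a row outgrows the largest bucket. A toy version of the mechanism (the real class is o.a.c.utils.EstimatedHistogram; the bucket count and ~1.2 growth factor here are illustrative assumptions):

```python
# Toy histogram with exponentially growing buckets plus an overflow
# slot, sketching why mean() fails once a value exceeds every bucket.

def make_offsets(n=20, start=1, growth=1.2):
    """Bucket upper bounds growing by ~20% each step (assumed factor)."""
    offsets = [start]
    for _ in range(n - 1):
        offsets.append(max(offsets[-1] + 1, int(offsets[-1] * growth)))
    return offsets

class TinyHistogram:
    def __init__(self):
        self.offsets = make_offsets()
        self.buckets = [0] * (len(self.offsets) + 1)  # last = overflow

    def add(self, value):
        for i, bound in enumerate(self.offsets):
            if value <= bound:
                self.buckets[i] += 1
                return
        self.buckets[-1] += 1  # too big for any bucket: overflow

    def mean(self):
        # Analogous to the cfstats IllegalStateException in this thread:
        if self.buckets[-1] > 0:
            raise ValueError("histogram overflowed")
        total = sum(self.buckets)
        weighted = sum(b * o for b, o in zip(self.buckets, self.offsets))
        return weighted / total
```

A single multi-gigabyte hint row, like the ones in the stats above, is enough to land in the overflow slot and make the stats calls throw, which is consistent with cfstats failing until the CASSANDRA-3451 patch.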



-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Friday, November 04, 2011 9:29 AM
To: user@cassandra.apache.org
Subject: Re: Problem after upgrade to 1.0.1

One possibility: if you're overloading the cluster, replicas will drop updates to avoid OOMing.
 (This is logged at WARN level.)  Before 1.0, Cassandra would just let that slide, but with
1.0 it started recording hints for those.
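That would also explain a hints CF with very few keys but enormous rows, matching the key estimates of a few hundred to ~1400 in this thread: if the hints store keeps roughly one row per target node and appends a column per hinted mutation (the one-row-per-target layout is an assumption for illustration), sustained overload makes a handful of rows balloon:

```python
# Toy model: few row keys (one per target node) but rows that grow
# without bound while replicas keep dropping updates. The layout is
# an assumption for illustration, not the exact 1.0 schema.
from collections import defaultdict

hints = defaultdict(list)  # target node -> list of hinted mutations

def record_hint(target: str, mutation: bytes):
    hints[target].append(mutation)

# Simulate 1000 dropped updates destined for two overloaded replicas.
for i in range(1000):
    record_hint("10.0.0.%d" % (i % 2 + 1), b"mutation-%d" % i)

print(len(hints))              # row count stays tiny: 2
print(len(hints["10.0.0.1"]))  # but each row keeps growing: 500
```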

On Thu, Nov 3, 2011 at 7:17 PM, Bryce Godfrey <Bryce.Godfrey@azaleos.com> wrote:
> Thanks for the help so far.
>
> Is there any way to find out why my HintsColumnFamily is so large now, since it wasn't
> this way before the upgrade and it seems to just keep climbing?
>
> I've tried invoking o.a.c.db.HintedHandoffManager.countPendingHints(), thinking I have
> a bunch of stale hints from upgrade issues, but it just eventually times out.  Plus the node
> it gets invoked against gets thrashed and stops responding, forcing me to restart Cassandra.
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Thursday, November 03, 2011 5:06 PM
> To: user@cassandra.apache.org
> Subject: Re: Problem after upgrade to 1.0.1
>
> I found the problem and posted a patch on https://issues.apache.org/jira/browse/CASSANDRA-3451.
> If you build with that patch and rerun scrub the exception should go away.
>
> On Thu, Nov 3, 2011 at 2:08 PM, Bryce Godfrey <Bryce.Godfrey@azaleos.com> wrote:
>> A restart fixed the load numbers; they are back to where I expect them to be now,
>> but disk utilization is double the load number.  I'm also still getting the cfstats exception from
>> any node.
>>
>> -----Original Message-----
>> From: Jonathan Ellis [mailto:jbellis@gmail.com]
>> Sent: Thursday, November 03, 2011 11:52 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Problem after upgrade to 1.0.1
>>
>> Does restarting the node fix this?
>>
>> On Thu, Nov 3, 2011 at 1:51 PM, Bryce Godfrey <Bryce.Godfrey@azaleos.com> wrote:
>>> Disk utilization is actually about 80% higher than what is reported
>>> by nodetool ring across all my nodes on the data drive.
>>>
>>>
>>>
>>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>>> 206.926.1978 | M: 206.849.2477
>>>
>>>
>>>
>>> From: Dan Hendry [mailto:dan.hendry.junk@gmail.com]
>>> Sent: Thursday, November 03, 2011 11:47 AM
>>> To: user@cassandra.apache.org
>>> Subject: RE: Problem after upgrade to 1.0.1
>>>
>>>
>>>
>>> Regarding load growth, presumably you are referring to the load as 
>>> reported by JMX/nodetool. Have you actually looked at the disk 
>>> utilization on the nodes themselves? Potential issue I have seen:
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg18142.html
>>>
>>>
>>>
>>> Dan
>>>
>>>
>>>
>>> From: Bryce Godfrey [mailto:Bryce.Godfrey@azaleos.com]
>>> Sent: November-03-11 14:40
>>> To: user@cassandra.apache.org
>>> Subject: Problem after upgrade to 1.0.1
>>>
>>>
>>>
>>> I recently upgraded from 0.8.6 to 1.0.1 and everything seemed to go
>>> just fine with the rolling upgrade.  But now I'm seeing extreme load
>>> growth on one of my nodes (and the others are growing faster than
>>> usual too).  I attempted to run cfstats against the extremely large
>>> node that was seeing 2x the load of the others, and I get the error
>>> below.  I also went into the o.a.c.db.HintedHandoffManager mbean and
>>> attempted to list pending hints to see if it was growing out of
>>> control for some reason, but that just times out eventually for any
>>> node.  I'm not sure what to do next with this issue.
>>>
>>>
>>>
>>>                 Column Family: HintsColumnFamily
>>>                 SSTable count: 3
>>>                 Space used (live): 12681676437
>>>                 Space used (total): 10233130272
>>>                 Number of Keys (estimate): 384
>>>                 Memtable Columns Count: 117704
>>>                 Memtable Data Size: 115107307
>>>                 Memtable Switch Count: 66
>>>                 Read Count: 0
>>>                 Read Latency: NaN ms.
>>>                 Write Count: 21203290
>>>                 Write Latency: 0.014 ms.
>>>                 Pending Tasks: 0
>>>                 Key cache capacity: 3
>>>                 Key cache size: 0
>>>                 Key cache hit rate: NaN
>>>                 Row cache: disabled
>>>                 Compacted row minimum size: 30130993
>>>                 Compacted row maximum size: 9223372036854775807
>>>
>>> Exception in thread "main" java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
>>>         at org.apache.cassandra.utils.EstimatedHistogram.mean(EstimatedHistogram.java:170)
>>>         at org.apache.cassandra.db.DataTracker.getMeanRowSize(DataTracker.java:395)
>>>         at org.apache.cassandra.db.ColumnFamilyStore.getMeanRowSize(ColumnFamilyStore.java:293)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>         at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
>>>         at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
>>>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
>>>         at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>         at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
>>>         at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>         at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>>
>>> Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation | T:
>>> 206.926.1978 | M: 206.849.2477
>>>
>>>
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support 
>> http://www.datastax.com
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support 
> http://www.datastax.com
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
