incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Repair in Cassandra 0.8.4 taking too long
Date Sun, 02 Oct 2011 23:12:44 GMT
What version are you on?

The error stack is from nodetool talking to the server. Check the logs on node 3 in DC2 for
errors; it sounds like perhaps it failed to repair or did not complete. 

You can monitor a repair by looking at:
- nodetool compactionstats for a validation compaction
- nodetool netstats for data transfers

I would restart node 3 in DC2, as it may now have 2 repairs running. Then start the repair again
and monitor it using the tools above. 
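A minimal sketch of that monitoring loop, assuming nodetool is on the PATH; the hostname dc2node3 is only a placeholder for the affected node, not something from the thread:

```shell
# Sketch only: dc2node3 is a placeholder hostname, not from the thread.
NT=$(command -v nodetool || true)
if [ -n "$NT" ]; then
  # While Merkle trees are being built, a "Validation" row appears here
  "$NT" -h dc2node3 compactionstats
  # Streaming sessions for the repair show up here
  "$NT" -h dc2node3 netstats
fi
```

Running both every minute or so gives a rough picture of whether the repair is still making progress.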

I'm not sure how many CFs you have, but 2GB is not a lot of memory for the heap; you may want
to increase it. Also, by default the key cache is enabled and set to 200k entries per CF. 
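For example, the heap is set in conf/cassandra-env.sh; the 4G value here is purely illustrative, and should be sized to your hardware:

```shell
# Fragment of conf/cassandra-env.sh -- values are illustrative, not a recommendation
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"
```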

Hope that helps. 


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 2/10/2011, at 6:24 AM, Raj N wrote:

> I had 3 nodes with strategy_options (DC1=3) in 1 DC. I added 1 more DC and 3 more nodes.
> I didn't set the initial token, but I ran nodetool move on the new nodes (adding 1 to the tokens
> of the nodes in DC1). I updated the keyspace to strategy_options (DC1=3, DC2=3). Then I started
> running nodetool repair on each of the nodes. Before I started repair each node had around
> 5 GB of data. I started on the new nodes. 2 of the nodes completed the repair in 2 hours each.
> During the repair I saw the data grow to almost 25 GB, but eventually when the repair was
> done the data settled at around 9 GB. Is this normal? The 3rd node has been running repair
> for a long time. It eventually stopped, throwing an exception -
> Exception in thread "main" java.rmi.UnmarshalException: Error unmarshaling return header;
> nested exception is:
>         java.io.EOFException
>         at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:209)
>         at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)
>         at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
>         at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
>         at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
>         at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
>         at $Proxy0.forceTableRepair(Unknown Source)
>         at org.apache.cassandra.tools.NodeProbe.forceTableRepair(NodeProbe.java:192)
>         at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:773)
>         at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:669)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:195)
> 
> I started repair again since it's safe to do so. Now the GCInspector complains of not
> enough heap -
> WARN [ScheduledTasks:1] 2011-10-01 13:08:16,227 GCInspector.java (line 149) Heap is 0.7598414264960864
> full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to
> the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2011-10-01 13:08:16,227 StorageService.java (line 2398) Unable
> to reduce heap usage since there are no dirty column families
> 
> nodetool ring shows 48GB of data on the node. 
> 
> My Xmx is 2G. I rely on OS caching more than row caching or key caching. Hence the column
> families are created with default settings.
> 
> Any help would be appreciated.
> 
> Thanks
> -Raj

