cassandra-user mailing list archives

From aaron morton <>
Subject Re: Repair in Cassandra 0.8.4 taking too long
Date Sun, 02 Oct 2011 23:12:44 GMT
What version are you on?

The error stack is from nodetool talking to the server. Check the logs on node 3 in DC2 for
errors; it sounds like the repair may have failed or did not complete.

You can monitor a repair by looking at:
- nodetool compactionstats for a validation compaction
- nodetool netstats for data transfers
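
As a rough illustration of the two checks above, here is a minimal Python sketch that polls both commands and prints their output. It assumes nodetool is on the PATH and that the node's JMX port is reachable with `-h <host>`. The helper names (`nodetool_cmd`, `mentions_validation`, `poll`) and the 60 second interval are made up for this example, and the "validation" string match is only a heuristic, since the exact output format differs between versions.

```python
import subprocess
import sys
import time


def nodetool_cmd(host, command):
    """Build the argument list for one nodetool call against `host`."""
    return ["nodetool", "-h", host, command]


def mentions_validation(compactionstats_output):
    """Heuristic (assumption): a running validation compaction shows up
    as the word 'validation' somewhere in the compactionstats output."""
    return "validation" in compactionstats_output.lower()


def poll(host, interval_seconds=60):
    """Print compactionstats and netstats for `host` until interrupted."""
    while True:
        for command in ("compactionstats", "netstats"):
            result = subprocess.run(
                nodetool_cmd(host, command),
                capture_output=True,
                text=True,
            )
            print(time.strftime("%Y-%m-%d %H:%M:%S"), command)
            # Show stderr when the call produced no stdout (e.g. a JMX error).
            print(result.stdout or result.stderr)
            if command == "compactionstats" and mentions_validation(result.stdout):
                print("-> a validation compaction appears to be running")
        time.sleep(interval_seconds)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        poll(sys.argv[1])
    else:
        print("usage: monitor_repair.py <host>")
```

Run it with the address of the node being repaired as the only argument and stop it with Ctrl-C once compactionstats and netstats both go quiet.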

I would restart node 3 in DC2 as it may now have 2 repairs running. Then start the repair again
and monitor it using the tools above.

I'm not sure how many CFs you have, but 2GB is not a lot of memory for the heap; you may want
to increase it. Also, by default the key cache is enabled and set to 200k entries.

Hope that helps. 

Aaron Morton
Freelance Cassandra Developer

On 2/10/2011, at 6:24 AM, Raj N wrote:

> I had 3 nodes with strategy_options (DC1=3) in 1 DC. I added 1 more DC and 3 more nodes.
> I didn't set the initial token, but I ran nodetool move on the new nodes (adding 1 to the tokens
> of the nodes in DC1). I updated the keyspace to strategy_options (DC1=3, DC2=3). Then I started
> running nodetool repair on each of the nodes. Before I started repair each node had around
> 5 GB of data. I started on the new nodes. 2 of the nodes completed the repair in 2 hours each.
> During the repair I saw the data grow to almost 25 GB, but eventually when the repair was
> done the data settled at around 9 GB. Is this normal? The 3rd node has been running repair
> for a long time. It eventually stopped, throwing an exception:
> Exception in thread "main" java.rmi.UnmarshalException: Error unmarshaling return header;
nested exception is:
>         at sun.rmi.transport.StreamRemoteCall.executeCall(
>         at sun.rmi.server.UnicastRef.invoke(
>         at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
>         at Source)
>         at$RemoteMBeanServerConnection.invoke(
>         at
>         at $Proxy0.forceTableRepair(Unknown Source)
>         at
>         at
>         at
> Caused by:
>         at
>         at sun.rmi.transport.StreamRemoteCall.executeCall(
> I started repair again since it's safe to do so. Now the GCInspector complains of not
> enough heap:
> WARN [ScheduledTasks:1] 2011-10-01 13:08:16,227 (line 149) Heap is 0.7598414264960864
full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to
the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2011-10-01 13:08:16,227 (line 2398) Unable
to reduce heap usage since there are no dirty column families
> nodetool ring shows 48GB of data on the node. 
> My Xmx is 2G. I rely on OS caching more than Row caching or key caching. Hence the column
families are created with default settings.
> Any help would be appreciated.
> Thanks
> -Raj
