cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: node restart taking too long
Date Sun, 21 Aug 2011 11:57:23 GMT
I've seen "Couldn't find cfId=1000" in the mutation stage when a node joins a cluster
with existing data after having its schema cleared.

The migrations received from another node are applied one CF at a time, and as each CF is added
the node opens the existing data files, which can take a while. In the meantime it has joined
gossip and is receiving mutations from other nodes that already have all the CFs. Once the returning
node gets through applying the migrations, the errors should stop.

Reads are a similar story: the "Unknown ColumnFamily" error in the read stage has the same cause.
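
As a rough illustration of the sequence described above, here is a minimal Python sketch.
It is a toy model, not Cassandra code: the Node class, its method names and the extra
cfIds 1001 and 1002 are made up for the example. The node learns one CF per migration
step and rejects any mutation whose cfId it has not learned yet.

```python
class Node:
    """Toy stand-in for a node that has had its schema cleared (hypothetical)."""

    def __init__(self):
        # cfIds the node currently knows about; empty after the schema is cleared.
        self.known_cf_ids = set()

    def apply_migration(self, cf_id):
        # In the real system this step also opens existing data files,
        # which is why it can take a while. Here it is instantaneous.
        self.known_cf_ids.add(cf_id)

    def try_mutation(self, cf_id):
        # A mutation for an unknown CF is rejected, mirroring the
        # "Couldn't find cfId=..." error in the mutation stage.
        return "applied" if cf_id in self.known_cf_ids else "error"


def simulate(migration_order, mutation_cf_id):
    """Send one mutation before each migration step and one after the last."""
    node = Node()
    outcomes = []
    for cf_id in migration_order:
        outcomes.append(node.try_mutation(mutation_cf_id))
        node.apply_migration(cf_id)
    outcomes.append(node.try_mutation(mutation_cf_id))
    return outcomes


if __name__ == "__main__":
    # Mutations target the CF that happens to be migrated last.
    print(simulate([1000, 1001, 1002], 1002))
```

With three migrations to apply and traffic arriving for the last CF, the mutation is
rejected three times and accepted once the final migration is in, which is the pattern
of errors that stop on their own.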

Cheers
 


-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:

> Actually I didn't drop any CF. Maybe my understanding was totally wrong; I will just
> describe what I thought below:
> 
> I thought "deleted CFs" meant sstables that are no longer useful (since "node repair" could
> copy data to another node, the original sstable might be due for deletion but not yet deleted).
> When I deleted all the migration and schema sstables, the node somehow "forgot" that those files
> should be deleted, so it read them and "can not find cfId"...
> 
> 
> I got into this situation by the following steps: first I ran "node repair" on node2,
> which failed in the middle (node3 went down) and left the Load at 170GB while the average is 30GB.
> 
> After I brought up node3, node2 started up very slowly; 4 days passed and it was still starting.
> It seemed to be loading the row cache and key cache, so I disabled those caches by setting their
> values to 0 via cassandra-cli. During this procedure node2 was of course not reachable, so it
> could not update its schema.
> 
> After that node2 could start very quickly, but "describe cluster" showed it as
> "UNREACHABLE", so I did as the FAQ says: deleted the schema and migration sstables and restarted node2.
> 
> Then the "Couldn't find cfId=1000" error started showing up.
> 
> 
> 
> 
> 
> I have just moved those migration && schema sstables back and started cassandra.
> It still showed "UNREACHABLE", but after waiting a couple of hours "describe cluster" shows
> they are on the same version now.
> 
> 
> Even though this problem is solved, I am not sure HOW... I am really curious why just removing
> the "migration* and schema*" sstables could cause the "Couldn't find cfId=1000" error.
> 
> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> I'm not sure what problem you're trying to solve.  The exception you
> pasted should stop once your clients are no longer trying to use the
> dropped CF.
> 
> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <springrider@gmail.com> wrote:
> > That could be the reason. I ran nodetool repair (unfinished; the data size
> > grew about 6 times, 30G vs 170G) and there should be some unclean
> > sstables on that node.
> > However, upgrading is tough work for me right now. Could nodetool scrub
> > help? Or decommissioning the node and joining it again?
> >
> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> This means you should upgrade, because we've fixed bugs about ignoring
> >> deleted CFs since 0.7.4.
> >>
> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springrider@gmail.com> wrote:
> >> > The log file shows the following; I am not sure what 'Couldn't find
> >> > cfId=1000'
> >> > means (Google just returned useless results):
> >> >
> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> >> > Found
> >> > table data in data directories. Consider using JMX to call
> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> >> > Creating new commitlog segment
> >> > /cassandra/commitlog/CommitLog-1313670197705.log
> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
> >> > /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
> >> > replay
> >> > complete
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> >> > Cassandra version: 0.7.4
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
> >> > Thrift
> >> > API version: 19.4.0
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
> >> > Loading
> >> > persisted ring state
> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> >> > Starting
> >> > up server gossip
> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
> >> > bytes)
> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
> >> > CompactionManager.java
> >> > (line 396) Compacting
> >> >
> >> > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
> >> > Using
> >> > saved token 113427455640312821154458202477256070484
> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> >> > RowMutationVerbHandler.java
> >> > (line 86) Error in row mutation
> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
> >> > find
> >> > cfId=1000
> >> >     at
> >> >
> >> > org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> >> >     at
> >> >
> >> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> >> >     at
> >> >
> >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> >     at
> >> >
> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> >     at java.lang.Thread.run(Thread.java:636)
> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623)
> >> > Node
> >> > /node1 has restarted, now UP again
> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> >> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> >> > keyspace prjkeyspace
> >> >     at
> >> >
> >> > org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
> >> >     at
> >> > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
> >> >     at
> >> >
> >> > org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
> >> >     at
> >> > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
> >> >
> >> >
> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aaron@thelastpickle.com>
> >> > wrote:
> >> >>
> >> >> Look in the logs to find out why the migration did not get to
> >> >> node2.
> >> >> Otherwise yes, you can drop those files.
> >> >> Cheers
> >> >> -----------------
> >> >> Aaron Morton
> >> >> Freelance Cassandra Developer
> >> >> @aaronmorton
> >> >> http://www.thelastpickle.com
> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
> >> >>
> >> >> I just found out that the schema change made via cassandra-cli didn't
> >> >> reach node2, and node2 became unreachable...
> >> >> I followed this
> >> >> document: http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> >> >> but after that I just got two schema versions:
> >> >>
> >> >>
> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
> >> >>
> >> >> Is it enough to delete the Schema* && Migrations* sstables and restart the
> >> >> node?
> >> >>
> >> >>
> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> thanks a lot for all the help!  I have gone through the steps and
> >> >>> successfully brought up node2 :)
> >> >>>
> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulinyen@gmail.com>
> >> >>> wrote:
> >> >>> > Because the file only preserves the "key" of each record, not the whole
> >> >>> > record.
> >> >>> > The records for those saved keys are loaded into cassandra during
> >> >>> > startup.
> >> >>> >
> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springrider@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> But the data size in saved_caches is relatively small:
> >> >>> >>
> >> >>> >> will that cause the load problem?
> >> >>> >>
> >> >>> >>  ls  -lh  /cassandra/saved_caches/
> >> >>> >> total 32M
> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >> >>> >>
> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
> >> >>> >> <aaron@thelastpickle.com>
> >> >>> >> wrote:
> >> >>> >> > If you have a node that cannot start up due to issues loading the
> >> >>> >> > saved cache, delete the files in the saved_caches directory before
> >> >>> >> > starting it.
> >> >>> >> >
> >> >>> >> > The settings to save the row and key cache are per CF. You can
> >> >>> >> > change them with an update column family statement via the CLI when
> >> >>> >> > attached to any node. You may then want to check the saved_caches
> >> >>> >> > directory and delete any files that are left (not sure if they are
> >> >>> >> > automatically deleted).
> >> >>> >> >
> >> >>> >> > I would recommend:
> >> >>> >> > - stop node 2
> >> >>> >> > - delete its saved_caches
> >> >>> >> > - make the schema change via another node
> >> >>> >> > - startup node 2
> >> >>> >> >
> >> >>> >> > Cheers
> >> >>> >> >
> >> >>> >> > -----------------
> >> >>> >> > Aaron Morton
> >> >>> >> > Freelance Cassandra Developer
> >> >>> >> > @aaronmorton
> >> >>> >> > http://www.thelastpickle.com
> >> >>> >> >
> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >>> >> >
> >> >>> >> >> Does this need to be cluster wide, or could I just modify the
> >> >>> >> >> caches on one node? I could not connect to the node with
> >> >>> >> >> cassandra-cli; it says "connection refused":
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> [default@unknown] connect node2/9160;
> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> So if I change the cache size via other nodes, how would node2
> >> >>> >> >> be notified of the change? Would killing cassandra and starting
> >> >>> >> >> it again make it update the schema?
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
> >> >>> >> >> <tholzer@wetafx.co.nz>
> >> >>> >> >> wrote:
> >> >>> >> >>> Hi,
> >> >>> >> >>>
> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these by
> >> >>> >> >>> doing the following:
> >> >>> >> >>>
> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >> >>> >> >>> * Kill Cassandra
> >> >>> >> >>> * Remove all files in the saved_caches directory
> >> >>> >> >>> * Start Cassandra
> >> >>> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
> >> >>> >> >>>
> >> >>> >> >>> Cheers,
> >> >>> >> >>>
> >> >>> >> >>>        T.
> >> >>> >> >>>
> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>  I saw a lot of SliceQueryFilter messages when I changed the log level
> >> >>> >> >>>> to DEBUG. I just thought that even bringing up a new node would be
> >> >>> >> >>>> faster than starting the old one... it is weird.
> >> >>> >> >>>>
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
> >> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>    but it seems the row cache is cluster wide, how will the change
> >> >>> >> >>>>    of row cache affect the read speed?
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>        Or leave row cache enabled but disable cache saving (and
> >> >>> >> >>>>        remove the one already on disk).
> >> >>> >> >>>>
> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aaron@thelastpickle.com> wrote:
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the row cache. That's a
> >> >>> >> >>>>         > pretty big row cache, I would suggest reducing or disabling it.
> >> >>> >> >>>>         > Background:
> >> >>> >> >>>>         > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > and server can not afford the load then crashed. after come back,
> >> >>> >> >>>>         > node 3 can not return for more than 96 hours
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > Crashed how?
> >> >>> >> >>>>         > You may be seeing
> >> >>> >> >>>>         > https://issues.apache.org/jira/browse/CASSANDRA-2280
> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build
> >> >>> >> >>>>         > finishes and nodetool netstats to see which CFs are streaming.
> >> >>> >> >>>>         > Cheers
> >> >>> >> >>>>         > -----------------
> >> >>> >> >>>>         > Aaron Morton
> >> >>> >> >>>>         > Freelance Cassandra Developer
> >> >>> >> >>>>         > @aaronmorton
> >> >>> >> >>>>         > http://www.thelastpickle.com
> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > I have 3 nodes and RF=3. When I was repairing node3, it seems a lot
> >> >>> >> >>>>         > of data was generated, and the server could not handle the load and
> >> >>> >> >>>>         > then crashed. After coming back, node 3 has not returned for more
> >> >>> >> >>>>         > than 96 hours.
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > For 34GB of data, node 2 could restart and be back online within 1 hour.
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > I am not sure what's wrong with node3. Should I restart node 3 again?
> >> >>> >> >>>>         > thanks!
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > Address   Status State    Load        Owns     Token
> >> >>> >> >>>>         >                                                113427455640312821154458202477256070484
> >> >>> >> >>>>         > node1     Up     Normal   34.11 GB    33.33%   0
> >> >>> >> >>>>         > node2     Up     Normal   31.44 GB    33.33%   56713727820156410577229101238628035242
> >> >>> >> >>>>         > node3     Down   Normal   177.55 GB   33.33%   113427455640312821154458202477256070484
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > the log shows it is still going on, not sure why it is so slow:
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154)
> >> >>> >> >>>>         > Opening /cassandra/data/COMMENT
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
> >> >>> >> >>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> >> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
> >> >>> >> >>>>         > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java
> >> >>> >> >>>>         > (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>        --
> >> >>> >> >>>>        Jonathan Ellis
> >> >>> >> >>>>        Project Chair, Apache Cassandra
> >> >>> >> >>>>        co-founder of DataStax, the source for professional Cassandra support
> >> >>> >> >>>>        http://www.datastax.com
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>
> >> >>> >> >>>
> >> >>> >> >
> >> >>> >> >
> >> >>> >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
> 
> 
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 

