cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: node restart taking too long
Date Sun, 21 Aug 2011 08:58:54 GMT
Actually I didn't drop any CF. Maybe my understanding was totally wrong,
so I will just describe what I thought below:

I thought "deleted CFs" meant sstables that are no longer needed (since
"node repair" could copy data to another node, the original sstables might
be marked for deletion but not removed yet). When I deleted all the
migration and schema sstables, the node somehow "forgot" that those files
should be deleted, so it read them and "could not find cfId"...


I got into this situation by the following steps: first I ran "node repair"
on node2, which failed in the middle (node3 went down) and left the Load at
170GB while the average is 30GB.

After I brought up node3, node2 started up very slowly; after 4 days it was
still starting. It seemed to be loading the row cache and key cache, so I
disabled those caches by setting their values to 0 via cassandra-cli. During
this procedure node2 was of course not reachable, so it could not receive the
schema update.

After that node2 could start very quickly, but "describe cluster" showed it
as "UNREACHABLE", so I did what the FAQ says: deleted the schema and
migration sstables and restarted node2.
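
For anyone following the same FAQ step, here is a minimal sketch that moves the files aside instead of deleting them, so they can be put back later. It is only an illustration under assumptions: the node must already be stopped, the system keyspace is assumed to live in /cassandra/data/system (as in the logs in this thread), the sstable file names are assumed to start with "Schema" and "Migrations", and the helper name and backup directory are made up for this example.

```python
import shutil
from pathlib import Path


def move_schema_sstables(system_dir, backup_dir, patterns=("Schema*", "Migrations*")):
    """Move schema and migration sstable files from system_dir to backup_dir.

    Hypothetical helper: run it only while the Cassandra node is stopped.
    Returns the sorted names of the files that were moved.
    """
    system_dir = Path(system_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for pattern in patterns:
        for path in sorted(system_dir.glob(pattern)):
            if path.is_file():
                # Moving (not deleting) keeps the files available for restore.
                shutil.move(str(path), str(backup_dir / path.name))
                moved.append(path.name)
    return sorted(moved)
```

With the assumed paths, the call would be move_schema_sstables("/cassandra/data/system", "/cassandra/schema_backup"); moving the files back is the same call with the two directories swapped.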

Then the "Couldn't find cfId=1000" error started showing up.





I have just moved the migration and schema sstables back and started
Cassandra. It still showed "UNREACHABLE", but after waiting a couple of
hours, "describe cluster" shows all nodes on the same schema version now.


Even though this problem is solved, I am not sure HOW... I am really curious
why just removing the "migration*" and "schema*" sstables could cause the
"Couldn't find cfId=1000" error.
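
The "describe cluster" check used in the steps above can be sketched as a small script. This is only a rough illustration: it assumes the schema versions are printed one per line as "uuid: [node, ...]" (the form quoted further down in this thread) and that unreachable nodes appear on a line starting with "UNREACHABLE:". The function names are made up for this example.

```python
import re

# Matches lines such as "ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]"
# and, by assumption, "UNREACHABLE: [node2]".
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(describe_output):
    """Return {schema_version: [nodes]} parsed from "describe cluster" text."""
    versions = {}
    for line in describe_output.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions.setdefault(match.group(1), []).extend(nodes)
    return versions


def schema_agrees(describe_output):
    """True when exactly one schema version is listed and no node is unreachable."""
    versions = schema_versions(describe_output)
    return len(versions) == 1 and "UNREACHABLE" not in versions
```

Fed the two version lines quoted below (node1 and node3 on one version, node2 on another), schema_agrees reports a disagreement; it only reports agreement once every node is listed under a single version.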

On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> I'm not sure what problem you're trying to solve.  The exception you
> pasted should stop once your clients are no longer trying to use the
> dropped CF.
>
> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <springrider@gmail.com> wrote:
> > that could be the reason, I did nodetool repair(unfinished, data size
> > increased 6 times bigger 30G vs 170G) and there should be some unclean
> > sstables on that node.
> > however upgrade it a tough work for me right now.  could the nodetool scrub
> > help?  or decommission the node and join it again?
> >
> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> This means you should upgrade, because we've fixed bugs about ignoring
> >> deleted CFs since 0.7.4.
> >>
> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springrider@gmail.com> wrote:
> >> > the log file shows as follows, not sure what does 'Couldn't find cfId=1000'
> >> > means(google just returned useless results):
> >> >
> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> >> > Found
> >> > table data in data directories. Consider using JMX to call
> >> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> >> > Creating new commitlog segment
> >> > /cassandra/commitlog/CommitLog-1313670197705.log
> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155)
> Replaying
> >> > /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314)
> Finished
> >> > reading /cassandra/commitlog/CommitLog-1313670030512.log
> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log
> >> > replay
> >> > complete
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> >> > Cassandra version: 0.7.4
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
> >> > Thrift
> >> > API version: 19.4.0
> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
> >> > Loading
> >> > persisted ring state
> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> >> > Starting
> >> > up server gossip
> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line
> 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> >> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db
> (80
> >> > bytes)
> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
> >> > CompactionManager.java
> >> > (line 396) Compacting
> >> >
> >> >
> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478)
> >> > Using
> >> > saved token 113427455640312821154458202477256070484
> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line
> 1048)
> >> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
> >> > operations)
> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> >> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> >> > RowMutationVerbHandler.java
> >> > (line 86) Error in row mutation
> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
> >> > find
> >> > cfId=1000
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> >> >     at
> >> >
> >> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >> >     at
> >> >
> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >> >     at java.lang.Thread.run(Thread.java:636)
> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623)
> >> > Node
> >> > /node1 has restarted, now UP again
> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> >> > DebuggableThreadPoolExecutor.java (line 103) Error in
> ThreadPoolExecutor
> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> >> > keyspace prjkeyspace
> >> >     at
> >> >
> >> >
> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
> >> >     at
> >> > org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
> >> >     at
> >> >
> >> >
> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
> >> >     at
> >> >
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
> >> >
> >> >
> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
> >> >>
> >> >> Look in the logs to find out why the migration did not get to node2.
> >> >> Otherwise yes you can drop those files.
> >> >> Cheers
> >> >> -----------------
> >> >> Aaron Morton
> >> >> Freelance Cassandra Developer
> >> >> @aaronmorton
> >> >> http://www.thelastpickle.com
> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
> >> >>
> >> >> just found out that changes via cassandra-cli, the schema change didn't
> >> >> reach node2. and node2 became unreachable....
> >> >> I did as this document: http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> >> >> but after that I just got two schema versions:
> >> >>
> >> >>
> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
> >> >>
> >> >> is that enough delete Schema* && Migrations* sstables and restart the
> >> >> node?
> >> >>
> >> >>
> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> thanks a lot for  all the help!  I have gone through the steps and
> >> >>> successfully brought up the node2 :)
> >> >>>
> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulinyen@gmail.com> wrote:
> >> >>> > Because the file only preserves the "key" of records, not the whole
> >> >>> > record. Records for those saved keys will be loaded into cassandra
> >> >>> > during the startup of cassandra.
> >> >>> >
> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <
> springrider@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> but the data size in the saved_cache are relatively small:
> >> >>> >>
> >> >>> >> will that cause the load problem?
> >> >>> >>
> >> >>> >>  ls  -lh  /cassandra/saved_caches/
> >> >>> >> total 32M
> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
> >> >>> >> cass-CommentSortsCache-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
> >> >>> >> cass-CommentSortsCache-RowCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50
> >> >>> >> cass-CommentVote-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53
> >> >>> >> cass-device_images-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53
> >> >>> >> cass-LinksByUrl-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50
> cass-LinkVote-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
> >> >>> >> cass-SavesByAccount-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49
> >> >>> >> cass-VotesByDay-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49
> >> >>> >> cass-VotesByLink-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
> >> >>> >> system-HintsColumnFamily-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50
> >> >>> >> system-LocationInfo-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30
> >> >>> >> system-Migrations-KeyCache
> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30
> system-Schema-KeyCache
> >> >>> >>
> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
> >> >>> >> <aaron@thelastpickle.com>
> >> >>> >> wrote:
> >> >>> >> > If you have a node that cannot start up due to issues loading the
> >> >>> >> > saved cache delete the files in the saved_cache directory before
> >> >>> >> > starting it.
> >> >>> >> >
> >> >>> >> > The settings to save the row and key cache are per CF. You can
> >> >>> >> > change them with an update column family statement via the CLI when
> >> >>> >> > attached to any node. You may then want to check the saved_caches
> >> >>> >> > directory and delete any files that are left (not sure if they are
> >> >>> >> > automatically deleted).
> >> >>> >> >
> >> >>> >> > i would recommend:
> >> >>> >> > - stop node 2
> >> >>> >> > - delete its saved_cache
> >> >>> >> > - make the schema change via another node
> >> >>> >> > - startup node 2
> >> >>> >> > Cheers
> >> >>> >> >
> >> >>> >> > -----------------
> >> >>> >> > Aaron Morton
> >> >>> >> > Freelance Cassandra Developer
> >> >>> >> > @aaronmorton
> >> >>> >> > http://www.thelastpickle.com
> >> >>> >> >
> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >>> >> >
> >> >>> >> >> does this need to be cluster wide? or I could just modify the
> >> >>> >> >> caches on one node?   since I could not connect to the node with
> >> >>> >> >> cassandra-cli, it says "connection refused"
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> [default@unknown] connect node2/9160;
> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> so if I change the cache size via other nodes, how could node2 be
> >> >>> >> >> notified the changing?    kill cassandra and start it again could
> >> >>> >> >> make it update the schema?
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <tholzer@wetafx.co.nz> wrote:
> >> >>> >> >>> Hi,
> >> >>> >> >>>
> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these by
> >> >>> >> >>> doing the following:
> >> >>> >> >>>
> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >> >>> >> >>> * Kill Cassandra
> >> >>> >> >>> * Remove all files in the saved_caches directory
> >> >>> >> >>> * Start Cassandra
> >> >>> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
> >> >>> >> >>>
> >> >>> >> >>> Cheers,
> >> >>> >> >>>
> >> >>> >> >>>        T.
> >> >>> >> >>>
> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>  I saw alot slicequeryfilter things if
changed the log level
> >> >>> >> >>>> to
> >> >>> >> >>>> DEBUG.
> >> >>> >> >>>>  just
> >> >>> >> >>>> thought even bring up a new node will
be faster than start
> the
> >> >>> >> >>>> old
> >> >>> >> >>>> one..... it
> >> >>> >> >>>> is wired
> >> >>> >> >>>>
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:225@1313068845474382
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:453@1310999270198313
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:26@1313199902088827
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:157@1313097239332314
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:41729@1313190821826229
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:6@1313174157301203
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:98@1312011362250907
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:42@1313201711997005
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:96@1312939986190155
> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954
SliceQueryFilter.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 123)
> >> >>> >> >>>> collecting 0 of 2147483647:
> >> >>> >> >>>> 76616c7565:false:621@1313192538616112
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan
Chunlu
> >> >>> >> >>>> <springrider@gmail.com
> >> >>> >> >>>> <mailto:springrider@gmail.com>>
wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>    but it seems the row cache is cluster
wide, how will  the
> >> >>> >> >>>> change
> >> >>> >> >>>> of row
> >> >>> >> >>>>    cache affect the read speed?
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan
Ellis
> >> >>> >> >>>> <jbellis@gmail.com
> >> >>> >> >>>>    <mailto:jbellis@gmail.com>>
wrote:
> >> >>> >> >>>>
> >> >>> >> >>>>        Or leave row cache enabled but
disable cache saving
> >> >>> >> >>>> (and
> >> >>> >> >>>> remove the
> >> >>> >> >>>>        one already on disk).
> >> >>> >> >>>>
> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM,
aaron morton
> >> >>> >> >>>> <aaron@thelastpickle.com
> >> >>> >> >>>>        <mailto:aaron@thelastpickle.com>>
wrote:
> >> >>> >> >>>>         >  INFO [main] 2011-08-14
09:24:52,198
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 547)
> >> >>> >> >>>>         > completed loading (1744370
ms; 200000 keys) row
> >> >>> >> >>>> cache
> >> >>> >> >>>> for
> >> >>> >> >>>> COMMENT
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > It's taking 29 minutes to
load 200,000 rows in the
> >> >>> >> >>>>  row
> >> >>> >> >>>> cache.
> >> >>> >> >>>> Thats a
> >> >>> >> >>>>         > pretty big row cache, I
would suggest reducing or
> >> >>> >> >>>> disabling
> >> >>> >> >>>> it.
> >> >>> >> >>>>         > Background
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > and server can not afford
the load then crashed.
> >> >>> >> >>>> after
> >> >>> >> >>>> come
> >> >>> >> >>>> back,
> >> >>> >> >>>>        node 3 can
> >> >>> >> >>>>         > not return for more than
96 hours
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > Crashed how ?
> >> >>> >> >>>>         > You may be seeing
> >> >>> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-2280
> >> >>> >> >>>>         > Watch nodetool compactionstats
to see when the
> >> >>> >> >>>> Merkle
> >> >>> >> >>>> tree
> >> >>> >> >>>> build
> >> >>> >> >>>>        finishes
> >> >>> >> >>>>         > and nodetool netstats to
see which CF's are
> >> >>> >> >>>> streaming.
> >> >>> >> >>>>         > Cheers
> >> >>> >> >>>>         > -----------------
> >> >>> >> >>>>         > Aaron Morton
> >> >>> >> >>>>         > Freelance Cassandra Developer
> >> >>> >> >>>>         > @aaronmorton
> >> >>> >> >>>>         > http://www.thelastpickle.com
> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23,
Yan Chunlu wrote:
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > I got 3 nodes and RF=3,
when I repairing ndoe3, it
> >> >>> >> >>>> seems
> >> >>> >> >>>> alot
> >> >>> >> >>>> data
> >> >>> >> >>>>         > generated.  and server can
not afford the load
> then
> >> >>> >> >>>> crashed.
> >> >>> >> >>>>         > after come back, node 3
can not return for more
> than
> >> >>> >> >>>> 96
> >> >>> >> >>>> hours
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > for 34GB data, the node
2 could restart and back
> >> >>> >> >>>> online
> >> >>> >> >>>> within 1
> >> >>> >> >>>> hour.
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > I am not sure what's wrong
with node3 and should I
> >> >>> >> >>>> restart
> >> >>> >> >>>> node
> >> >>> >> >>>> 3 again?
> >> >>> >> >>>>         > thanks!
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > Address   Status State   Load        Owns    Token
> >> >>> >> >>>>         >                                              113427455640312821154458202477256070484
> >> >>> >> >>>>         > node1     Up     Normal  34.11 GB    33.33%  0
> >> >>> >> >>>>         > node2     Up     Normal  31.44 GB    33.33%  56713727820156410577229101238628035242
> >> >>> >> >>>>         > node3     Down   Normal  177.55 GB   33.33%  113427455640312821154458202477256070484
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         > the log shows it is still
going on, not sure why
> it
> >> >>> >> >>>> is
> >> >>> >> >>>> so
> >> >>> >> >>>> slow:
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >  INFO [main] 2011-08-14
08:55:47,734
> >> >>> >> >>>> SSTableReader.java
> >> >>> >> >>>> (line
> >> >>> >> >>>> 154)
> >> >>> >> >>>>        Opening
> >> >>> >> >>>>         > /cassandra/data/COMMENT
> >> >>> >> >>>>         >  INFO [main] 2011-08-14
08:55:47,828
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 275)
> >> >>> >> >>>>         > reading saved cache
> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>> >> >>>>         >  INFO [main] 2011-08-14
09:24:52,198
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 547)
> >> >>> >> >>>>         > completed loading (1744370
ms; 200000 keys) row
> >> >>> >> >>>> cache
> >> >>> >> >>>> for
> >> >>> >> >>>> COMMENT
> >> >>> >> >>>>         >  INFO [main] 2011-08-14
09:24:52,299
> >> >>> >> >>>> ColumnFamilyStore.java
> >> >>> >> >>>> (line 275)
> >> >>> >> >>>>         > reading saved cache
> >> >>> >> >>>> /cassandra/saved_caches/COMMENT-RowCache
> >> >>> >> >>>>         >  INFO [CompactionExecutor:1]
2011-08-14
> 10:24:55,480
> >> >>> >> >>>>        CacheWriter.java (line
> >> >>> >> >>>>         > 96) Saved COMMENT-RowCache
(200000 items) in 2535
> ms
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>         >
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>        --
> >> >>> >> >>>>        Jonathan Ellis
> >> >>> >> >>>>        Project Chair, Apache Cassandra
> >> >>> >> >>>>        co-founder of DataStax, the source
for professional
> >> >>> >> >>>> Cassandra
> >> >>> >> >>>> support
> >> >>> >> >>>>        http://www.datastax.com
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>>
> >> >>> >> >>>
> >> >>> >> >>>
> >> >>> >> >
> >> >>> >> >
> >> >>> >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
