cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: node restart taking too long
Date Sun, 21 Aug 2011 13:03:28 GMT
Does that mean I can just wait and it will be okay eventually?

I also saw a "column family already exists" exception (not the exact wording,
but something like that), which also appeared after I deleted the migration and
schema sstables. But I cannot reproduce it. Is that a similar problem?

On Sun, Aug 21, 2011 at 7:57 PM, aaron morton <aaron@thelastpickle.com> wrote:

> I've seen "Couldn't find cfId=1000" in the mutation stage when a node
> joins a cluster with existing data after having its schema cleared.
>
> The migrations received from another node are applied one CF at a time, and
> when each CF is added the node opens the existing data files, which can
> take a while. In the meantime it has joined on gossip and is receiving
> mutations from other nodes that have all the CFs. Once the returning node
> gets through applying the migrations, the errors should stop.
>
> Reads are a similar story.
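Aaron's explanation can be illustrated with a toy model. This is a hypothetical sketch, not Cassandra's code: the node keeps a map of known cfIds, the migration registers column families one at a time, and any incoming mutation whose cfId is not registered yet is rejected, mirroring the "Couldn't find cfId" error. The cfIds and CF names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical model of a node rebuilding its schema from migrations."""
    known_cf_ids: dict = field(default_factory=dict)  # cfId -> CF name
    errors: list = field(default_factory=list)

    def apply_migration(self, cf_id, cf_name):
        # In a real node this step also opens the CF's existing data files,
        # which is why each step can take a while.
        self.known_cf_ids[cf_id] = cf_name

    def receive_mutation(self, cf_id):
        # Peers that already know every CF keep sending writes meanwhile.
        if cf_id not in self.known_cf_ids:
            self.errors.append(f"Couldn't find cfId={cf_id}")
            return False
        return True


def simulate(migrations, batches):
    """Interleave migration steps with batches of incoming mutations.

    batches[i] holds the cfIds of mutations that arrive before migration
    step i is applied. Returns the node and the rejection count per step.
    """
    node = ToyNode()
    rejected_per_step = []
    for (cf_id, cf_name), batch in zip(migrations, batches):
        rejected = sum(not node.receive_mutation(c) for c in batch)
        rejected_per_step.append(rejected)
        node.apply_migration(cf_id, cf_name)
    return node, rejected_per_step


if __name__ == "__main__":
    migrations = [(1000, "cf_a"), (1001, "cf_b"), (1002, "cf_c")]
    batches = [[1000, 1001, 1002]] * len(migrations)
    node, rejected = simulate(migrations, batches)
    print(rejected)  # [3, 2, 1]
    print(all(node.receive_mutation(c) for c in (1000, 1001, 1002)))  # True
```

Rejections shrink with each applied step and reach zero once the last CF is registered, which is the behaviour Aaron describes: the errors stop once the migration has been fully applied.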
>
> Cheers
>
>
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:
>
> Actually, I didn't drop any CF. Maybe my understanding was totally
> wrong; I'll just describe what I thought below:
>
> I thought "deleted CFs" meant sstables that are no longer needed (after a
> "nodetool repair" copies data to another node, the original sstable might be
> due for deletion but not yet deleted). When I deleted all the migration and
> schema sstables, the node somehow "forgot" that those files should be deleted,
> so it read them and could "not find cfId"...
>
>
> I got into this situation with the following steps: first I ran "nodetool
> repair" on node2, which failed in the middle (node3 went down) and left its
> Load at 170GB while the average is 30GB.
>
> After I brought node3 back up, node2 started up very slowly: 4 days later it
> was still starting. It seemed to be loading the row cache and key cache, so I
> disabled those caches by setting their values to 0 via cassandra-cli. During
> this procedure node2 was, of course, not reachable, so it could not receive
> the schema update.
>
> After that node2 could start very quickly, but "describe cluster"
> showed it as "UNREACHABLE", so I did what the FAQ says: deleted the schema and
> migration sstables and restarted node2.
>
> Then the "Couldn't find cfId=1000" error started showing up.
>
>
>
>
>
> I have just moved those migration and schema sstables back and started
> Cassandra. It still showed "UNREACHABLE", but after waiting a couple of hours,
> "describe cluster" showed that the nodes are on the same schema version now.
>
>
> Even though this problem is solved, I am not sure HOW. I'm really curious why
> just removing the "migration*" and "schema*" sstables could cause the
> "Couldn't find cfId=1000" error.
>
> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> I'm not sure what problem you're trying to solve.  The exception you
>> pasted should stop once your clients are no longer trying to use the
>> dropped CF.
>>
>> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <springrider@gmail.com> wrote:
>> > That could be the reason. I ran nodetool repair (it did not finish, and the
>> > data size grew about 6 times, 30G vs 170G), so there may be some unclean
>> > sstables on that node.
>> > However, upgrading is tough work for me right now. Could nodetool scrub
>> > help? Or should I decommission the node and join it again?
>> >
>> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> This means you should upgrade, because we've fixed bugs about ignoring
>> >> deleted CFs since 0.7.4.
>> >>
>> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springrider@gmail.com> wrote:
>> >> > The log file shows the following. I'm not sure what 'Couldn't find
>> >> > cfId=1000' means (Google just returned useless results):
>> >> >
>> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) Creating new commitlog segment /cassandra/commitlog/CommitLog-1313670197705.log
>> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying /cassandra/commitlog/CommitLog-1313670030512.log
>> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished reading /cassandra/commitlog/CommitLog-1313670030512.log
>> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay complete
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) Cassandra version: 0.7.4
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift API version: 19.4.0
>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading persisted ring state
>> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting up server gossip
>> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 bytes)
>> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java (line 396) Compacting [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using saved token 113427455640312821154458202477256070484
>> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java (line 86) Error in row mutation
>> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
>> >> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>> >> >     at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>> >> >     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>> >> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>> >> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >> >     at java.lang.Thread.run(Thread.java:636)
>> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node /node1 has restarted, now UP again
>> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in keyspace prjkeyspace
>> >> >     at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>> >> >     at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>> >> >     at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>> >> >     at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>> >> >     at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>> >> >     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>> >> >     at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>> >> >
>> >> >
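To check whether these errors stop once the migration has been applied, and to confirm the version Jonathan refers to, one could scan the node's log. This is a minimal sketch that assumes the 0.7-style layout shown above (level, [thread], date and time, then the message, with the exception text on the following line); the function and pattern names are hypothetical.

```python
import re

# Header of a log entry, e.g.
# "ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java ..."
HEADER = re.compile(
    r"^\s*(?P<level>[A-Z]+) \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)
VERSION = re.compile(r"Cassandra version: (?P<version>\S+)")
SCHEMA_ERRORS = re.compile(
    r"Couldn't find cfId=(?P<cfid>\d+)|Unknown ColumnFamily (?P<cf>\S+)"
)


def scan_log(lines):
    """Return (cassandra_version, [(timestamp, detail), ...]) for schema errors.

    Exception messages appear on the line after the ERROR header, so the most
    recent header timestamp is remembered and attached to each match.
    """
    version = None
    last_ts = None
    hits = []
    for line in lines:
        header = HEADER.match(line)
        if header:
            last_ts = header.group("ts")
        v = VERSION.search(line)
        if v:
            version = v.group("version")
        err = SCHEMA_ERRORS.search(line)
        if err:
            if err.group("cfid"):
                detail = f"cfId={err.group('cfid')}"
            else:
                detail = f"CF {err.group('cf')}"
            hits.append((last_ts, detail))
    return version, hits
```

Passing an open log file (its path depends on your log4j settings) returns the version string and a timestamped list of schema errors; if the last hit is well before the end of the log, the migration has probably caught up.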
>> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aaron@thelastpickle.com>
>> >> > wrote:
>> >> >>
>> >> >> Look in the logs to find out why the migration did not get to
>> >> >> node2.
>> >> >> Otherwise, yes, you can drop those files.
>> >> >> Cheers
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Cassandra Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>> >> >>
>> >> >> I just found out that the schema change I made via cassandra-cli didn't
>> >> >> reach node2, and node2 became unreachable....
>> >> >> I followed this document:
>> >> >> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>> >> >> but after that I just got two schema versions:
>> >> >>
>> >> >>
>> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>> >> >>
>> >> >> Is it enough to delete the Schema* and Migrations* sstables and restart
>> >> >> the node?
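The listing above is a schema disagreement: more than one schema version is in use. A small sketch that flags this from the version-to-nodes lines that "describe cluster" prints; the parser assumes the `<uuid>: [node, ...]` layout shown above, and the function names are hypothetical. An "UNREACHABLE" entry (which describe cluster also reports, as noted earlier in the thread) is treated as "agreement not confirmed".

```python
import re

LINE = re.compile(r"^\s*(?P<version>[0-9A-Za-z-]+): \[(?P<nodes>[^\]]*)\]\s*$")


def parse_schema_versions(text):
    """Parse lines like 'ddcada52-...: [node1, node3]' into {version: [nodes]}."""
    versions = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            nodes = [n.strip() for n in m.group("nodes").split(",") if n.strip()]
            versions[m.group("version")] = nodes
    return versions


def schema_agrees(versions):
    """True only if every node is reachable and all report one schema version."""
    if "UNREACHABLE" in versions:
        return False
    return len(versions) <= 1
```

Fed the two lines above, `parse_schema_versions` puts node2 alone on `2127b2ef-...`, so `schema_agrees` returns False until node2 picks up the newer version.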
>> >> >>
>> >> >>
>> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Thanks a lot for all the help! I have gone through the steps and
>> >> >>> successfully brought up node2 :)
>> >> >>>
>> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulinyen@gmail.com>
>> >> >>> wrote:
>> >> >>> > Because the file only preserves the "key" of each record, not the
>> >> >>> > whole record. The records for those saved keys are loaded into
>> >> >>> > Cassandra during its startup.
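Boris's point explains why a small saved-cache file can still mean a slow start: only the keys are saved, so every saved key triggers a read of the full row at startup. A toy sketch of that idea, not Cassandra's implementation; `read_row` stands in for a hypothetical storage read.

```python
import time


def save_row_cache(row_cache):
    """Persist only the keys of the cache; the row contents are not written."""
    return list(row_cache.keys())


def load_row_cache(saved_keys, read_row):
    """Rebuild the cache at startup by reading every saved key's row.

    The cost grows with the number of keys times the cost of one row read,
    not with the size of the saved file.
    """
    started = time.monotonic()
    cache = {key: read_row(key) for key in saved_keys}
    return cache, time.monotonic() - started
```

The saved file holds one short entry per key, while startup performs one full row read per key; with large rows and many keys, a file of a few MB can still stand for a long load.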
>> >> >>> >
>> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springrider@gmail.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> But the data in saved_caches is relatively small. Will that cause
>> >> >>> >> the load problem?
>> >> >>> >>
>> >> >>> >>  ls -lh /cassandra/saved_caches/
>> >> >>> >> total 32M
>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>> >> >>> >>
>> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton
>> >> >>> >> <aaron@thelastpickle.com>
>> >> >>> >> wrote:
>> >> >>> >> > If you have a node that cannot start up due to issues loading the
>> >> >>> >> > saved cache, delete the files in the saved_caches directory before
>> >> >>> >> > starting it.
>> >> >>> >> >
>> >> >>> >> > The settings to save the row and key caches are per CF. You can change
>> >> >>> >> > them with an update column family statement via the CLI when attached
>> >> >>> >> > to any node. You may then want to check the saved_caches directory and
>> >> >>> >> > delete any files that are left (not sure if they are automatically
>> >> >>> >> > deleted).
>> >> >>> >> >
>> >> >>> >> > I would recommend:
>> >> >>> >> > - stop node 2
>> >> >>> >> > - delete its saved_caches
>> >> >>> >> > - make the schema change via another node
>> >> >>> >> > - start up node 2
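The "delete its saved_caches" step in Aaron's sequence can be sketched in Python. The directory comes from this thread; the file-name pattern (`<keyspace>-<CF>-KeyCache` or `-RowCache`) is inferred from the listing above, and the function is hypothetical. It defaults to a dry run, and should only be run while Cassandra is stopped.

```python
from pathlib import Path

CACHE_SUFFIXES = ("-KeyCache", "-RowCache")


def clear_saved_caches(directory="/cassandra/saved_caches", keyspace=None,
                       dry_run=True):
    """Delete saved key/row cache files, optionally for one keyspace only.

    Returns the sorted list of matching file names. With dry_run=True (the
    default) nothing is deleted, so the list can be reviewed first.
    """
    matched = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or not path.name.endswith(CACHE_SUFFIXES):
            continue
        if keyspace is not None and not path.name.startswith(keyspace + "-"):
            continue
        matched.append(path.name)
        if not dry_run:
            path.unlink()
    return matched
```

Calling it with `keyspace="cass"` would target only the application caches in the listing and leave the `system-*` files alone; omitting `keyspace` matches every cache file, as in Teijo's "remove all files" step.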
>> >> >>> >> >
>> >> >>> >> > Cheers
>> >> >>> >> >
>> >> >>> >> > -----------------
>> >> >>> >> > Aaron Morton
>> >> >>> >> > Freelance Cassandra Developer
>> >> >>> >> > @aaronmorton
>> >> >>> >> > http://www.thelastpickle.com
>> >> >>> >> >
>> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >> >>> >> >
>> >> >>> >> >> Does this need to be cluster-wide, or could I just modify the caches
>> >> >>> >> >> on one node? I ask because I cannot connect to that node with
>> >> >>> >> >> cassandra-cli; it says "connection refused":
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> [default@unknown] connect node2/9160;
>> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> So if I change the cache size via other nodes, how will node2 be
>> >> >>> >> >> notified of the change? Would killing Cassandra and starting it
>> >> >>> >> >> again make it update the schema?
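Before choosing between a per-node and a cluster-wide change, it can help to confirm whether node2's Thrift port (9160, as in the connect command above) accepts connections at all. A minimal TCP probe sketch; it only shows that something is listening, not that Cassandra is healthy, and the function name is hypothetical.

```python
import socket


def thrift_port_open(host, port=9160, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unresolvable host names.
        return False
```

While node2 is still loading its caches at startup, a probe like `thrift_port_open("node2")` would be expected to return False, matching the "Connection refused" above, so the schema change has to be made from another node.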
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer
>> >> >>> >> >> <tholzer@wetafx.co.nz>
>> >> >>> >> >> wrote:
>> >> >>> >> >>> Hi,
>> >> >>> >> >>>
>> >> >>> >> >>> Yes, we saw exactly the same messages. We got rid of these by doing
>> >> >>> >> >>> the following:
>> >> >>> >> >>>
>> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >> >>> >> >>> * Kill Cassandra
>> >> >>> >> >>> * Remove all files in the saved_caches directory
>> >> >>> >> >>> * Start Cassandra
>> >> >>> >> >>> * Slowly bring back row & key caches (if desired; we left them off)
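The first step in this list, and Jonathan's alternative later in the thread (keep the row cache but disable cache saving), are per-CF schema changes. A hedged sketch that generates the cassandra-cli statements for a list of CFs; the attribute names `rows_cached`, `keys_cached` and `row_cache_save_period` are assumptions based on the 0.7-era CLI and should be checked with `help update column family;` on your version. The helper name is hypothetical.

```python
def cache_statements(column_families, rows_cached=0, keys_cached=0,
                     row_cache_save_period=None):
    """Build one 'update column family' statement per CF for cassandra-cli.

    Attribute names are assumed from the 0.7-era CLI; verify before use.
    Pass None to leave an attribute out. For Jonathan's variant, pass
    rows_cached=None, keys_cached=None, row_cache_save_period=0.
    """
    statements = []
    for cf in column_families:
        attrs = []
        if rows_cached is not None:
            attrs.append(f"rows_cached={rows_cached}")
        if keys_cached is not None:
            attrs.append(f"keys_cached={keys_cached}")
        if row_cache_save_period is not None:
            attrs.append(f"row_cache_save_period={row_cache_save_period}")
        if attrs:
            statements.append(
                f"update column family {cf} with {' and '.join(attrs)};"
            )
    return statements
```

The output would be pasted into cassandra-cli after `use <keyspace>;`, connected to a node that is up, since node2 was refusing connections.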
>> >> >>> >> >>>
>> >> >>> >> >>> Cheers,
>> >> >>> >> >>>
>> >> >>> >> >>>        T.
>> >> >>> >> >>>
>> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>> I saw a lot of SliceQueryFilter messages after changing the log level
>> >> >>> >> >>>> to DEBUG. I just thought that even bringing up a new node would be
>> >> >>> >> >>>> faster than starting the old one..... it is weird.
>> >> >>> >> >>>>
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
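These DEBUG lines are easier to read once decoded. A sketch that assumes each column is printed as `name:deleted:valueLength@timestamp`, with the name hex-encoded (bytes comparator) and the timestamp in microseconds since the epoch; 2147483647 is Java's Integer.MAX_VALUE, i.e. an unbounded slice. These field meanings are assumptions about the 0.7 output, not something stated in the thread, and the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed column format: <hex name>:<deleted>:<value length>@<timestamp in µs>
COLUMN = re.compile(
    r"collecting (?P<collected>\d+) of (?P<limit>\d+): "
    r"(?P<name>(?:[0-9a-f]{2})+):(?P<deleted>true|false):"
    r"(?P<length>\d+)@(?P<ts>\d+)"
)


def decode_slice_debug(line):
    """Decode one SliceQueryFilter DEBUG line into a dict, or None if no match."""
    m = COLUMN.search(line)
    if m is None:
        return None
    seconds, micros = divmod(int(m.group("ts")), 1_000_000)
    written_at = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=micros
    )
    return {
        "name": bytes.fromhex(m.group("name")).decode("utf-8", errors="replace"),
        "deleted": m.group("deleted") == "true",
        "value_length": int(m.group("length")),
        "limit": int(m.group("limit")),  # 2147483647 == Java Integer.MAX_VALUE
        "written_at": written_at,
    }
```

Under those assumptions, the first line decodes to a live column named "value" holding 225 bytes, written on 2011-08-11 (UTC); each line is one row being read back into the row cache during startup.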
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
>> >> >>> >> >>>> <mailto:springrider@gmail.com>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>    But it seems the row cache setting is cluster-wide. How will changing
>> >> >>> >> >>>>    the row cache affect read speed?
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com
>> >> >>> >> >>>>    <mailto:jbellis@gmail.com>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>>        Or leave the row cache enabled but disable cache saving (and remove
>> >> >>> >> >>>>        the one already on disk).
>> >> >>> >> >>>>
>> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aaron@thelastpickle.com
>> >> >>> >> >>>>        <mailto:aaron@thelastpickle.com>> wrote:
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows into the row cache. That's
>> >> >>> >> >>>>         > a pretty big row cache; I would suggest reducing or disabling it.
>> >> >>> >> >>>>         > Background:
>> >> >>> >> >>>>         > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >> >>> >> >>>>         >
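Aaron's 29-minute figure comes straight from the log line: 1,744,370 ms is about 29.1 minutes, or roughly 8.7 ms per key for 200,000 keys. A small sketch that extracts those numbers from "completed loading" lines; the regex is based only on the line quoted above, and the function name is hypothetical.

```python
import re

LOADED = re.compile(
    r"completed loading \((?P<ms>\d+) ms; (?P<keys>\d+) keys\) "
    r"row cache for (?P<cf>\S+)"
)


def row_cache_load_stats(line):
    """Return (cf, minutes, ms_per_key) for a 'completed loading' line, or None."""
    m = LOADED.search(line)
    if m is None:
        return None
    ms, keys = int(m.group("ms")), int(m.group("keys"))
    ms_per_key = ms / keys if keys else float("inf")
    return m.group("cf"), ms / 60_000, ms_per_key
```

Running it over a startup log shows which CF's row cache dominates the start time, which is the CF where reducing or disabling the cache pays off most.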
>> >> >>> >> >>>>         > and the server could not handle the load, so it crashed. After coming
>> >> >>> >> >>>>         > back, node3 has not been able to return for more than 96 hours
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > Crashed how?
>> >> >>> >> >>>>         > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build
>> >> >>> >> >>>>         > finishes, and nodetool netstats to see which CFs are streaming.
>> >> >>> >> >>>>         > Cheers
>> >> >>> >> >>>>         > -----------------
>> >> >>> >> >>>>         > Aaron Morton
>> >> >>> >> >>>>         > Freelance Cassandra
Developer
>> >> >>> >> >>>>         > @aaronmorton
>> >> >>> >> >>>>         > http://www.thelastpickle.com
>> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > I have 3 nodes with RF=3. When I was repairing node3, it seemed a
>> >> >>> >> >>>>         > lot of data was generated, and the server could not handle the load,
>> >> >>> >> >>>>         > so it crashed. After coming back, node3 has not been able to return
>> >> >>> >> >>>>         > for more than 96 hours.
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > With 34GB of data, node2 could restart and come back online within
>> >> >>> >> >>>>         > 1 hour.
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > I am not sure what's wrong with node3. Should I restart node3 again?
>> >> >>> >> >>>>         > Thanks!
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > Address  Status  State   Load       Owns    Token
>> >> >>> >> >>>>         >                                            113427455640312821154458202477256070484
>> >> >>> >> >>>>         > node1    Up      Normal  34.11 GB   33.33%  0
>> >> >>> >> >>>>         > node2    Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
>> >> >>> >> >>>>         > node3    Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
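The tokens in this ring are evenly spaced, which is why each node owns 33.33%. Assuming the cluster uses RandomPartitioner (the 0.7 default; the thread does not say), whose token space runs from 0 to 2**127, evenly spaced tokens for N nodes are i * (2**127 // N), and for N = 3 that reproduces exactly the three tokens shown. So the token assignment itself is balanced. The function names are hypothetical.

```python
RING_SIZE = 2 ** 127  # assumption: RandomPartitioner token space


def balanced_tokens(node_count):
    """Evenly spaced initial tokens: i * (RING_SIZE // node_count)."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


def ownership(tokens):
    """Fraction of the ring each node owns: the range (previous token, token]."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps to the largest token
        shares[token] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


if __name__ == "__main__":
    tokens = balanced_tokens(3)
    print(tokens)  # the three tokens in the ring above
    print({t: round(s * 100, 2) for t, s in ownership(tokens).items()})
```

Computing the step with integer division keeps the tokens exact; floating-point arithmetic would lose precision at numbers this large.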
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         > The log shows it is still going on; I'm not sure why it is so slow:
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>         >
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>        --
>> >> >>> >> >>>>        Jonathan Ellis
>> >> >>> >> >>>>        Project Chair, Apache Cassandra
>> >> >>> >> >>>>        co-founder of DataStax, the source for professional Cassandra support
>> >> >>> >> >>>>        http://www.datastax.com
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>>
>> >> >>> >> >>>
>> >> >>> >> >>>
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >
>> >> >>> >
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of DataStax, the source for professional Cassandra support
>> >> http://www.datastax.com
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>
