From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: node restart taking too long
Date: Mon, 22 Aug 2011 09:42:48 +1200

The "cf already exists" error is not the same problem. I would need the call stack.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 1:03 AM, Yan Chunlu wrote:

> Does that mean I could just wait and it will be okay eventually?
>
> I also saw a "column family already exists" exception (not the exact wording, something like that), also after I deleted the migration and schema sstables. But I cannot reproduce it; is it a similar problem?
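(For reference, the "delete the migration and schema sstables" procedure Yan refers to, from the wiki FAQ discussed later in this thread, amounts to roughly the sketch below. The `/cassandra/data` path is an assumption taken from the log excerpts further down; check `data_file_directories` in your cassandra.yaml.)

```shell
# Sketch of the wiki FAQ schema_disagreement recovery for ONE disagreeing node.
# Paths are illustrative, not authoritative.

DATA_DIR=/cassandra/data   # assumed layout from the logs in this thread

# 1. Stop Cassandra on the disagreeing node (kill its JVM process).

# 2. Remove only the persisted schema state; application sstables are untouched.
rm -f "$DATA_DIR"/system/Schema*
rm -f "$DATA_DIR"/system/Migrations*

# 3. Restart the node. It re-requests migrations from the live nodes, and
#    until they are all applied, writes for not-yet-known CFs can surface as
#    transient "Couldn't find cfId=..." errors in the log.
```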
>
> On Sun, Aug 21, 2011 at 7:57 PM, aaron morton <aaron@thelastpickle.com> wrote:
> I've seen "Couldn't find cfId=1000" in the mutation stage happen when a node joins a cluster with existing data after having its schema cleared.
>
> The migrations received from another node are applied one CF at a time. When each CF is added, the node opens the existing data files, which can take a while. In the meantime it has joined gossip and is receiving mutations from other nodes that have all the CFs. Once the returning node has finished applying the migrations, the errors should stop.
>
> Reads are a similar story.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:
>
>> Actually I didn't drop any CF; maybe my understanding was totally wrong. Here is what I thought:
>>
>> I thought "deleted CFs" meant sstables that are useless (since "nodetool repair" can copy data to another node, the original sstables might be due for deletion but not yet removed). When I deleted all the migration and schema sstables, the node somehow "forgot" which files should be deleted, so it read them and "can not find cfId"...
>>
>> I got into this situation by the following steps: first I ran "nodetool repair" on node2, which failed in the middle (node3 was down) and left the Load at 170GB while the average is 30GB.
>>
>> After I brought node3 back up, node2 started up very slowly; 4 days passed and it was still starting. It seemed to be loading the row cache and key cache, so I disabled those caches by setting their sizes to 0 via cassandra-cli. During this procedure node2 was of course not reachable, so it could not update the schema.
>>
>> After that node2 could start very quickly, but "describe cluster" showed it as "UNREACHABLE", so I did as the FAQ says: deleted the schema and migration sstables and restarted node2.
>>
>> Then the "Couldn't find cfId=1000" error started showing up.
>>
>> I have just moved those migration and schema sstables back and started cassandra. It still showed "UNREACHABLE", but after waiting a couple of hours, "describe cluster" shows they are all on the same version now.
>>
>> Even though the problem is solved, I am not sure HOW... Really curious why just removing the "Migrations*" and "Schema*" sstables could cause the "Couldn't find cfId=1000" error.
>>
>> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> I'm not sure what problem you're trying to solve. The exception you
>> pasted should stop once your clients are no longer trying to use the
>> dropped CF.
>>
>> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <springrider@gmail.com> wrote:
>> > That could be the reason. I did nodetool repair (unfinished; the data
>> > size increased 6 times, 30G vs 170G) and there should be some unclean
>> > sstables on that node.
>> > However, upgrading is tough work for me right now. Could nodetool scrub
>> > help? Or decommissioning the node and joining it again?
>> >
>> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> This means you should upgrade, because we've fixed bugs about ignoring
>> >> deleted CFs since 0.7.4.
>> >>
>> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springrider@gmail.com> wrote:
>> >> > The log file shows as follows; not sure what 'Couldn't find cfId=1000'
>> >> > means (Google just returned useless results):
>> >> >
>> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>> >> > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) Creating new commitlog segment /cassandra/commitlog/CommitLog-1313670197705.log
>> >> > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying /cassandra/commitlog/CommitLog-1313670030512.log
>> >> > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished reading /cassandra/commitlog/CommitLog-1313670030512.log
>> >> > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay complete
>> >> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) Cassandra version: 0.7.4
>> >> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift API version: 19.4.0
>> >> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading persisted ring state
>> >> > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting up server gossip
>> >> > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> >> > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> >> > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 bytes)
>> >> > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java (line 396) Compacting
>> >> > [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>> >> > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using saved token 113427455640312821154458202477256070484
>> >> > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> >> > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java (line 86) Error in row mutation
>> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
>> >> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>> >> >     at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>> >> >     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>> >> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>> >> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >> >     at java.lang.Thread.run(Thread.java:636)
>> >> > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node /node1 has restarted, now UP again
>> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in keyspace prjkeyspace
>> >> >     at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>> >> >     at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>> >> >     at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>> >> >     at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>> >> >     at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>> >> >     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>> >> >     at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>> >> >
>> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> >> >>
>> >> >> Look in the logs to find out why the migration did not get to node2.
>> >> >> Otherwise yes, you can drop those files.
>> >> >> Cheers
>> >> >>
>> >> >> -----------------
>> >> >> Aaron Morton
>> >> >> Freelance Cassandra Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>> >> >>
>> >> >> Just found out that for changes made via cassandra-cli, the schema change didn't reach node2, and node2 became unreachable...
>> >> >> I did as this document says: http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>> >> >> but after that I still have two schema versions:
>> >> >>
>> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>> >> >>
>> >> >> Is it enough to delete the Schema* and Migrations* sstables and restart the node?
>> >> >>
>> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com> wrote:
>> >> >>>
>> >> >>> Thanks a lot for all the help!
>> >>> I have gone through the steps and successfully brought up node2 :)
>> >>>
>> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen wrote:
>> >>> > Because the file only preserves the "keys" of the records, not the whole
>> >>> > records. Records for the saved keys are loaded back into cassandra during
>> >>> > its startup.
>> >>> >
>> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu wrote:
>> >>> >>
>> >>> >> But the data sizes in saved_caches are relatively small;
>> >>> >> will that cause the load problem?
>> >>> >>
>> >>> >> ls -lh /cassandra/saved_caches/
>> >>> >> total 32M
>> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
>> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>> >>> >>
>> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton wrote:
>> >>> >> > If you have a node that cannot start up due to issues loading the saved
>> >>> >> > cache, delete the files in the saved_caches directory before starting it.
>> >>> >> >
>> >>> >> > The settings to save the row and key cache are per CF. You can change
>> >>> >> > them with an "update column family" statement via the CLI when attached
>> >>> >> > to any node. You may then want to check the saved_caches directory and
>> >>> >> > delete any files that are left (not sure if they are automatically deleted).
>> >>> >> >
>> >>> >> > I would recommend:
>> >>> >> > - stop node 2
>> >>> >> > - delete its saved_caches
>> >>> >> > - make the schema change via another node
>> >>> >> > - start up node 2
>> >>> >> >
>> >>> >> > Cheers
>> >>> >> >
>> >>> >> > -----------------
>> >>> >> > Aaron Morton
>> >>> >> > Freelance Cassandra Developer
>> >>> >> > @aaronmorton
>> >>> >> > http://www.thelastpickle.com
>> >>> >> >
>> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >>> >> >
>> >>> >> >> Does this need to be cluster-wide? Or could I just modify the caches
>> >>> >> >> on one node?
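(Concretely, Aaron's per-CF suggestion would look something like the sketch below on a 0.7-era cluster. The keyspace/CF names are examples taken from this thread, and `rows_cached`/`keys_cached` are my recollection of the 0.7 cassandra-cli attribute names, so treat this as illustrative rather than authoritative.)

```shell
# Sketch: disable row/key caches per CF from any reachable node, then clear
# the persisted caches on the stuck node. Names are examples from this thread.

# From a reachable node. The cli statements are shown as comments because
# they run inside cassandra-cli, not the shell:
#   cassandra-cli -host node1 -port 9160
#   [default@unknown] use prjkeyspace;
#   [default@prjkeyspace] update column family COMMENT with rows_cached=0 and keys_cached=0;

# On the node that cannot start: stop Cassandra, then remove the saved caches
# (path assumed from the ls listing above).
rm -f /cassandra/saved_caches/*-RowCache /cassandra/saved_caches/*-KeyCache

# Start Cassandra again; startup then skips the slow saved-row-cache reload.
```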
>> >>> >> >> Since I could not connect to the node with cassandra-cli (it says "Connection refused"):
>> >>> >> >>
>> >>> >> >> [default@unknown] connect node2/9160;
>> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >>> >> >>
>> >>> >> >> So if I change the cache sizes via the other nodes, how would node2 be
>> >>> >> >> notified of the change? Would killing cassandra and starting it again
>> >>> >> >> make it pick up the schema update?
>> >>> >> >>
>> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer wrote:
>> >>> >> >>> Hi,
>> >>> >> >>>
>> >>> >> >>> yes, we saw exactly the same messages. We got rid of these by doing the
>> >>> >> >>> following:
>> >>> >> >>>
>> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >>> >> >>> * Kill Cassandra
>> >>> >> >>> * Remove all files in the saved_caches directory
>> >>> >> >>> * Start Cassandra
>> >>> >> >>> * Slowly bring back row & key caches (if desired; we left them off)
>> >>> >> >>>
>> >>> >> >>> Cheers,
>> >>> >> >>>
>> >>> >> >>> T.
>> >>> >> >>>
>> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >>> >> >>>>
>> >>> >> >>>> I saw a lot of SliceQueryFilter entries after changing the log level
>> >>> >> >>>> to DEBUG. I just thought even bringing up a new node would be faster
>> >>> >> >>>> than starting the old one...
>> >>> >> >>>> It is weird:
>> >>> >> >>>>
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
>> >>> >> >>>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>> >>> >> >>>>
>> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu wrote:
>> >>> >> >>>>
>> >>> >> >>>>     But it seems the row cache setting is cluster-wide; how will
>> >>> >> >>>>     changing the row cache affect the read speed?
>> >>> >> >>>>
>> >>> >> >>>>     On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis wrote:
>> >>> >> >>>>
>> >>> >> >>>>         Or leave the row cache enabled but disable cache saving (and
>> >>> >> >>>>         remove the one already on disk).
>> >>> >> >>>>
>> >>> >> >>>>         On Sun, Aug 14, 2011 at 5:05 PM, aaron morton wrote:
>> >>> >> >>>>         > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>> >>> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >>> >> >>>>
>> >>> >> >>>>         It's taking 29 minutes to load 200,000 rows into the row cache.
>> >>> >> >>>>         That's a pretty big row cache; I would suggest reducing or
>> >>> >> >>>>         disabling it. Background:
>> >>> >> >>>>         http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >>> >> >>>>
>> >>> >> >>>>         > and the server could not handle the load and crashed.
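(As an aside for readers: the first field in those SliceQueryFilter DEBUG lines is the hex-encoded column name; `76616c7565` is just ASCII "value". A quick sketch of decoding one of those tokens, assuming the 0.7-era `name_hex:deleted:length@timestamp` layout shown above; the helper name is mine, not Cassandra's.)

```python
# Decode a column token from a SliceQueryFilter DEBUG line such as:
#   collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
# Assumed field layout: column_name_hex:deleted:value_length@timestamp.

def parse_column(token: str) -> dict:
    """Split one 'hexname:deleted:length@timestamp' token into its parts."""
    name_hex, deleted, rest = token.split(":")
    length, timestamp = rest.split("@")
    return {
        "name": bytes.fromhex(name_hex).decode("ascii"),
        "deleted": deleted == "true",
        "length": int(length),
        "timestamp": int(timestamp),
    }

col = parse_column("76616c7565:false:225@1313068845474382")
print(col["name"])       # -> value
print(col["timestamp"])  # -> 1313068845474382
```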
>> >>> >> >>>>         > After it came back, node 3 could not return for more than 96 hours.
>> >>> >> >>>>
>> >>> >> >>>>         Crashed how?
>> >>> >> >>>>         You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >>> >> >>>>         Watch nodetool compactionstats to see when the Merkle tree build
>> >>> >> >>>>         finishes, and nodetool netstats to see which CFs are streaming.
>> >>> >> >>>>         Cheers
>> >>> >> >>>>
>> >>> >> >>>>         -----------------
>> >>> >> >>>>         Aaron Morton
>> >>> >> >>>>         Freelance Cassandra Developer
>> >>> >> >>>>         @aaronmorton
>> >>> >> >>>>         http://www.thelastpickle.com
>> >>> >> >>>>
>> >>> >> >>>>         On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >>> >> >>>>
>> >>> >> >>>>         I have 3 nodes with RF=3. When I was repairing node3, it seems
>> >>> >> >>>>         a lot of data was generated, and the server could not handle
>> >>> >> >>>>         the load and crashed. After it came back, node 3 could not
>> >>> >> >>>>         return for more than 96 hours.
>> >>> >> >>>>
>> >>> >> >>>>         For 34GB of data, node 2 could restart and be back online within 1 hour.
>> >>> >> >>>>
>> >>> >> >>>>         I am not sure what's wrong with node3; should I restart node 3 again?
>> >>> >> >>>>         Thanks!
>> >>> >> >>>>         Address   Status  State   Load       Owns    Token
>> >>> >> >>>>                                                      113427455640312821154458202477256070484
>> >>> >> >>>>         node1     Up      Normal  34.11 GB   33.33%  0
>> >>> >> >>>>         node2     Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
>> >>> >> >>>>         node3     Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
>> >>> >> >>>>
>> >>> >> >>>>         The log shows it is still going on; not sure why it is so slow:
>> >>> >> >>>>
>> >>> >> >>>>         INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>> >>> >> >>>>         INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >>> >> >>>>         INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >>> >> >>>>         INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >>> >> >>>>         INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >>> >> >>>>
>> >>> >> >>>> --
>> >>> >> >>>> Jonathan Ellis
>> >>> >> >>>> Project Chair, Apache Cassandra
>> >>> >> >>>> co-founder of DataStax, the source for professional Cassandra support
>> >>> >> >>>> http://www.datastax.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
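(One more aside: the tokens in the nodetool ring output quoted above are the standard balanced assignment for the RandomPartitioner, whose token space is [0, 2**127). A quick sketch reproducing them; the function name is mine, but the arithmetic matches the tokens in the thread.)

```python
# Reproduce the balanced 3-node tokens from the ring output in this thread.
# Token generators for the RandomPartitioner commonly use a fixed step of
# 2**127 // N and give node i the token i * step.

def balanced_tokens(node_count: int) -> list:
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]

tokens = balanced_tokens(3)
print(tokens[0])  # 0                                        (node1)
print(tokens[1])  # 56713727820156410577229101238628035242   (node2)
print(tokens[2])  # 113427455640312821154458202477256070484  (node3)
```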

Would need the = call stack. 

Cheers

http://www.thelastpickle.com

On 22/08/2011, at 1:03 AM, Yan Chunlu wrote:

is that = means I could just wait and it will be okay = eventually?

I also saw the "column family already = exists"(not accurate, something like that) Exception, also caused after = I delete the migration and schema sstables.   but I can not = reproduce it, is that a similar problem?

On Sun, Aug 21, 2011 at 7:57 PM, aaron = morton <aaron@thelastpickle.com> wrote:
I've seen "Couldn't find cfId=3D1000" = in a mutation stage happen when a node joins a cluster with existing = data after having it's schema cleared. 

The migrations received from another node are applied one CF at a time, = when each CF is added the node will open the existing data files which = can take a while. In the mean time it's joined on gossip and is = receiving mutations from other nodes that have all the CF's. One the = returning node gets through applying the migration the errors should = stop. 

Read is a similar = story.

Cheers
 


-----------------
Aaron Morton
Freelance = Cassandra Developer
@aaronmorton

On 21/08/2011, at = 8:58 PM, Yan Chunlu wrote:

actually I didn't dropped any CF,  maybe my = understanding was totally wrong, I just describe what I thought as = belows: 

I thought by "deleted = CFs" means the sstable that useless(since "node repair" and could copy = data to another node,  the original sstable might be deleted but = not yet).  when I deleted all migration and schema sstables, it = somehow "forgot" those files should be deleted, so it read the file and = "can not find cfId"...


I got to this situation by the = following steps: at first I did "node repair" on node2 which failed in = the middle(node3 down), and leave the Load as 170GB while average is = 30GB.

after I brought up node3,  the node2 start up very = slow, 4 days past it stil starting.  it seems loading row cache and = key cache.  so I disabled those cache by set the value to 0 via = cassandra-cli. during this procedure, of course node2 was not reachable = so it can not update the schema.

after that node2 could be start very quickly, but = the "describe cluster" shows it was "UNREACHABLE", so I did as the FAQ = says, delete schema, migration sstables and restart node2. 

then the "Couldn't = find cfId=3D1000'" error start showing up.





<= /div>
I have just moved those migration && schema sstables = back and start cassandra, it still shows "UNREACHABLE", after wait for = couple of hours, the "describe cluster" shows they are the same version = now.


even this problem solved, I am not = sure HOW....... really curious that why just remove "migration* and = schema*" sstables could cause  "Couldn't = find cfId=3D1000'"  error.

On Sun, Aug 21, 2011 at 12:24 PM, = Jonathan Ellis <jbellis@gmail.com> wrote:
I'm not sure what problem you're trying to solve.  The exception = you
pasted should stop once your clients are no longer trying to use the
dropped CF.

On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <springrider@gmail.com> wrote:
> that could be the reason, I did nodetool repair(unfinished, = data size
> increased 6 times bigger 30G vs 170G) and there should be some = unclean
> sstables on that node.
> however upgrade it a tough work for me right now.  could the = nodetool scrub
> help?  or decommission the node and join it again?
>
> On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> This means you should upgrade, because we've fixed bugs about = ignoring
>> deleted CFs since 0.7.4.
>>
>> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springrider@gmail.com> wrote:
>> > the log file shows as follows, not sure what does = 'Couldn't find
>> > cfId=3D1000'
>> > means(google just returned useless results):
>> >
>> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>> > INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) Creating new commitlog segment /cassandra/commitlog/CommitLog-1313670197705.log
>> > INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying /cassandra/commitlog/CommitLog-1313670030512.log
>> > INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished reading /cassandra/commitlog/CommitLog-1313670030512.log
>> > INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay complete
>> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) Cassandra version: 0.7.4
>> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift API version: 19.4.0
>> > INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading persisted ring state
>> > INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting up server gossip
>> > INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> > INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>> > INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 bytes)
>> > INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java (line 396) Compacting [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>> > INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using saved token 113427455640312821154458202477256070484
>> > INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> > INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java (line 86) Error in row mutation
>> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
>> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>> >     at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>> >     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >     at java.lang.Thread.run(Thread.java:636)
>> > INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node /node1 has restarted, now UP again
>> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in keyspace prjkeyspace
>> >     at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>> >     at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>> >     at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>> >     at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>> >     at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>> >     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>> >     at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>> >
>> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aaron@thelastpickle.com>
>> > wrote:
>> >>
>> >> Look in the logs to find out why the migration did not get to node2.
>> >> Otherwise, yes, you can drop those files.
>> >> Cheers
>> >> -----------------
>> >> Aaron Morton
>> >> Freelance Cassandra Developer
>> >> @aaronmorton
>> >> http://www.thelastpickle.com
>> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>> >>
>> >> just found out that after making changes via cassandra-cli, the schema
>> >> change didn't reach node2, and node2 became unreachable...
>> >> I followed this document:
>> >> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>> >> but after that I still have two schema versions:
>> >>
>> >>
>> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, = node3]
>> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>> >>
>> >> is it enough to delete the Schema* and Migrations* sstables and restart
>> >> the node?
>> >>
>> >>
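For reference, the FAQ procedure linked above amounts to the following on the out-of-sync node. A sketch only: the /cassandra/data path is the one this cluster's logs show, and would differ elsewhere.

```shell
# on node2, with Cassandra stopped:
rm /cassandra/data/system/Schema*      # serialized schema definitions
rm /cassandra/data/system/Migrations*  # schema migration history
# restart Cassandra; the node should pull the current schema from its peers
```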
>> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com>
>> >> wrote:
>> >>>
>> >>> thanks a lot for all the help! I have gone through the steps and
>> >>> successfully brought up node2 :)
>> >>>
>> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulinyen@gmail.com>
>> >>> wrote:
>> >>> > Because the file only preserves the "keys" of records, not the whole
>> >>> > records. Records for those saved keys will be loaded into Cassandra
>> >>> > during startup.
>> >>> >
>> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu = <springrider@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> but the data sizes in the saved_caches are relatively small;
>> >>> >> will that cause the load problem?
>> >>> >>
>> >>> >> ls -lh /cassandra/saved_caches/
>> >>> >> total 32M
>> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
>> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>> >>> >>
>> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron = morton
>> >>> >> <aaron@thelastpickle.com>
>> >>> >> wrote:
>> >>> >> > If you have a node that cannot start up due to issues loading the
>> >>> >> > saved cache, delete the files in the saved_caches directory before
>> >>> >> > starting it.
>> >>> >> >
>> >>> >> > The settings to save the row and key cache are per CF. You can
>> >>> >> > change them with an update column family statement via the CLI when
>> >>> >> > attached to any node. You may then want to check the saved_caches
>> >>> >> > directory and delete any files that are left (not sure if they are
>> >>> >> > automatically deleted).
>> >>> >> >
>> >>> >> > I would recommend:
>> >>> >> > - stop node 2
>> >>> >> > - delete its saved_cache
>> >>> >> > - make the schema change via another node
>> >>> >> > - start up node 2
>> >>> >> >
>> >>> >> > Cheers
>> >>> >> >
>> >>> >> > -----------------
>> >>> >> > Aaron Morton
>> >>> >> > Freelance Cassandra Developer
>> >>> >> > @aaronmorton
>> >>> >> > http://www.thelastpickle.com
>> >>> >> >
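Aaron's four steps above, as a rough shell transcript. A sketch: the saved_caches path and node names are this thread's, and the stop/start steps are placeholders for however Cassandra is managed on these hosts.

```shell
# 1. stop node 2 (via your init script or by killing the JVM)
# 2. delete its saved caches
rm /cassandra/saved_caches/*
# 3. make the schema change from the CLI attached to any OTHER live node
cassandra-cli -h node1 -p 9160
# 4. start node 2 again; it picks up the schema change on startup
```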
>> >>> >> > On 17/08/2011, at 2:59 PM, Yan = Chunlu wrote:
>> >>> >> >
>> >>> >> >> does this need to be cluster wide, or could I just modify the
>> >>> >> >> caches on one node? I could not connect to that node with
>> >>> >> >> cassandra-cli; it says "connection refused":
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> [default@unknown] connect node2/9160;
>> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> so if I change the cache size via other nodes, how would node2 be
>> >>> >> >> notified of the change? Would killing Cassandra and starting it
>> >>> >> >> again make it pick up the schema?
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, = Teijo Holzer
>> >>> >> >> <tholzer@wetafx.co.nz>
>> >>> >> >> wrote:
>> >>> >> >>> Hi,
>> >>> >> >>>
>> >>> >> >>> yes, we saw exactly the same messages. We got rid of these by
>> >>> >> >>> doing the following:
>> >>> >> >>>
>> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >>> >> >>> * Kill Cassandra
>> >>> >> >>> * Remove all files in the saved_caches directory
>> >>> >> >>> * Start Cassandra
>> >>> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
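The first bullet, sketched in 0.7-era cassandra-cli syntax. The keyspace and column family names here are just examples taken from this thread; repeat the statement once per column family.

```
[default@prjkeyspace] update column family cache with rows_cached=0 and keys_cached=0;
```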
>> >>> >> >>>
>> >>> >> >>> Cheers,
>> >>> >> >>>
>> >>> >> >>>        T.
>> >>> >> >>>
>> >>> >> >>> On 16/08/11 23:35, Yan = Chunlu wrote:
>> >>> >> >>>>
>> >>> >> >>>> I saw a lot of SliceQueryFilter entries after changing the log
>> >>> >> >>>> level to DEBUG. I just thought even bringing up a new node would
>> >>> >> >>>> be faster than starting the old one... it is weird
>> >>> >> >>>>
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>> On Tue, Aug 16, 2011 at = 7:32 PM, Yan Chunlu
>> >>> >> >>>> <springrider@gmail.com
>> >>> >> >>>> <mailto:springrider@gmail.com>> wrote:
>> >>> >> >>>>
>> >>> >> >>>>    but it seems the row cache is cluster wide; how will the
>> >>> >> >>>>    change of row cache affect the read speed?
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis
>> >>> >> >>>>    <jbellis@gmail.com> wrote:
>> >>> >> >>>>
>> >>> >> >>>>        Or leave row cache enabled but disable cache saving (and
>> >>> >> >>>>        remove the one already on disk).
>> >>> >> >>>>
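Jonathan's alternative above, sketched as a CLI statement. The attribute name is my assumption of the 0.7 cassandra-cli syntax, and the CF name is just this thread's example, so verify with `help update column family;` first:

```
update column family COMMENT with row_cache_save_period=0;
```

Then delete the existing saved cache file (COMMENT-RowCache in the saved_caches directory) so it is not loaded on the next restart.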
>> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton
>> >>> >> >>>>        <aaron@thelastpickle.com> wrote:
>> >>> >> >>>>        > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >>> >> >>>>        >
>> >>> >> >>>>        > It's taking 29 minutes to load 200,000 rows in the row cache. That's a pretty big row cache; I would suggest reducing or disabling it. Background:
>> >>> >> >>>>        > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >>> >> >>>>        >
>> >>> >> >>>>        > > and server can not afford the load then crashed. after come back, node 3 can not return for more than 96 hours
>> >>> >> >>>>        >
>> >>> >> >>>>        > Crashed how?
>> >>> >> >>>>        > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>> >>> >> >>>>        > Watch nodetool compactionstats to see when the Merkle tree build finishes, and nodetool netstats to see which CFs are streaming.
>> >>> >> >>>>        > Cheers
>> >>> >> >>>>        > -----------------
>> >>> >> >>>>        > Aaron Morton
>> >>> >> >>>>        > Freelance Cassandra Developer
>> >>> >> >>>>        > @aaronmorton
>> >>> >> >>>>        > http://www.thelastpickle.com
>> >>> >> >>>>        > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >>> >> >>>>        >
>> >>> >> >>>>        > I got 3 nodes and RF=3. When I was repairing node3, it seems a lot of data was generated, and the server could not afford the load and crashed. After it came back, node 3 could not return for more than 96 hours.
>> >>> >> >>>>        >
>> >>> >> >>>>        > For 34GB of data, node 2 could restart and come back online within 1 hour.
>> >>> >> >>>>        >
>> >>> >> >>>>        > I am not sure what's wrong with node3; should I restart node 3 again? thanks!
>> >>> >> >>>>        >
>> >>> >> >>>>        > Address   Status State   Load       Owns    Token
>> >>> >> >>>>        >                                             113427455640312821154458202477256070484
>> >>> >> >>>>        > node1     Up     Normal  34.11 GB   33.33%  0
>> >>> >> >>>>        > node2     Up     Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
>> >>> >> >>>>        > node3     Down   Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
>> >>> >> >>>>        >
>> >>> >> >>>>        > the log shows it is still going on; not sure why it is so slow:
>> >>> >> >>>>        >
>> >>> >> >>>>        > INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>> >>> >> >>>>        > INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >>> >> >>>>        > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>> >>> >> >>>>        > INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >>> >> >>>>        > INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>        --
>> >>> >> >>>>        Jonathan Ellis
>> >>> >> >>>>        Project Chair, Apache Cassandra
>> >>> >> >>>>        co-founder of DataStax, the source for professional Cassandra support
>> >>> >> >>>>        http://www.datastax.com
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>>
>> >>> >> >>>
>> >>> >> >>>
>> >>> >> >
>> >>> >> >
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
