cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: node restart taking too long
Date Mon, 22 Aug 2011 01:33:30 GMT
if I removed the migration and schema sstables, it would show "Couldn't
find cfId=1000"; as I remember, if I left that error alone, it would
eventually show "InstanceAlreadyExistsException".
I found the log in the cassandra log file (but I could not reproduce it);
it was like this:

ERROR [MutationStage:2834] 2011-08-18 06:30:56,667 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=prjspace,columnfamily=FriendsByAccount
    at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:261)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:472)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:453)
    at org.apache.cassandra.db.Table.initCf(Table.java:317)
    at org.apache.cassandra.db.Table.<init>(Table.java:254)
    at org.apache.cassandra.db.Table.open(Table.java:110)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:76)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=prjspace,columnfamily=FriendsByAccount
    at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:467)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1520)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:986)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:938)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:330)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:516)
    at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:257)
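
For what it's worth, the InstanceAlreadyExistsException in that trace is ordinary JMX behaviour: registering a second MBean under an ObjectName that is already taken always fails this way. A minimal, self-contained sketch (the class and the ObjectName below are made up for illustration; this is not Cassandra code):

```java
import javax.management.InstanceAlreadyExistsException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import java.lang.management.ManagementFactory;

public class DoubleRegister {
    // standard MBean naming convention: interface = implementation class name + "MBean"
    public interface DemoMBean { int getValue(); }
    public static class Demo implements DemoMBean {
        public int getValue() { return 42; }
    }

    public static boolean demoDoubleRegistration() throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // hypothetical ObjectName, shaped like the one in the stack trace above
        ObjectName name = new ObjectName(
                "demo:type=ColumnFamilies,keyspace=prjspace,columnfamily=FriendsByAccount");
        if (mbs.isRegistered(name)) {
            mbs.unregisterMBean(name); // make the demo re-runnable
        }
        mbs.registerMBean(new Demo(), name);     // first registration succeeds
        try {
            mbs.registerMBean(new Demo(), name); // same ObjectName again
            return false;
        } catch (InstanceAlreadyExistsException e) {
            // the exception ColumnFamilyStore.<init> wraps in a RuntimeException
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("caught InstanceAlreadyExistsException: " + demoDoubleRegistration());
    }
}
```

So the trace above just means ColumnFamilyStore tried to register an MBean under a name that was already registered, which is consistent with the same CF being initialised twice.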








On Mon, Aug 22, 2011 at 5:42 AM, aaron morton <aaron@thelastpickle.com> wrote:

> cf already exists is not the same.
>
> Would need the call stack.
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/08/2011, at 1:03 AM, Yan Chunlu wrote:
>
> Does that mean I could just wait and it would be okay eventually?
>
> I also saw the "column family already exists" exception (not the exact
> wording, something like that), also caused after I deleted the migration
> and schema sstables. But I cannot reproduce it; is that a similar problem?
>
> On Sun, Aug 21, 2011 at 7:57 PM, aaron morton <aaron@thelastpickle.com> wrote:
>
>> I've seen "Couldn't find cfId=1000" in a mutation stage happen when a node
>> joins a cluster with existing data after having its schema cleared.
>>
>> The migrations received from another node are applied one CF at a time;
>> when each CF is added, the node opens the existing data files, which can
>> take a while. In the meantime it has joined on gossip and is receiving
>> mutations from other nodes that have all the CFs. Once the returning node
>> gets through applying the migrations, the errors should stop.
>>
>> Reads are a similar story.
>>
>> Cheers
>>
>>
>>
>>  -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 21/08/2011, at 8:58 PM, Yan Chunlu wrote:
>>
>> Actually I didn't drop any CF; maybe my understanding was totally
>> wrong. Here is what I thought:
>>
>> I thought "deleted CFs" meant sstables that are useless (since "nodetool
>> repair" could copy data to another node, the original sstable might be
>> due for deletion but not yet removed). When I deleted all migration and
>> schema sstables, the node somehow "forgot" those files should be deleted,
>> so it read the files and "can not find cfId"...
>>
>>
>> I got to this situation by the following steps: at first I did "nodetool
>> repair" on node2, which failed in the middle (node3 was down) and left the
>> Load at 170GB while the average is 30GB.
>>
>> After I brought node3 back up, node2 started up very slowly; 4 days
>> passed and it was still starting. It seemed to be loading the row cache
>> and key cache, so I disabled those caches by setting their values to 0
>> via cassandra-cli. During this procedure node2 was of course not
>> reachable, so it could not pick up the schema update.
>>
>> After that node2 could start very quickly, but "describe cluster" showed
>> it as "UNREACHABLE", so I did as the FAQ says: deleted the schema and
>> migration sstables and restarted node2.
>>
>> Then the "Couldn't find cfId=1000" error started showing up.
>>
>>
>>
>>
>>
>> I have just moved those migration && schema sstables back and started
>> cassandra. It still showed "UNREACHABLE", but after waiting a couple of
>> hours, "describe cluster" shows they are on the same version now.
>>
>>
>> Even though the problem is solved, I am not sure HOW... really curious
>> why just removing the "migration*" and "schema*" sstables could cause the
>> "Couldn't find cfId=1000" error.
>>
>> On Sun, Aug 21, 2011 at 12:24 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>>> I'm not sure what problem you're trying to solve.  The exception you
>>> pasted should stop once your clients are no longer trying to use the
>>> dropped CF.
>>>
>>> On Sat, Aug 20, 2011 at 10:09 PM, Yan Chunlu <springrider@gmail.com> wrote:
>>> > that could be the reason: I did nodetool repair (unfinished; the data
>>> > size increased to 6 times bigger, 30G vs 170G) and there could be some
>>> > unclean sstables on that node.
>>> > However, upgrading is tough work for me right now. Could nodetool
>>> > scrub help? Or decommissioning the node and joining it again?
>>> >
>>> > On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> >>
>>> >> This means you should upgrade, because we've fixed bugs about ignoring
>>> >> deleted CFs since 0.7.4.
>>> >>
>>> >> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu <springrider@gmail.com> wrote:
>>> >> > the log file shows as follows; not sure what 'Couldn't find cfId=1000'
>>> >> > means (google just returned useless results):
>>> >> >
>>> >> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>>> >> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50) Creating new commitlog segment /cassandra/commitlog/CommitLog-1313670197705.log
>>> >> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying /cassandra/commitlog/CommitLog-1313670030512.log
>>> >> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished reading /cassandra/commitlog/CommitLog-1313670030512.log
>>> >> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay complete
>>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364) Cassandra version: 0.7.4
>>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift API version: 19.4.0
>>> >> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading persisted ring state
>>> >> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting up server gossip
>>> >> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157) Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164) Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80 bytes)
>>> >> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java (line 396) Compacting [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>>> >> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using saved token 113427455640312821154458202477256070484
>>> >> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048) Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>>> >> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157) Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>>> >> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java (line 86) Error in row mutation
>>> >> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
>>> >> >     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
>>> >> >     at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
>>> >> >     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
>>> >> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>> >> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> >> >     at java.lang.Thread.run(Thread.java:636)
>>> >> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node /node1 has restarted, now UP again
>>> >> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>>> >> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in keyspace prjkeyspace
>>> >> >     at org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
>>> >> >     at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
>>> >> >     at org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
>>> >> >     at org.apache.cassandra.db.SliceByNamesReadCommand.<init>(SliceByNamesReadCommand.java:44)
>>> >> >     at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
>>> >> >     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
>>> >> >     at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>>> >> >
>>> >> >
>>> >> > On Fri, Aug 19, 2011 at 5:44 AM, aaron morton <aaron@thelastpickle.com> wrote:
>>> >> >>
>>> >> >> Look in the logs to find out why the migration did not get to node2.
>>> >> >> Otherwise, yes, you can drop those files.
>>> >> >> Cheers
>>> >> >> -----------------
>>> >> >> Aaron Morton
>>> >> >> Freelance Cassandra Developer
>>> >> >> @aaronmorton
>>> >> >> http://www.thelastpickle.com
>>> >> >> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>>> >> >>
>>> >> >> Just found out that the schema change made via cassandra-cli didn't
>>> >> >> reach node2, and node2 became unreachable...
>>> >> >> I did as this document says:
>>> >> >> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>>> >> >> but after that I just got two schema versions:
>>> >> >>
>>> >> >>
>>> >> >> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>>> >> >> 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
>>> >> >>
>>> >> >> Is it enough to delete the Schema* && Migrations* sstables and
>>> >> >> restart the node?
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com> wrote:
>>> >> >>>
>>> >> >>> thanks a lot for all the help! I have gone through the steps and
>>> >> >>> successfully brought up node2 :)
>>> >> >>>
>>> >> >>> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulinyen@gmail.com> wrote:
>>> >> >>> > Because the file only preserves the "keys" of records, not the
>>> >> >>> > whole records. Records for those saved keys will be loaded into
>>> >> >>> > cassandra during startup.
>>> >> >>> >
>>> >> >>> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springrider@gmail.com> wrote:
>>> >> >>> >>
>>> >> >>> >> but the data sizes in the saved_caches are relatively small:
>>> >> >>> >>
>>> >> >>> >> will that cause the load problem?
>>> >> >>> >>
>>> >> >>> >> ls -lh /cassandra/saved_caches/
>>> >> >>> >> total 32M
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>>> >> >>> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>> >> >>> >>
>>> >> >>> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>> >> >>> >> > If you have a node that cannot start up due to issues loading
>>> >> >>> >> > the saved cache, delete the files in the saved_caches
>>> >> >>> >> > directory before starting it.
>>> >> >>> >> >
>>> >> >>> >> > The settings to save the row and key cache are per CF. You
>>> >> >>> >> > can change them with an update column family statement via
>>> >> >>> >> > the CLI when attached to any node. You may then want to check
>>> >> >>> >> > the saved_caches directory and delete any files that are left
>>> >> >>> >> > (not sure if they are automatically deleted).
>>> >> >>> >> >
>>> >> >>> >> > I would recommend:
>>> >> >>> >> > - stop node 2
>>> >> >>> >> > - delete its saved_cache
>>> >> >>> >> > - make the schema change via another node
>>> >> >>> >> > - start up node 2
>>> >> >>> >> >
>>> >> >>> >> > Cheers
>>> >> >>> >> >
>>> >> >>> >> > -----------------
>>> >> >>> >> > Aaron Morton
>>> >> >>> >> > Freelance Cassandra Developer
>>> >> >>> >> > @aaronmorton
>>> >> >>> >> > http://www.thelastpickle.com
>>> >> >>> >> >
>>> >> >>> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>>> >> >>> >> >
>>> >> >>> >> >> Does this need to be cluster wide, or could I just modify
>>> >> >>> >> >> the caches on one node? Since I could not connect to the
>>> >> >>> >> >> node with cassandra-cli, it says "connection refused":
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >> [default@unknown] connect node2/9160;
>>> >> >>> >> >> Exception connecting to node2/9160. Reason: Connection refused.
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >> So if I change the cache size via other nodes, how could
>>> >> >>> >> >> node2 be notified of the change? Would killing cassandra
>>> >> >>> >> >> and starting it again make it update the schema?
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >>
>>> >> >>> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <tholzer@wetafx.co.nz> wrote:
>>> >> >>> >> >>> Hi,
>>> >> >>> >> >>>
>>> >> >>> >> >>> yes, we saw exactly the same messages. We got rid of these
>>> >> >>> >> >>> by doing the following:
>>> >> >>> >> >>>
>>> >> >>> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>>> >> >>> >> >>> * Kill Cassandra
>>> >> >>> >> >>> * Remove all files in the saved_caches directory
>>> >> >>> >> >>> * Start Cassandra
>>> >> >>> >> >>> * Slowly bring back row & key caches (if desired; we left them off)
>>> >> >>> >> >>>
>>> >> >>> >> >>> Cheers,
>>> >> >>> >> >>>
>>> >> >>> >> >>>        T.
>>> >> >>> >> >>>
>>> >> >>> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> I saw a lot of SliceQueryFilter entries after changing the
>>> >> >>> >> >>>> log level to DEBUG. I just thought even bringing up a new
>>> >> >>> >> >>>> node would be faster than starting the old one... it is
>>> >> >>> >> >>>> weird.
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>>> >> >>> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>    but it seems the row cache setting is cluster wide; how
>>> >> >>> >> >>>>    will the change of the row cache affect the read speed?
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>        Or leave the row cache enabled but disable cache
>>> >> >>> >> >>>>        saving (and remove the one already on disk).
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the row cache. That's a pretty big row cache; I would suggest reducing or disabling it. Background:
>>> >> >>> >> >>>>         > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > and server can not afford the load then crashed. after come back, node 3 can not return for more than 96 hours
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > Crashed how?
>>> >> >>> >> >>>>         > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>>> >> >>> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build finishes, and nodetool netstats to see which CFs are streaming.
>>> >> >>> >> >>>>         > Cheers
>>> >> >>> >> >>>>         > -----------------
>>> >> >>> >> >>>>         > Aaron Morton
>>> >> >>> >> >>>>         > Freelance Cassandra Developer
>>> >> >>> >> >>>>         > @aaronmorton
>>> >> >>> >> >>>>         > http://www.thelastpickle.com
>>> >> >>> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > I got 3 nodes and RF=3. When I was repairing node3, it seems a lot of data was generated, and the server could not afford the load, then crashed.
>>> >> >>> >> >>>>         > After coming back, node 3 has not returned for more than 96 hours.
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > For 34GB of data, node 2 could restart and be back online within 1 hour.
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > I am not sure what's wrong with node3; should I restart node 3 again?
>>> >> >>> >> >>>>         > thanks!
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > Address    Status  State   Load       Owns    Token
>>> >> >>> >> >>>>         >                                                113427455640312821154458202477256070484
>>> >> >>> >> >>>>         > node1      Up      Normal  34.11 GB   33.33%  0
>>> >> >>> >> >>>>         > node2      Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
>>> >> >>> >> >>>>         > node3      Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         > the log shows it is still going on; not sure why it is so slow:
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
>>> >> >>> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>>> >> >>> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>         >
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>        --
>>> >> >>> >> >>>>        Jonathan Ellis
>>> >> >>> >> >>>>        Project Chair, Apache Cassandra
>>> >> >>> >> >>>>        co-founder of DataStax, the source for professional Cassandra support
>>> >> >>> >> >>>>        http://www.datastax.com
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>>
>>> >> >>> >> >>>
>>> >> >>> >> >>>
>>> >> >>> >> >
>>> >> >>> >> >
>>> >> >>> >
>>> >> >>> >
>>> >> >>>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jonathan Ellis
>>> >> Project Chair, Apache Cassandra
>>> >> co-founder of DataStax, the source for professional Cassandra support
>>> >> http://www.datastax.com
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>>
>>
>
>
