incubator-cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: node restart taking too long
Date Thu, 18 Aug 2011 11:25:23 GMT
I just found out that the schema change I made via cassandra-cli didn't
reach node2, and node2 became unreachable.

I did as this document:
http://wiki.apache.org/cassandra/FAQ#schema_disagreement

but after that I still got two schema versions:



ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]


Is it enough to delete the Schema* and Migrations* sstables and restart the node?
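
A minimal sketch (Python) of how the version report above could be checked
mechanically. It only parses text in the "<version>: [nodes]" form shown in
this message; it does not talk to Cassandra. The function names
parse_schema_versions and minority_nodes are hypothetical helpers invented for
illustration, and treating the version held by the most nodes as the one to
keep is an assumption, not something the FAQ states.

```python
from typing import Dict, List


def parse_schema_versions(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form '<uuid>: [node, node]' into {uuid: [nodes]}."""
    versions: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if ": [" not in line:
            continue  # skip blank or unrelated lines
        version, _, nodes = line.partition(": [")
        versions[version] = [
            n.strip() for n in nodes.rstrip("]").split(",") if n.strip()
        ]
    return versions


def minority_nodes(versions: Dict[str, List[str]]) -> List[str]:
    """Return nodes whose schema version differs from the most common one."""
    if len(versions) <= 1:
        return []  # a single version means the cluster agrees
    # Assumption: the version held by the most nodes is the one to keep.
    majority = max(versions, key=lambda v: len(versions[v]))
    return sorted(
        node
        for version, nodes in versions.items()
        if version != majority
        for node in nodes
    )


# Report text copied from this message.
REPORT = """
ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]
"""

versions = parse_schema_versions(REPORT)
stale = minority_nodes(versions)
print("schema versions:", len(versions))
print("nodes to fix:", stale)
```

For the report above, this flags node2 as the node that disagrees with the
rest of the cluster.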



On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu <springrider@gmail.com> wrote:

> thanks a lot for all the help! I have gone through the steps and
> successfully brought up node2 :)
>
>
> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen <yulinyen@gmail.com> wrote:
> > Because the file only preserves the "key" of each record, not the whole
> > record. Records for those saved keys will be loaded into Cassandra during
> > Cassandra's startup.
> >
> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu <springrider@gmail.com> wrote:
> >>
> >> but the data size in saved_caches is relatively small:
> >>
> >> will that cause the load problem?
> >>
> >>  ls  -lh  /cassandra/saved_caches/
> >> total 32M
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
> >> -rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >>
> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton <aaron@thelastpickle.com>
> >> wrote:
> >> > If you have a node that cannot start up due to issues loading the saved
> >> > cache, delete the files in the saved_caches directory before starting it.
> >> >
> >> > The settings to save the row and key cache are per CF. You can change
> >> > them with an update column family statement via the CLI when attached to
> >> > any node. You may then want to check the saved_caches directory and delete
> >> > any files that are left (not sure if they are automatically deleted).
> >> >
> >> > I would recommend:
> >> > - stop node 2
> >> > - delete its saved_caches
> >> > - make the schema change via another node
> >> > - startup node 2
> >> >
> >> > Cheers
> >> >
> >> > -----------------
> >> > Aaron Morton
> >> > Freelance Cassandra Developer
> >> > @aaronmorton
> >> > http://www.thelastpickle.com
> >> >
> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >
> >> >> does this need to be cluster wide? or I could just modify the caches
> >> >> on one node?   since I could not connect to the node with
> >> >> cassandra-cli, it says "connection refused"
> >> >>
> >> >>
> >> >> [default@unknown] connect node2/9160;
> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>
> >> >>
> >> >> so if I change the cache size via another node, how would node2 be
> >> >> notified of the change? Would killing cassandra and starting it again
> >> >> make it update the schema?
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer <tholzer@wetafx.co.nz>
> >> >> wrote:
> >> >>> Hi,
> >> >>>
> >> >>> yes, we saw exactly the same messages. We got rid of these by doing the
> >> >>> following:
> >> >>>
> >> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
> >> >>> * Kill Cassandra
> >> >>> * Remove all files in the saved_caches directory
> >> >>> * Start Cassandra
> >> >>> * Slowly bring back row & key caches (if desired, we left them off)
> >> >>>
> >> >>> Cheers,
> >> >>>
> >> >>>        T.
> >> >>>
> >> >>> On 16/08/11 23:35, Yan Chunlu wrote:
> >> >>>>
> >> >>>> I saw a lot of SliceQueryFilter messages after changing the log level
> >> >>>> to DEBUG. I think even bringing up a new node would be faster than
> >> >>>> starting the old one... it is weird.
> >> >>>>
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> >> >>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> >> >>>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springrider@gmail.com
> >> >>>> <mailto:springrider@gmail.com>> wrote:
> >> >>>>
> >> >>>>    but it seems the row cache is cluster wide, how will the change of
> >> >>>>    row cache affect the read speed?
> >> >>>>
> >> >>>>
> >> >>>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >> >>>>
> >> >>>>        Or leave row cache enabled but disable cache saving (and remove the
> >> >>>>        one already on disk).
> >> >>>>
> >> >>>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aaron@thelastpickle.com> wrote:
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> >> >>>>         > completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >> >>>>         >
> >> >>>>         > It's taking 29 minutes to load 200,000 rows in the row cache. That's a
> >> >>>>         > pretty big row cache, I would suggest reducing or disabling it.
> >> >>>>         > Background:
> >> >>>>         > http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >> >>>>         >
> >> >>>>         > and the server could not handle the load and then crashed. After
> >> >>>>         > coming back, node 3 has not returned for more than 96 hours
> >> >>>>         >
> >> >>>>         > Crashed how?
> >> >>>>         > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> >> >>>>         > Watch nodetool compactionstats to see when the Merkle tree build finishes
> >> >>>>         > and nodetool netstats to see which CFs are streaming.
> >> >>>>         > Cheers
> >> >>>>         > -----------------
> >> >>>>         > Aaron Morton
> >> >>>>         > Freelance Cassandra Developer
> >> >>>>         > @aaronmorton
> >> >>>>         > http://www.thelastpickle.com
> >> >>>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >> >>>>         >
> >> >>>>         >
> >> >>>>         > I have 3 nodes and RF=3. When I was repairing node3, it seems a lot of
> >> >>>>         > data was generated, and the server could not handle the load and then
> >> >>>>         > crashed. After coming back, node 3 has not returned for more than 96 hours.
> >> >>>>         >
> >> >>>>         > For 34GB of data, node 2 could restart and come back online within 1 hour.
> >> >>>>         >
> >> >>>>         > I am not sure what's wrong with node3. Should I restart node 3 again?
> >> >>>>         > thanks!
> >> >>>>         >
> >> >>>>         > Address   Status State   Load        Owns     Token
> >> >>>>         >                                              113427455640312821154458202477256070484
> >> >>>>         > node1     Up     Normal  34.11 GB    33.33%   0
> >> >>>>         > node2     Up     Normal  31.44 GB    33.33%   56713727820156410577229101238628035242
> >> >>>>         > node3     Down   Normal  177.55 GB   33.33%   113427455640312821154458202477256070484
> >> >>>>         >
> >> >>>>         >
> >> >>>>         > the log shows it is still going on, not sure why it is so slow:
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
> >> >>>>         >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> >> >>>>         >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >> >>>>         >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>         >
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>        --
> >> >>>>        Jonathan Ellis
> >> >>>>        Project Chair, Apache Cassandra
> >> >>>>        co-founder of DataStax, the source for professional Cassandra support
> >> >>>>        http://www.datastax.com
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >
> >> >
> >
> >
>
>
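
For reference, the "29 minutes" figure in the quoted thread follows directly
from the log line "completed loading (1744370 ms; 200000 keys)". A small
Python sketch of the arithmetic, using only the two numbers from that log
line; the variable names are illustrative, and the per-key figure is a plain
average, not a measured latency:

```python
# Numbers taken from the quoted ColumnFamilyStore log line.
load_ms = 1_744_370   # time spent loading the saved row cache, in milliseconds
keys = 200_000        # number of keys in the saved row cache

minutes = load_ms / 1000 / 60   # total load time in minutes
ms_per_key = load_ms / keys     # average time per cached row

print(f"load time: {minutes:.1f} minutes")
print(f"average per key: {ms_per_key:.2f} ms")
```

An average of several milliseconds per key is consistent with Boris Yen's
point that only keys are saved and each row has to be read back at startup.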
