incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: repair takes two days, and ends up stuck: stream at 1096% (yes, really)
Date Sun, 14 Nov 2010 22:17:25 GMT
What exception is causing it to fail/retry?

On Sun, Nov 14, 2010 at 3:49 PM, Chip Salzenberg <rev.chip@gmail.com> wrote:
> My by-now infamous eight-node cluster running 0.7.0beta3+ dropped many
> replication MUTATEs during load, so I decided to fix replication copies with
> a "nodetool repair" on one of the nodes (X.21).  The repair has been running
> for two days, and has finally gotten itself wedged into a state where it
> can't proceed.
> The log on X.21 continually reports the need to stream a data file,
> unsuccessfully.  From other clues below I gather this is a receive stream.
> This message was repeated many, many times, several per second, but has now
> stopped:
> INFO [Thread-13877] 2010-11-14 09:17:35,207 StreamInSession.java (line 124)
> Streaming of file
> /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079)
>          progress=90112/219682197079 - 0% from
> org.apache.cassandra.streaming.StreamInSession@3ee3da2c failed: requesting a
> retry.
> Here's the best joke, though: "nodetool -h X.20 netstats" shows that the
> given stream has been attempted a few times and is still being attempted,
> but in a broken way, such that the progress percentage has gone way past
> 100%.  It's now at 1096% and still rising.
> I'm not rebooting so I can poke around as devs suggest.  I'm also not
> sending logs to the list, at least in part because they're, well, big.  If
> any developers want them, though, I'm happy to send them.
> -------------------------------------------------------------------------------------------------------------------
> Mode: Normal
> Streaming to: /X.21
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079)
>          progress=2408587638800/219682197079 - 1096%
>         <---- see this
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db/(0,182528797)
>          progress=0/182528797 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,908075169)
>          progress=0/908075169 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-382-Data.db/(0,784362565)
>          progress=0/784362565 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-378-Data.db/(0,896956312)
>          progress=0/896956312 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-367-Data.db/(0,894019840)
>          progress=0/894019840 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-380-Data.db/(0,901377643)
>          progress=0/901377643 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-333-Data.db/(0,22306924)
>          progress=0/22306924 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-369-Data.db/(0,888814566)
>          progress=0/888814566 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-374-Data.db/(0,889095219)
>          progress=0/889095219 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-375-Data.db/(0,893034298)
>          progress=0/893034298 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-389-Data.db/(0,371718620)
>          progress=0/371718620 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-319-Data.db/(0,14172830870)
>          progress=0/14172830870 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-283-Data.db/(0,8939407316)
>          progress=0/8939407316 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-376-Data.db/(0,897417147)
>          progress=0/897417147 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,357220526)
>          progress=0/357220526 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-366-Data.db/(0,899103394)
>          progress=0/899103394 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-377-Data.db/(0,898165901)
>          progress=0/898165901 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-331-Data.db/(0,13323957368)
>          progress=0/13323957368 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-373-Data.db/(0,892116147)
>          progress=0/892116147 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-383-Data.db/(0,28216239303)
>          progress=0/28216239303 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-330-Data.db/(0,307921317)
>          progress=0/307921317 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,185277927)
>          progress=0/185277927 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-372-Data.db/(0,893683568)
>          progress=0/893683568 - 0%
> Streaming from: /X.21
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-440-Data.db/(0,176842211)
>          progress=0/176842211 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,447272883)
>          progress=0/447272883 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-412-Data.db/(0,444440243)
>          progress=0/444440243 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-328-Data.db/(0,14275850800)
>          progress=0/14275850800 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-397-Data.db/(0,31878407176)
>          progress=0/31878407176 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-393-Data.db/(0,446800028)
>          progress=0/446800028 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-439-Data.db/(0,367116560)
>          progress=0/367116560 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,445241132)
>          progress=0/445241132 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-271-Data.db/(0,4497953871)
>          progress=0/4497953871 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-396-Data.db/(0,449662908)
>          progress=0/449662908 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-409-Data.db/(0,454101872)
>          progress=0/454101872 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,447381444)
>          progress=0/447381444 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-327-Data.db/(0,208633237)
>          progress=0/208633237 - 0%
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0      709910236
> Responses                       n/a         0      363174385
> -------------------------------------------------------------------------------------------------------------------
>
> Meanwhile, "nodetool -h X.21 netstats" shows a large number of transfers
> that are at 0% and haven't moved, AFAICT, for at least an hour:
> -------------------------------------------------------------------------------------------------------------------
> Mode: Normal
> Streaming to: /X.20
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-327-Data.db/(0,208633237)
>          progress=0/208633237 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-439-Data.db/(0,367116560)
>          progress=0/367116560 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-271-Data.db/(0,4497953871)
>          progress=0/4497953871 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-412-Data.db/(0,444440243)
>          progress=0/444440243 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-440-Data.db/(0,176842211)
>          progress=0/176842211 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-328-Data.db/(0,14275850800)
>          progress=0/14275850800 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,445241132)
>          progress=0/445241132 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-397-Data.db/(0,31878407176)
>          progress=0/31878407176 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,447272883)
>          progress=0/447272883 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-393-Data.db/(0,446800028)
>          progress=0/446800028 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-396-Data.db/(0,449662908)
>          progress=0/449662908 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,447381444)
>          progress=0/447381444 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-409-Data.db/(0,454101872)
>          progress=0/454101872 - 0%
> Streaming to: /X.22
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-350-Data.db/(0,887780227)
>          progress=0/887780227 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-341-Data.db/(0,885896138)
>          progress=0/885896138 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-368-Data.db/(0,892560053)
>          progress=0/892560053 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-358-Data.db/(0,888436251)
>          progress=0/888436251 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-367-Data.db/(0,893446845)
>          progress=0/893446845 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-354-Data.db/(0,889058842)
>          progress=0/889058842 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-90-Data.db/(0,61505031301)
>          progress=0/61505031301 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-331-Data.db/(0,887620464)
>          progress=0/887620464 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-361-Data.db/(0,890820399)
>          progress=0/890820399 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-345-Data.db/(0,887535512)
>          progress=0/887535512 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-329-Data.db/(0,16876107370)
>          progress=0/16876107370 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-364-Data.db/(0,893839028)
>          progress=0/893839028 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-356-Data.db/(0,891862436)
>          progress=0/891862436 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,886276363)
>          progress=0/886276363 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-292-Data.db/(0,388239771)
>          progress=0/388239771 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-379-Data.db/(0,907731463)
>          progress=0/907731463 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-348-Data.db/(0,893114355)
>          progress=0/893114355 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-371-Data.db/(0,888682755)
>          progress=0/888682755 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-338-Data.db/(0,885144435)
>          progress=0/885144435 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-340-Data.db/(0,890937418)
>          progress=0/890937418 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-233-Data.db/(0,33902556016)
>          progress=0/33902556016 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-376-Data.db/(0,897426603)
>          progress=0/897426603 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-366-Data.db/(0,888711957)
>          progress=0/888711957 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-327-Data.db/(0,208633237)
>          progress=0/208633237 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-369-Data.db/(0,893954909)
>          progress=0/893954909 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-377-Data.db/(0,897265056)
>          progress=0/897265056 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-339-Data.db/(0,888998653)
>          progress=0/888998653 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-378-Data.db/(0,901053427)
>          progress=0/901053427 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-271-Data.db/(0,4497953871)
>          progress=0/4497953871 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-343-Data.db/(0,891732427)
>          progress=0/891732427 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-357-Data.db/(0,888267065)
>          progress=0/888267065 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-335-Data.db/(0,889998928)
>          progress=0/889998928 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-365-Data.db/(0,888528931)
>          progress=0/888528931 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-328-Data.db/(0,14275850800)
>          progress=0/14275850800 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-355-Data.db/(0,893535664)
>          progress=0/893535664 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-347-Data.db/(0,891375566)
>          progress=0/891375566 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-375-Data.db/(0,897994571)
>          progress=0/897994571 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-373-Data.db/(0,897589898)
>          progress=0/897589898 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-336-Data.db/(0,891079134)
>          progress=0/891079134 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-372-Data.db/(0,892852094)
>          progress=0/892852094 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-337-Data.db/(0,885983148)
>          progress=0/885983148 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-346-Data.db/(0,886424157)
>          progress=0/886424157 - 0%
>    /var/lib/cassandra/data/Attrs/TestAttrs-e-353-Data.db/(0,889222127)
>          progress=0/889222127 - 0%
> Streaming from: /X.20
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-385-Data.db/(0,357220526)
>          progress=0/357220526 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-389-Data.db/(0,371718620)
>          progress=0/371718620 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-369-Data.db/(0,888814566)
>          progress=0/888814566 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-374-Data.db/(0,889095219)
>          progress=0/889095219 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-330-Data.db/(0,307921317)
>          progress=0/307921317 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-331-Data.db/(0,13323957368)
>          progress=0/13323957368 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-383-Data.db/(0,28216239303)
>          progress=0/28216239303 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-319-Data.db/(0,14172830870)
>          progress=0/14172830870 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-366-Data.db/(0,899103394)
>          progress=0/899103394 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-367-Data.db/(0,894019840)
>          progress=0/894019840 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-390-Data.db/(0,185277927)
>          progress=0/185277927 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-378-Data.db/(0,896956312)
>          progress=0/896956312 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-372-Data.db/(0,893683568)
>          progress=0/893683568 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db/(0,182528797)
>          progress=0/182528797 - 0%
>    Attrs: /var/lib/cassandra/data/Attrs/TestAttrs-e-333-Data.db/(0,22306924)
>          progress=0/22306924 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-376-Data.db/(0,897417147)
>          progress=0/897417147 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-283-Data.db/(0,8939407316)
>          progress=0/8939407316 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-381-Data.db/(0,908075169)
>          progress=0/908075169 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-373-Data.db/(0,892116147)
>          progress=0/892116147 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-377-Data.db/(0,898165901)
>          progress=0/898165901 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079)
>          progress=0/219682197079 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-382-Data.db/(0,784362565)
>          progress=0/784362565 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-375-Data.db/(0,893034298)
>          progress=0/893034298 - 0%
>    Attrs:
> /var/lib/cassandra/data/Attrs/TestAttrs-e-380-Data.db/(0,901377643)
>          progress=0/901377643 - 0%
>  Nothing streaming from /10.5.5.22
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0      433633667
> Responses                       n/a         0      402612386
>
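
For anyone sifting through netstats dumps like the ones quoted above, here is a minimal sketch (not part of the original thread) that scans the text for streams whose transferred byte count exceeds the file size, as with the 1096% entry. It assumes the "progress=sent/total" line format shown in the quoted output; the function name find_overrun_streams and the sample text are hypothetical, chosen only for illustration.

```python
import re

# Assumed format, taken from the quoted output:
#   progress=2408587638800/219682197079 - 1096%
PROGRESS_RE = re.compile(r"progress=(\d+)/(\d+)")
# The data file path that precedes each progress line.
FILE_RE = re.compile(r"(/\S+-Data\.db)")


def find_overrun_streams(netstats_text):
    """Return (path, sent, total, percent) for streams where sent > total."""
    overruns = []
    current_file = None
    for line in netstats_text.splitlines():
        file_match = FILE_RE.search(line)
        if file_match:
            # Remember the most recent file path seen.
            current_file = file_match.group(1)
        progress_match = PROGRESS_RE.search(line)
        if progress_match:
            sent = int(progress_match.group(1))
            total = int(progress_match.group(2))
            if total > 0 and sent > total:
                # Integer percentage, truncated.
                overruns.append((current_file, sent, total, sent * 100 // total))
    return overruns


# Hypothetical sample: two entries copied from the quoted netstats output.
sample = """\
Streaming to: /X.21
   /var/lib/cassandra/data/Attrs/TestAttrs-e-332-Data.db/(0,219682197079)
         progress=2408587638800/219682197079 - 1096%
   /var/lib/cassandra/data/Attrs/TestAttrs-e-386-Data.db/(0,182528797)
         progress=0/182528797 - 0%
"""

for path, sent, total, percent in find_overrun_streams(sample):
    print(f"{path}: {sent}/{total} bytes ({percent}%)")
```

On the sample, only the TestAttrs-e-332 entry is flagged. The script reports the symptom only; it says nothing about which exception is causing the retries.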



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
