cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8623) sstablesplit fails *randomly* with Data component is missing
Date Thu, 22 Jan 2015 16:00:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287640#comment-14287640
] 

Benedict commented on CASSANDRA-8623:
-------------------------------------

I've had a brief look at the code (I'm not familiar with the bookkeeping), and I think
the interesting question here is simply why the Data component is missing, and which other
components are still present.

I suspect, without digging deeply, that normal compactions aren't disabled (I can't see a
call that disables them), and so compactions_in_progress is itself being compacted. The
System.exit(0) call probably interleaves with one of the SSTableDeletingTask calls (we may
wait for prior deleting tasks to complete, but this doesn't prevent another from being queued
straight after us by a parallel compaction).
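
The suspected interleaving can be sketched in isolation. This is only an illustration, not
Cassandra code: the class DeletionRaceSketch, the single-threaded "deleter" executor, and
the use of shutdownNow() as a stand-in for System.exit(0) are all assumptions made for the
example. It shows that draining the tasks already queued says nothing about a task queued
immediately afterwards.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the deletion bookkeeping; none of these names exist in Cassandra.
final class DeletionRaceSketch {

    // Returns how many queued "deletions" never finished before the simulated exit.
    static int lostDeletions() {
        ExecutorService deleter = Executors.newSingleThreadExecutor();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch neverReleased = new CountDownLatch(1);
        int submitted = 0;
        try {
            // A deletion task queued before we decide to exit.
            submitted++;
            deleter.submit(() -> { completed.incrementAndGet(); });

            // "Wait for prior deleting tasks": queue a marker and block until it has run.
            // The executor is single-threaded and FIFO, so everything queued earlier is done.
            deleter.submit(() -> { }).get();

            // A parallel compaction queues another deletion straight after the wait returns.
            submitted++;
            deleter.submit(() -> {
                neverReleased.await();          // models a deletion that is still in flight
                completed.incrementAndGet();
                return null;
            });

            // Stand-in for System.exit(0): the late task is either dropped or interrupted.
            deleter.shutdownNow();
            deleter.awaitTermination(5, TimeUnit.SECONDS);
            return submitted - completed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deletions lost at exit: " + lostDeletions());
    }
}
```

Whether the late task had started or was still queued when the simulated exit happened, it
never completes, which is the kind of half-finished deletion that could plausibly leave some
components on disk and not others.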

At the same time, it looks to me like our whole approach to compactions and leftovers needs
to be revisited, and should probably come under the umbrella of CASSANDRA-8568. For instance:
do we really need to clean up compaction leftovers if we leave them all as tmp files until
we're done, since they'll be cleaned up on restart either way? It may be that the whole
system table is unnecessary (except to cover a slight lack of atomicity at replacement time,
but that atomicity could be achieved more simply).
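
As a rough illustration of the "leave everything as tmp files until we're done" idea, here
is a minimal sketch. It is not how Cassandra names or manages sstable components: the class
TmpFileSketch, the ".tmp" suffix, and the file names are hypothetical, and a real sstable
has several components, so a single rename would not by itself make the whole replacement
atomic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; the suffix and file names are illustrative only.
final class TmpFileSketch {
    static final String TMP_SUFFIX = ".tmp";

    // Write under a tmp name and only rename to the final name once the write is complete.
    static Path writeThenPublish(Path dir, String name, byte[] contents) throws IOException {
        Path tmp = dir.resolve(name + TMP_SUFFIX);
        Files.write(tmp, contents);
        return Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // On startup, anything still carrying the tmp suffix is an unfinished leftover.
    static int cleanupOnStartup(Path dir) throws IOException {
        int removed = 0;
        try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(dir, "*" + TMP_SUFFIX)) {
            for (Path leftover : leftovers) {
                Files.delete(leftover);
                removed++;
            }
        }
        return removed;
    }

    // Simulates one finished write and one interrupted write, then a restart.
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("tmp-file-sketch");
            Path published = writeThenPublish(dir, "example-1-Data.db", new byte[] {1, 2, 3});
            // An interrupted write: the tmp file exists but was never renamed.
            Files.write(dir.resolve("example-2-Data.db" + TMP_SUFFIX), new byte[] {4});
            int removed = cleanupOnStartup(dir);
            if (!Files.exists(published)) {
                throw new IllegalStateException("published file should survive cleanup");
            }
            return removed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("leftovers removed on startup: " + demo());
    }
}
```

With this shape, no separate record of in-progress work is needed to find leftovers: the
tmp suffix itself identifies them on restart.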


> sstablesplit fails *randomly* with Data component is missing
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8623
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Assignee: Marcus Eriksson
>         Attachments: output.log
>
>
> I'm experiencing an issue related to sstablesplit. I would like to understand whether I am
> doing something wrong or whether there is an issue in the split process. The process fails
> randomly with the following exception:
> {code}
> ERROR 02:17:36 Error in ThreadPoolExecutor
> java.lang.AssertionError: Data component is missing for sstable./tools/bin/../../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-16
> {code}
> See the attached output.log file. The process never stops after this exception, and I've
> also seen the dataset grow indefinitely (in number of sstables).
> * I have not been able to reproduce the issue with a single sstablesplit command, i.e.,
> specifying all files with glob matching.
> * I can reproduce the bug if I call sstablesplit multiple times, one file at a time (the
> way ccm does).
> Here is the test case file to reproduce the bug:
> https://drive.google.com/file/d/0BwZ_GPM33j6KdVh0NTdkOWV2R1E/view?usp=sharing
> 1. Download the split_issue.tar.gz file. It includes the latest cassandra-2.1 branch binaries.
> 2. Extract it.
> 3. cd into the use case directory.
> 4. Download the dataset (2G), just to be sure we have the same thing, and place it in
> the working directory.
>    https://docs.google.com/uc?id=0BwZ_GPM33j6KV3ViNnpPcVFndUU&export=download
> 5. The first time, run ./test.sh. This will set up and run a test.
> 6. On subsequent runs, you can just run ./test --no-setup . This will only reset the dataset
> to its initial state and re-run the test. You might have to run the test a few times before
> hitting the issue, but I can always reproduce it within 2-3 runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
