incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Cassandra reverting deletes?
Date Fri, 30 Apr 2010 13:33:38 GMT
https://issues.apache.org/jira/browse/CASSANDRA-1040

On Thu, Apr 29, 2010 at 6:55 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> Ok, I reproduced without mapred.  Here is my recipe:
>
> On a single-node cassandra cluster with basic config (-Xmx:1G)
> loop {
>   * insert 5,000 records in a single columnfamily with UUID keys and
> random string values (between 1 and 1000 chars) in 5 different columns
> spanning two different supercolumns
>   * delete all the data by iterating over the rows with
> get_range_slices(ONE) and calling remove(QUORUM) on each row id
> returned (path containing only columnfamily)
>   * count number of non-tombstone rows by iterating over the rows
> with get_range_slices(ONE) and testing data.  Break if not zero.
> }
>
> Here's the flaky part:  while this is running, call "bin/nodetool -h
> localhost -p 8081 flush KeySpace" in the background every minute or
> so.  When the data hits some critical size, the loop will break.
> Anyone care to try this at home?
>
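The data shape from the insert step above can be sketched in Python. This only illustrates the test records (UUID keys, five random string values of 1-1000 chars spanning two supercolumns); the supercolumn/column names are made up, and the actual Thrift insert calls are omitted:

```python
import random
import string
import uuid

def rand_str():
    # Random string value between 1 and 1000 chars, per the recipe
    n = random.randint(1, 1000)
    return "".join(random.choice(string.ascii_letters) for _ in range(n))

def make_record():
    # One record: UUID key plus 5 columns spanning two supercolumns.
    # Supercolumn/column names ("sc1", "c1", ...) are hypothetical.
    key = uuid.uuid4().hex
    columns = {
        "sc1": {"c1": rand_str(), "c2": rand_str(), "c3": rand_str()},
        "sc2": {"c4": rand_str(), "c5": rand_str()},
    }
    return key, columns

batch = [make_record() for _ in range(5000)]
```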
> On Thu, Apr 29, 2010 at 12:51 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> Good! :)
>>
>> Can you reproduce w/o map/reduce, with raw get_range_slices?
>>
>> On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>> Yes! Reproduced on single-node cluster:
>>>
>>> 10/04/28 16:30:24 INFO mapred.JobClient:     ROWS=274884
>>> 10/04/28 16:30:24 INFO mapred.JobClient:     TOMBSTONES=951083
>>>
>>> 10/04/28 16:42:49 INFO mapred.JobClient:     ROWS=166580
>>> 10/04/28 16:42:49 INFO mapred.JobClient:     TOMBSTONES=1059387
>>>
>>> On Wed, Apr 28, 2010 at 10:43 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>> It sounds like either there is a fairly obvious bug, or you're doing
>>>> something wrong. :)
>>>>
>>>> Can you reproduce against a single node?
>>>>
>>>> On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>> Update: I ran a test whereby I deleted ALL the rows in a column
>>>>> family, using a consistency level of ALL.  To do this, I mapped the
>>>>> ColumnFamily and called remove on each row id.  There were 1.5 million
>>>>> rows, so 1.5 million rows were deleted.
>>>>>
>>>>> I ran a counter job immediately after.  This job maps the same column
>>>>> family and tests if any data is returned.  If not, it considers the
>>>>> row a "tombstone".  If yes, it considers the row not deleted.  Below
>>>>> are the hadoop counters for those jobs.  Note the fluctuation in the
>>>>> number of rows with data over time, and the increase in time to map
>>>>> the column family after the destroy job.  No other clients were
>>>>> accessing cassandra during this time.
>>>>>
>>>>> I'm thoroughly confused.
>>>>>
>>>>> Count: started 13:02:30 EDT, finished 13:11:33 EDT (9 minutes 2 seconds):
>>>>>   ROWS:        1,542,479
>>>>>   TOMBSTONES:  69
>>>>>
>>>>> Destroy: started 16:48:45 EDT, finished 17:07:36 EDT (18 minutes 50 seconds)
>>>>>   DESTROYED:  1,542,548
>>>>>
>>>>> Count: started 17:15:42 EDT, finished 17:31:03 EDT (15 minutes 21 seconds)
>>>>>   ROWS 876,464
>>>>>   TOMBSTONES   666,084
>>>>>
>>>>> Count: started 17:31:32, finished 17:47:16 (15mins, 44 seconds)
>>>>>   ROWS 1,451,665
>>>>>   TOMBSTONES   90,883
>>>>>
>>>>> Count: started 17:52:34, finished 18:10:28 (17mins, 53 seconds)
>>>>>   ROWS 1,425,644
>>>>>   TOMBSTONES   116,904
>>>>>
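The classification rule the count job applies (no columns returned means tombstone; any data means a live row) can be sketched like this; the namedtuple is only a stand-in for the Thrift KeySlice struct returned by get_range_slices:

```python
from collections import namedtuple

# Stand-in for the Thrift KeySlice struct: a row key plus its columns
KeySlice = namedtuple("KeySlice", ["key", "columns"])

def count_rows_and_tombstones(key_slices):
    # A slice with an empty column list is a tombstoned row;
    # anything that still returns data counts as a live row.
    rows = tombstones = 0
    for ks in key_slices:
        if ks.columns:
            rows += 1
        else:
            tombstones += 1
    return rows, tombstones
```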
>>>>> On Tue, Apr 27, 2010 at 5:37 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>>> Clocks are in sync:
>>>>>>
>>>>>> cluster04:~/cassandra$ dsh -g development "date"
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:33 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:34 EDT 2010
>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>> Tue Apr 27 17:36:35 EDT 2010
>>>>>>
>>>>>> On Tue, Apr 27, 2010 at 5:35 PM, Nathan McCall <nate@vervewireless.com> wrote:
>>>>>>> Have you confirmed that your clocks are all synced in the cluster?
>>>>>>> This may be the result of an unintentional read-repair occurring
>>>>>>> if that were the case.
>>>>>>>
>>>>>>> -Nate
>>>>>>>
>>>>>>> On Tue, Apr 27, 2010 at 2:20 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>>>>> Hmm... Even after deleting with cl.ALL, I'm getting data back for
>>>>>>>> some rows after having deleted them.  Which rows return data is
>>>>>>>> inconsistent from one run of the job to the next.
>>>>>>>>
>>>>>>>> On Tue, Apr 27, 2010 at 1:44 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>>>>>> To check that rows are gone, I check that KeySlice.columns is
>>>>>>>>> empty.  And as I mentioned, immediately after the delete job, this
>>>>>>>>> returns the expected number.
>>>>>>>>> Unfortunately I reproduced with QUORUM this morning.  No node
>>>>>>>>> outages.  I am going to try ALL to see if that changes anything,
>>>>>>>>> but I am starting to wonder if I'm doing something else wrong.
>>>>>>>>> On Mon, Apr 26, 2010 at 9:45 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> How are you checking that the rows are gone?
>>>>>>>>>>
>>>>>>>>>> Are you experiencing node outages during this?
>>>>>>>>>>
>>>>>>>>>> DC_QUORUM is unfinished code right now; you should avoid using it.
>>>>>>>>>> Can you reproduce with normal QUORUM?
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>>>>>>> > I'm having trouble deleting rows in Cassandra.  After running a
>>>>>>>>>> > job that deletes hundreds of rows, I run another job that
>>>>>>>>>> > verifies that the rows are gone.  Both jobs run correctly.
>>>>>>>>>> > However, when I run the verification job an hour later, the rows
>>>>>>>>>> > have re-appeared.  This is not a case of "ghosting" because the
>>>>>>>>>> > verification job actually checks that there is data in the
>>>>>>>>>> > columns.
>>>>>>>>>> >
>>>>>>>>>> > I am running a cluster with 12 nodes and a replication factor
>>>>>>>>>> > of 3.  I am using DC_QUORUM consistency when deleting.
>>>>>>>>>> >
>>>>>>>>>> > Any ideas?
>>>>>>>>>> > Joost.
>>>>>>>>>> >



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
