incubator-cassandra-user mailing list archives

From Joost Ouwerkerk <jo...@openplaces.org>
Subject Re: Cassandra reverting deletes?
Date Tue, 27 Apr 2010 22:14:59 GMT

Update: I ran a test in which I deleted ALL the rows in a column
family, using a consistency level of ALL.  To do this, I mapped the
ColumnFamily and called remove on each row id.  There were about 1.5
million rows, and remove was called on every one of them.

I ran a counter job immediately after.  This job maps the same column
family and tests whether any data is returned for each row.  If not, it
counts the row as a "tombstone".  If so, it counts the row as not
deleted.  Below are the Hadoop counters for those jobs.  Note the
fluctuation in the number of rows with data over time, and the increase
in the time it takes to map the column family after the destroy job.  No
other clients were accessing Cassandra during this time.
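
For illustration, the per-row check described above can be sketched as
follows.  This is a hypothetical stand-in in Python, not the actual
Hadoop job: the names classify and count_rows and the (row id, columns)
input shape are assumptions made for the example.  The only rule taken
from this thread is that a row with no columns counts as a tombstone.

```python
from collections import Counter


def classify(columns):
    """Return the counter name for one row, given its list of columns.

    A row that comes back with no columns is counted as a tombstone;
    a row with at least one column is counted as still having data.
    """
    return "ROWS" if columns else "TOMBSTONES"


def count_rows(key_slices):
    """Tally ROWS and TOMBSTONES over an iterable of (row_id, columns)."""
    counters = Counter({"ROWS": 0, "TOMBSTONES": 0})
    for _row_id, columns in key_slices:
        counters[classify(columns)] += 1
    return counters


# Hypothetical sample input: two empty rows and one row with data.
sample = [("a", []), ("b", [("name", "x")]), ("c", [])]
result = count_rows(sample)
print(dict(result))
```

Because the check only looks at whether columns come back, it cannot
distinguish a row that was never deleted from one whose data reappeared
after the delete.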

I'm thoroughly confused.

Count: started 13:02:30 EDT, finished 13:11:33 EDT (9 minutes 2 seconds)
   ROWS:        1,542,479
   TOMBSTONES:  69

Destroy: started 16:48:45 EDT, finished 17:07:36 EDT (18 minutes 50 seconds)
   DESTROYED:   1,542,548

Count: started 17:15:42 EDT, finished 17:31:03 EDT (15 minutes 21 seconds)
   ROWS:        876,464
   TOMBSTONES:  666,084

Count: started 17:31:32 EDT, finished 17:47:16 EDT (15 minutes 44 seconds)
   ROWS:        1,451,665
   TOMBSTONES:  90,883

Count: started 17:52:34 EDT, finished 18:10:28 EDT (17 minutes 53 seconds)
   ROWS:        1,425,644
   TOMBSTONES:  116,904
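
As a sanity check on the counters above, the short Python sketch below
adds up ROWS and TOMBSTONES for each count run and computes the share
of rows that returned data.  The figures are copied from the counters;
the variable names and the run labels are only illustrative.

```python
# Counters copied from the jobs above: (label, ROWS, TOMBSTONES).
runs = [
    ("count before destroy, 13:02", 1_542_479, 69),
    ("count after destroy, 17:15", 876_464, 666_084),
    ("count after destroy, 17:31", 1_451_665, 90_883),
    ("count after destroy, 17:52", 1_425_644, 116_904),
]
destroyed = 1_542_548  # DESTROYED counter from the destroy job

totals = []
fractions = []
for label, rows, tombstones in runs:
    total = rows + tombstones
    fraction_with_data = rows / total
    totals.append(total)
    fractions.append(fraction_with_data)
    print(f"{label}: total={total:,} with_data={fraction_with_data:.1%}")

# Every run should account for exactly the rows the destroy job touched.
all_match = all(total == destroyed for total in totals)
print("totals match DESTROYED:", all_match)
```

Every run accounts for the same 1,542,548 row ids, so the fluctuation
is in how rows are classified, not in how many rows are scanned.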

On Tue, Apr 27, 2010 at 5:37 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> Clocks are in sync:
>
> cluster04:~/cassandra$ dsh -g development "date"
> Tue Apr 27 17:36:33 EDT 2010
> Tue Apr 27 17:36:33 EDT 2010
> Tue Apr 27 17:36:33 EDT 2010
> Tue Apr 27 17:36:33 EDT 2010
> Tue Apr 27 17:36:34 EDT 2010
> Tue Apr 27 17:36:34 EDT 2010
> Tue Apr 27 17:36:34 EDT 2010
> Tue Apr 27 17:36:34 EDT 2010
> Tue Apr 27 17:36:34 EDT 2010
> Tue Apr 27 17:36:35 EDT 2010
> Tue Apr 27 17:36:35 EDT 2010
> Tue Apr 27 17:36:35 EDT 2010
>
> On Tue, Apr 27, 2010 at 5:35 PM, Nathan McCall <nate@vervewireless.com> wrote:
>> Have you confirmed that your clocks are all synced in the cluster?
>> This may be the result of an unintentional read-repair occurring if
>> that were the case.
>>
>> -Nate
>>
>> On Tue, Apr 27, 2010 at 2:20 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>> Hmm... Even after deleting with cl.ALL, I'm getting data back for some
>>> rows after having deleted them.  Which rows return data is
>>> inconsistent from one run of the job to the next.
>>>
>>> On Tue, Apr 27, 2010 at 1:44 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>> To check that rows are gone, I check that KeySlice.columns is empty.
>>>> And as I mentioned, immediately after the delete job, this returns
>>>> the expected number.
>>>> Unfortunately I reproduced with QUORUM this morning.  No node
>>>> outages.  I am going to try ALL to see if that changes anything, but
>>>> I am starting to wonder if I'm doing something else wrong.
>>>> On Mon, Apr 26, 2010 at 9:45 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>
>>>>> How are you checking that the rows are gone?
>>>>>
>>>>> Are you experiencing node outages during this?
>>>>>
>>>>> DC_QUORUM is unfinished code right now, you should avoid using it.
>>>>> Can you reproduce with normal QUORUM?
>>>>>
>>>>> On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk <joost@openplaces.org>
>>>>> wrote:
>>>>> > I'm having trouble deleting rows in Cassandra.  After running a
>>>>> > job that deletes hundreds of rows, I run another job that
>>>>> > verifies that the rows are gone.  Both jobs run correctly.
>>>>> > However, when I run the verification job an hour later, the rows
>>>>> > have re-appeared.  This is not a case of "ghosting" because the
>>>>> > verification job actually checks that there is data in the
>>>>> > columns.
>>>>> >
>>>>> > I am running a cluster with 12 nodes and a replication factor of
>>>>> > 3.  I am using DC_QUORUM consistency when deleting.
>>>>> >
>>>>> > Any ideas?
>>>>> > Joost.
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>> http://riptano.com
>>>>
>>>>
>>>
>>
>
