It sounds like either there is a fairly obvious bug, or you're doing
something wrong. :)
Can you reproduce against a single node?
On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> Update: I ran a test in which I deleted ALL the rows in a column
> family, using consistency level ALL. To do this, I mapped over the
> ColumnFamily and called remove on each row id. There were about 1.5
> million rows, and all of them were deleted.
>
> I ran a counter job immediately afterwards. This job maps the same column
> family and tests whether any data is returned for each row. If none is,
> it counts the row as a "tombstone"; if some is, it counts the row as not
> deleted. Below are the Hadoop counters for those jobs. Note the
> fluctuation in the number of rows with data over time, and the increase
> in the time taken to map the column family after the destroy job. No
> other clients were accessing Cassandra during this time.
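The counter job's per-row test can be sketched as follows. This is a minimal illustration, not the actual job: `KeySlice` here is a stand-in dataclass rather than the Thrift-generated class, and `count_rows` is a hypothetical name. In the real job the same check would sit inside a Hadoop mapper that increments the ROWS and TOMBSTONES counters. The test relies on the behaviour described in the thread: a deleted key can still come back from a range scan, but with an empty column list.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class KeySlice:
    # Stand-in for the Thrift KeySlice: a row key plus the columns returned for it.
    key: str
    columns: list = field(default_factory=list)


def count_rows(slices: Iterable[KeySlice]) -> dict:
    """Classify rows the way the counter job does: an empty column list is
    counted as a deleted row ("tombstone"), anything else as live data."""
    counters = {"ROWS": 0, "TOMBSTONES": 0}
    for ks in slices:
        if ks.columns:
            counters["ROWS"] += 1
        else:
            counters["TOMBSTONES"] += 1
    return counters


sample = [KeySlice("a", ["name"]), KeySlice("b"), KeySlice("c", ["x", "y"])]
print(count_rows(sample))  # prints {'ROWS': 2, 'TOMBSTONES': 1}
```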
>
> I'm thoroughly confused.
>
> Count: started 13:02:30 EDT, finished 13:11:33 EDT (9 minutes 2 seconds)
> ROWS: 1,542,479
> TOMBSTONES: 69
>
> Destroy: started 16:48:45 EDT, finished 17:07:36 EDT (18 minutes 50 seconds)
> DESTROYED: 1,542,548
>
> Count: started 17:15:42 EDT, finished 17:31:03 EDT (15 minutes 21 seconds)
> ROWS: 876,464
> TOMBSTONES: 666,084
>
> Count: started 17:31:32, finished 17:47:16 (15 minutes 44 seconds)
> ROWS: 1,451,665
> TOMBSTONES: 90,883
>
> Count: started 17:52:34, finished 18:10:28 (17 minutes 53 seconds)
> ROWS: 1,425,644
> TOMBSTONES: 116,904
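A quick arithmetic check on the counters above: in every count run, ROWS plus TOMBSTONES equals the DESTROYED total of 1,542,548. So each scan visited the same set of keys, and the runs differ only in which keys still return columns, not in which keys exist. The snippet below just restates the reported numbers; the labels are informal names for each run.

```python
# Counter results copied from the runs above: (label, ROWS, TOMBSTONES).
runs = [
    ("13:02 count", 1_542_479, 69),
    ("17:15 count", 876_464, 666_084),
    ("17:31 count", 1_451_665, 90_883),
    ("17:52 count", 1_425_644, 116_904),
]
DESTROYED = 1_542_548

for label, rows, tombstones in runs:
    total = rows + tombstones
    # Every run should see exactly the keys the destroy job removed.
    print(f"{label}: {rows:,} + {tombstones:,} = {total:,} "
          f"(matches DESTROYED: {total == DESTROYED})")
```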
>
> On Tue, Apr 27, 2010 at 5:37 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>> Clocks are in sync:
>>
>> cluster04:~/cassandra$ dsh -g development "date"
>> Tue Apr 27 17:36:33 EDT 2010
>> Tue Apr 27 17:36:33 EDT 2010
>> Tue Apr 27 17:36:33 EDT 2010
>> Tue Apr 27 17:36:33 EDT 2010
>> Tue Apr 27 17:36:34 EDT 2010
>> Tue Apr 27 17:36:34 EDT 2010
>> Tue Apr 27 17:36:34 EDT 2010
>> Tue Apr 27 17:36:34 EDT 2010
>> Tue Apr 27 17:36:34 EDT 2010
>> Tue Apr 27 17:36:35 EDT 2010
>> Tue Apr 27 17:36:35 EDT 2010
>> Tue Apr 27 17:36:35 EDT 2010
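The `dsh` output can be reduced to a single number, the spread between the earliest and latest reported time. This is only a rough check: `date` has one-second resolution, so it cannot reveal sub-second skew, and part of the spread is simply the time `dsh` takes to reach each host. With the Thrift API of this era, write and delete timestamps are supplied by the client, so the clocks that matter most are those of the machines issuing the writes and deletes (for example the Hadoop task nodes), which may or may not be these same hosts.

```python
from datetime import datetime

# Lines printed by `dsh -g development "date"` in the message above.
DSH_OUTPUT = """\
Tue Apr 27 17:36:33 EDT 2010
Tue Apr 27 17:36:33 EDT 2010
Tue Apr 27 17:36:33 EDT 2010
Tue Apr 27 17:36:33 EDT 2010
Tue Apr 27 17:36:34 EDT 2010
Tue Apr 27 17:36:34 EDT 2010
Tue Apr 27 17:36:34 EDT 2010
Tue Apr 27 17:36:34 EDT 2010
Tue Apr 27 17:36:34 EDT 2010
Tue Apr 27 17:36:35 EDT 2010
Tue Apr 27 17:36:35 EDT 2010
Tue Apr 27 17:36:35 EDT 2010
"""


def parse(line: str) -> datetime:
    # Drop the zone name: strptime's %Z only reliably accepts UTC, GMT and the local zone.
    return datetime.strptime(line.replace(" EDT", ""), "%a %b %d %H:%M:%S %Y")


stamps = [parse(line) for line in DSH_OUTPUT.splitlines() if line.strip()]
spread = max(stamps) - min(stamps)
print(f"{len(stamps)} hosts, spread of {spread.total_seconds():.0f} s")
# prints "12 hosts, spread of 2 s"
```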
>>
>> On Tue, Apr 27, 2010 at 5:35 PM, Nathan McCall <nate@vervewireless.com> wrote:
>>> Have you confirmed that the clocks are synced across the cluster?
>>> If they aren't, this may be the result of an unintentional read
>>> repair.
>>>
>>> -Nate
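Why clock skew matters can be shown with a simplified last-write-wins model: each version of a row carries a timestamp, and reconciliation keeps the higher one. If a write was stamped by a client whose clock runs fast, a delete issued later by a client with an accurate clock can carry the lower timestamp and lose, and read repair would then copy the surviving value back to the other replicas. This is a sketch under stated assumptions, not Cassandra's own code: `Cell` and `reconcile` are hypothetical names, and preferring the tombstone on an exact tie is this sketch's choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (a deletion)
    timestamp: int        # client-supplied, e.g. microseconds since the epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Last-write-wins: keep the version with the higher timestamp.
    On an exact tie, prefer the tombstone so the delete is not undone."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value is None else b


# A write stamped by a client whose clock runs 5 s fast...
write = Cell("data", timestamp=1_000_005_000_000)
# ...and a delete issued 1 s later by a client with an accurate clock.
delete = Cell(None, timestamp=1_000_001_000_000)

winner = reconcile(write, delete)
print("row resurrected" if winner.value is not None else "row deleted")
# prints "row resurrected"
```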
>>>
>>> On Tue, Apr 27, 2010 at 2:20 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>> Hmm... Even after deleting with cl.ALL, I'm getting data back for some
>>>> rows after having deleted them. Which rows return data is
>>>> inconsistent from one run of the job to the next.
>>>>
>>>> On Tue, Apr 27, 2010 at 1:44 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>> To check that rows are gone, I check that KeySlice.columns is empty. And as
>>>>> I mentioned, immediately after the delete job, this returns the expected
>>>>> number.
>>>>> Unfortunately I reproduced with QUORUM this morning. No node outages. I am
>>>>> going to try ALL to see if that changes anything, but I am starting to
>>>>> wonder if I'm doing something else wrong.
>>>>> On Mon, Apr 26, 2010 at 9:45 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>
>>>>>> How are you checking that the rows are gone?
>>>>>>
>>>>>> Are you experiencing node outages during this?
>>>>>>
>>>>>> DC_QUORUM is unfinished code right now; you should avoid using it.
>>>>>> Can you reproduce with normal QUORUM?
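The standard quorum arithmetic for the replication factor of 3 mentioned in the original message: QUORUM is a strict majority of replicas, and a read is guaranteed to overlap at least one replica that acknowledged a write when read replicas plus write acknowledgements exceed the replication factor (R + W > N). This is a minimal sketch of that rule only; `quorum` and `read_sees_write` are hypothetical helper names, DC_QUORUM is not modeled, and the thread does not say which consistency level the counter job reads at.

```python
def quorum(replication_factor: int) -> int:
    # A quorum is a strict majority of the replicas.
    return replication_factor // 2 + 1


def read_sees_write(rf: int, write_acks: int, read_replicas: int) -> bool:
    # The read set and the acknowledged write set must share a replica when R + W > N.
    return read_replicas + write_acks > rf


RF = 3
q = quorum(RF)
print(f"RF={RF}: QUORUM={q}, ALL={RF}")
print("QUORUM write + QUORUM read overlap:", read_sees_write(RF, q, q))
print("ALL write + ONE read overlap:", read_sees_write(RF, RF, 1))
print("ONE write + ONE read overlap:", read_sees_write(RF, 1, 1))
```

With RF=3 this reports QUORUM=2, and overlap is guaranteed for QUORUM/QUORUM and ALL/ONE but not for ONE/ONE. Overlap only ensures the newest timestamp is seen; it does not help if the delete itself carries an older timestamp than the data it was meant to remove.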
>>>>>>
>>>>>> On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>>>>>> > I'm having trouble deleting rows in Cassandra. After running a job that
>>>>>> > deletes hundreds of rows, I run another job that verifies that the rows
>>>>>> > are gone. Both jobs run correctly. However, when I run the verification
>>>>>> > job an hour later, the rows have re-appeared. This is not a case of
>>>>>> > "ghosting" because the verification job actually checks that there is
>>>>>> > data in the columns.
>>>>>> >
>>>>>> > I am running a cluster with 12 nodes and a replication factor of 3. I
>>>>>> > am using DC_QUORUM consistency when deleting.
>>>>>> >
>>>>>> > Any ideas?
>>>>>> > Joost.
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> Project Chair, Apache Cassandra
>>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>>> http://riptano.com
>>>>>
>>>>>
>>>>
>>>
>>
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com