cassandra-user mailing list archives

From Matthias Niehoff <matthias.nieh...@codecentric.de>
Subject Re: Internal Handling of Map Updates
Date Thu, 02 Jun 2016 06:52:09 GMT
JSON would be an option, yes. A frozen collection would not work for us, as
the updates are both overwrites of existing values and appends of new
values (but values are never removed).
So we end up with 3 options:

1. use clustering columns
2. use JSON
3. save the row without the spark-cassandra-connector's saveToCassandra()
method (which inserts the whole row, including the map) and instead write our
own save method that uses an UPDATE on the map (as Eric proposed).
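
A minimal sketch of the difference between the whole-row insert and option 3,
written in Python only to build the CQL strings. The table name `events`, the
key column `id`, and both helper names are hypothetical; `last_modified_by_source`
is the map column visible in the sstable2json output quoted below. The
statements use bind markers, so they would still have to be prepared and
executed by whatever driver is in use.

```python
def whole_row_insert(table: str, key_col: str, map_col: str) -> str:
    """CQL for writing the complete map in one INSERT.

    Replacing a whole non-frozen collection like this is what writes a
    range tombstone before the new cells (see Tyler's reply in the thread).
    """
    return f"INSERT INTO {table} ({key_col}, {map_col}) VALUES (?, ?)"


def map_append_update(table: str, key_col: str, map_col: str) -> str:
    """CQL for appending entries to the map with an UPDATE.

    Existing keys are overwritten and new keys are added; nothing is
    removed, so no collection-wide tombstone is needed.
    """
    return f"UPDATE {table} SET {map_col} = {map_col} + ? WHERE {key_col} = ?"


if __name__ == "__main__":
    # Hypothetical table and key column, real map column name from the dump.
    print(whole_row_insert("events", "id", "last_modified_by_source"))
    print(map_append_update("events", "id", "last_modified_by_source"))
```

The bound value for the append would be a map holding only the entries to
write, which fits our case because we overwrite and add keys but never remove
them.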

I think we will go for option 1 or 2 as those are the least costly
solutions.

Nevertheless, it's a pity that an insert on a row with a map will always
create tombstones :-(
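
Why an append can stand in for an overwrite, as long as no keys are removed,
can be illustrated with a toy model in Python. This is a simplified sketch and
not Cassandra's actual storage code: the class `MapColumn` and its methods are
made up for illustration, and the assumption that the tombstone carries a
timestamp one tick before the new cells is taken from the sstable2json dump
quoted below (tombstone at ...628000, cells at ...628001).

```python
from dataclasses import dataclass, field


@dataclass
class MapColumn:
    """Toy model of one non-frozen map column (illustrative only)."""
    cells: list = field(default_factory=list)       # (key, value, write_ts)
    tombstones: list = field(default_factory=list)  # timestamps of whole-map deletions

    def overwrite(self, new_map: dict, ts: int) -> None:
        # Whole-collection write: delete everything older, then write the cells.
        self.tombstones.append(ts - 1)
        self.cells.extend((k, v, ts) for k, v in new_map.items())

    def append(self, new_map: dict, ts: int) -> None:
        # Element-level write: only the new cells, no tombstone.
        self.cells.extend((k, v, ts) for k, v in new_map.items())

    def read(self) -> dict:
        # Last write wins per key; cells at or below the newest tombstone are dropped.
        deleted_up_to = max(self.tombstones, default=-1)
        latest = {}
        for key, value, ts in self.cells:
            if ts > deleted_up_to and (key not in latest or ts >= latest[key][1]):
                latest[key] = (value, ts)
        return {key: value for key, (value, _) in latest.items()}


if __name__ == "__main__":
    replaced, appended = MapColumn(), MapColumn()
    for ts, update in [(10, {"x": 1}), (20, {"x": 2, "y": 3})]:
        replaced.overwrite(update, ts)
        appended.append(update, ts)
    print(replaced.read() == appended.read())               # same visible map
    print(len(replaced.tombstones), len(appended.tombstones))
```

In this model both columns read back the same map, but the overwritten one
accumulates one tombstone per write while the appended one accumulates none.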



2016-06-02 2:02 GMT+02:00 Eric Stevens <mightye@gmail.com>:

> From that perspective, you could also use a frozen collection which takes
> away the ability to append, but for which overwrites shouldn't generate a
> tombstone.
>
> On Wed, Jun 1, 2016, 5:54 PM kurt Greaves <kurt@instaclustr.com> wrote:
>
>> Is there anything stopping you from using JSON instead of a collection?
>>
>> On 27 May 2016 at 15:20, Eric Stevens <mightye@gmail.com> wrote:
>>
>>> If you aren't removing elements from the map, you should instead be able
>>> to use an UPDATE statement and append to the map. It will have the same effect
>>> as overwriting it, because all the new keys will take precedence over the
>>> existing keys. But it'll happen without generating a tombstone first.
>>>
>>> If you do have to remove elements from the collection during this
>>> process, you are either facing tombstones or having to surgically figure
>>> out which elements ought to be removed (which also involves tombstones,
>>> though at least not range tombstones, so a bit cheaper).
>>>
>>> On Fri, May 27, 2016, 5:39 AM Matthias Niehoff <
>>> matthias.niehoff@codecentric.de> wrote:
>>>
>>>> We are processing events in Spark and store the resulting entries
>>>> (containing a map) in Cassandra. The results can be new (no entry for this
>>>> key in Cassandra) or an Update (there is already an entry with this key in
>>>> Cassandra). We use the spark-cassandra-connector to store the data in
>>>> Cassandra.
>>>>
>>>> The connector will always do an insert of the data and will rely on the
>>>> upsert capabilities of Cassandra. So every time an event is updated, the
>>>> complete map is replaced, with all the problems of tombstones.
>>>> It seems we have to implement our own persistence logic in which we check
>>>> whether an element already exists and, if so, update the map manually. That
>>>> would require a read before write, which would be nasty. Another option would
>>>> be not to use a collection but (clustering) columns. Do you have another idea
>>>> for doing this?
>>>>
>>>> (The conclusion of this whole thing for me would be: use upserts, but do
>>>> specific updates on collections, as an upsert might replace the whole
>>>> collection and generate tombstones.)
>>>>
>>>> 2016-05-25 17:37 GMT+02:00 Tyler Hobbs <tyler@datastax.com>:
>>>>
>>>>> If you replace an entire collection, whether it's a map, set, or list,
>>>>> a range tombstone will be inserted followed by the new collection.  If you
>>>>> only update a single element, no tombstones are generated.
>>>>>
>>>>> On Wed, May 25, 2016 at 9:48 AM, Matthias Niehoff <
>>>>> matthias.niehoff@codecentric.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> we have a table with a map field. We do not delete anything in this
>>>>>> table, but we do updates on the values, including the map field (most of
>>>>>> the time a new value for an existing key, rarely adding new keys). We now
>>>>>> encounter a huge number of tombstones for this table.
>>>>>>
>>>>>> We used sstable2json to take a look into the sstables:
>>>>>>
>>>>>>
>>>>>> {"key": "Betty_StoreCatalogLines:7",
>>>>>>
>>>>>>  "cells": [["276-1-6MPQ0RI-276110031802001001:","",1463820040628001],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified","2016-05-21 08:40Z",1463820040628001],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463040069753999,"t",1463040069],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463120708590002,"t",1463120708],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463145700735007,"t",1463145700],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463157430862000,"t",1463157430],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463164595291002,"t",1463164595],
>>>>>>
>>>>>> . . .
>>>>>>
>>>>>>   ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:_","276-1-6MPQ0RI-276110031802001001:last_modified_by_source:!",1463820040628000,"t",1463820040],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:last_modified_by_source:62657474795f73746f72655f636174616c6f675f6c696e6573","00000154d265c6b0",1463820040628001],
>>>>>>
>>>>>>            ["276-1-6MPQ0RI-276110031802001001:payload","{\"payload\":{\"Article Id\":\"276110031802001001\",\"Row Id\":\"1-6MPQ0RI\",\"Article #\":\"31802001001\",\"Quote Item Id\":\"1-6MPWPVC\",\"Country Code\":\"276\"}}",1463820040628001]
>>>>>>
>>>>>>
>>>>>>
>>>>>> Looking at the SSTables, it seems like every update of a value in a map
>>>>>> breaks down to a delete and an insert in the corresponding SSTable (see all
>>>>>> the tombstone flags "t" in the extract of sstable2json above).
>>>>>>
>>>>>> We are using Cassandra 2.2.5.
>>>>>>
>>>>>> Can you confirm this behavior?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
>>>>>> codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Germany
>>>>>> tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobile: +49 (0)
>>>>>> 172.1702676
>>>>>> www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
>>>>>> www.more4fi.de
>>>>>>
>>>>>> Registered office: Solingen | HRB 25917 | Amtsgericht Wuppertal
>>>>>> Management Board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
>>>>>> Supervisory Board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen
>>>>>> Schütz
>>>>>>
>>>>>> This e-mail, including any attached files, contains confidential
>>>>>> and/or legally protected information. If you are not the intended
>>>>>> recipient or have received this e-mail in error, please inform the
>>>>>> sender immediately and delete this e-mail and any attached files
>>>>>> without delay. Unauthorized copying, use, or opening of any attached
>>>>>> files, as well as unauthorized forwarding of this e-mail, is not
>>>>>> permitted.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Tyler Hobbs
>>>>> DataStax <http://datastax.com/>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Kurt Greaves
>> kurt@instaclustr.com
>> www.instaclustr.com
>>
>


-- 
Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
