cassandra-dev mailing list archives

From Sanal Vasudevan <get2sa...@gmail.com>
Subject Re: Cassandra Mutation object decoding
Date Thu, 24 Nov 2016 02:05:14 GMT
I must say that it is really encouraging to get your thoughts.
Thanks a ton Benjamin, Jacques-Henri, Jordan, Nate and Chris.

I do not have access to the client side, where the CQL is executed.
One of my requirements is that my app should not affect the performance
of the Cassandra cluster, or at most add very minimal overhead.
I am given access to the commit logs and CDC logs (under
<Cassandra>/data/cdc_raw). I can access the database to query the metadata.

I understand that using the Mutation object is risky because it may change in
newer releases. Given the need for no (or minimal) load on the C* cluster and
for good performance in my app, I am leaning towards decoding the Mutation.
CASSANDRA-8844 suggests using CommitLogReader together with an implementation
of the CommitLogReadHandler interface, to which the reader pushes each
Mutation object.
Do you know of a way to use the CDC feature without decoding the Mutation?
I just want to make sure I am not missing any functionality of the CDC
feature and that I am using it as intended.
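For the CDC route, the consumer has to pick up the segment files in cdc_raw and delete them itself once they are processed: Cassandra does not clean that directory, and once cdc_total_space_in_mb is reached, writes to CDC-enabled tables are rejected. Below is a minimal sketch of such a drain loop, assuming segment files are complete when they appear in cdc_raw and are named CommitLog-<version>-<id>.log (so the numeric id gives write order). `CdcRawDrainer` and `SegmentHandler` are hypothetical names; in practice `process` would hand the file to CommitLogReader with a CommitLogReadHandler implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CdcRawDrainer {

    /** Hypothetical callback: decode one segment, e.g. via CommitLogReader. */
    interface SegmentHandler {
        void process(Path segment) throws IOException;
    }

    // Assumed segment naming: CommitLog-<version>-<id>.log
    private static final Pattern SEGMENT =
            Pattern.compile("CommitLog-(\\d+)-(\\d+)\\.log");

    /** Segment id parsed from the file name, or -1 if the name does not match. */
    static long segmentId(Path p) {
        Matcher m = SEGMENT.matcher(p.getFileName().toString());
        return m.matches() ? Long.parseLong(m.group(2)) : -1L;
    }

    /**
     * Processes every segment in cdcRaw in id order and deletes each file only
     * after the handler returned normally. Returns the ids that were drained.
     */
    static List<Long> drain(Path cdcRaw, SegmentHandler handler) throws IOException {
        List<Path> segments;
        try (Stream<Path> files = Files.list(cdcRaw)) {
            segments = files.filter(p -> segmentId(p) >= 0)
                    .sorted(Comparator.comparingLong(CdcRawDrainer::segmentId))
                    .collect(Collectors.toList());
        }
        List<Long> drained = new ArrayList<>();
        for (Path segment : segments) {
            handler.process(segment); // if this throws, the file stays for a retry
            Files.delete(segment);    // free cdc_raw space only after success
            drained.add(segmentId(segment));
        }
        return drained;
    }
}
```

Deleting a file only after `process` returns means a crash in the middle of a segment causes that segment to be read again, so the target side should tolerate replayed operations.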

Looking at the state of the Mutation object, this is what I see for each
partitionUpdate in mutation.getPartitionUpdates():
DELETE CQL:
partitionUpdate.columns().isEmpty() : true
INSERT/UPDATE CQL:
partitionUpdate.columns().isEmpty() : false
I am checking internally with my team whether we can live with INSERT and
UPDATE both being classified as an upsert (as Jacques-Henri did earlier).
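The DELETE vs. INSERT/UPDATE observation can be turned into a small mapping step before writing to the other database. Below is a minimal sketch, assuming a hypothetical `DecodedUpdate` holder for the values decoded from one PartitionUpdate (keyspace, table, partition key and column values rendered as text); it is not a Cassandra class. The heuristic may be too coarse: a column-level `DELETE col FROM ...` writes a tombstone for that column, so its update may carry a non-empty column set and would be classified as an upsert here.

```java
import java.util.Map;
import java.util.TreeMap;

class UpsertMapper {

    enum Op { DELETE, UPSERT }

    /** Hypothetical holder for the values decoded from one PartitionUpdate. */
    static final class DecodedUpdate {
        final String ksName;
        final String cfName;
        final String partitionKey;
        final Map<String, String> columns; // column name -> value as text; empty for a DELETE

        DecodedUpdate(String ksName, String cfName, String partitionKey,
                      Map<String, String> columns) {
            this.ksName = ksName;
            this.cfName = cfName;
            this.partitionKey = partitionKey;
            this.columns = new TreeMap<>(columns); // sorted so output is stable
        }
    }

    /** Heuristic: no columns means DELETE; INSERT and UPDATE both become UPSERT. */
    static Op classify(DecodedUpdate u) {
        return u.columns.isEmpty() ? Op.DELETE : Op.UPSERT;
    }

    /** Target-neutral description of the operation to replay elsewhere. */
    static String describe(DecodedUpdate u) {
        String target = u.ksName + "." + u.cfName + "[" + u.partitionKey + "]";
        return classify(u) == Op.DELETE
                ? "DELETE " + target
                : "UPSERT " + target + " " + u.columns;
    }
}
```

Since a Mutation can carry one PartitionUpdate per table, this classification would be applied to each update separately.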

I am able to decode the partition key, ksName, cfName, ColumnData and column
definitions from the Mutation object.

Thanks folks, great help from this community.

Best regards,
Sanal

On Wed, Nov 23, 2016 at 8:36 PM, Benjamin Lerer <benjamin.lerer@datastax.com> wrote:

> >
> > My goal is to reconstruct the CQL operation from the Mutation object, so
> > that I can trigger the same action on another NoSQL target like MongoDB.
> >
>
> There are different ways of keeping your 2 databases in sync. Unfortunately,
> they all have some trade-offs (as always ;-))
>
>
>    1. If you have control on the client side, you could wrap the driver and
>    add some code that converts the query and writes it to the other
>    database at the same time. The main problem with that approach is that a
>    write can succeed on one of the databases but not on the other, which
>    means that you will need a mechanism to resolve those problems.
>    2. On the Cassandra side you could, as Nate suggested, extend the
>    QueryProcessor in order to log the mutations to a log file. As the
>    QueryProcessor has access to the prepared statement cache and to the bind
>    parameters, you should be able to extract the information you need. Some
>    of the problems of that approach are:
>       1. You cannot reprocess already inserted data.
>       2. You will probably have to use a replication log to deal with the
>       cases where the other database is unreachable.
>       3. It might slow down your query processing and take some of your
>       bandwidth at critical times (heavy writes).
>    3. Use a fake index as Jacques-Henri suggested. It makes it easy to
>    reprocess already inserted data, so you will not need replication logs
>    (on the other hand, having to rebuild the index might slow down your
>    database). The main issues with that solution are:
>       1. All the tables that you want to replicate will have to have that
>       index, and you cannot automatically update the schemas on your other
>       database.
>       2. It might slow down your query processing and take some of your
>       bandwidth at critical times (heavy writes).
>    4. Read the commitlogs to recreate the mutation statements (your initial
>    approach). The main problem is that it is simply not easy to do and might
>    break with new major releases. You will also have to make sure that the
>    files do not disappear before you have processed them.
>    5. Try a data warehouse/ETL approach to synchronize your data.
>    CASSANDRA-8844 added support for CDC (Change Data Capture), which might
>    help you there. Unfortunately, I have not really worked on it, so I cannot
>    help you much there.
>
> There might be other approaches worth considering, but they did not come
> to mind.
>
> Hope it helps
>
> Benjamin
>
> PS: MongoDB ... Seriously ??? ;-)
>
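Options 1 and 2 in the quoted reply both need a way to cope with the other database being unreachable. Below is a minimal sketch of a replication log, assuming a hypothetical `TargetWriter` client that returns false when the target cannot be reached; operations are replayed strictly in order and kept until the target accepts them. It is in-memory only; a real replication log would have to be persisted (for example to disk) to survive a restart.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ReplicationLog<T> {

    /** Hypothetical client for the other database; false means unreachable. */
    interface TargetWriter<O> {
        boolean write(O op);
    }

    private final Deque<T> pending = new ArrayDeque<>();
    private final TargetWriter<T> target;

    ReplicationLog(TargetWriter<T> target) {
        this.target = target;
    }

    /** Records an operation that has already been applied to Cassandra. */
    void append(T op) {
        pending.addLast(op);
    }

    /**
     * Replays pending operations oldest first and stops at the first failure,
     * so a later operation is never applied before an earlier one.
     * Returns how many operations were delivered.
     */
    int flush() {
        int delivered = 0;
        while (!pending.isEmpty()) {
            if (!target.write(pending.peekFirst())) {
                break; // target unreachable: keep the op, retry on the next flush
            }
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Stopping at the first failure, rather than skipping the failed operation, matters because an upsert followed by a delete of the same key must reach the target in that order.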



-- 
Sanal Vasudevan Nair
