apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2181) Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator
Date Sun, 23 Oct 2016 06:44:58 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15599192#comment-15599192

ASF GitHub Bot commented on APEXMALHAR-2181:

GitHub user ananthc opened a pull request:


    APEXMALHAR-2181 Added Cassandra Upsert Operator with PreparedStatemen…

    @PramodSSImmaneni / @ashwinchandrap / @sanjaypujare / @DT-Priyanka : Please review.
    @tweise / @PramodSSImmaneni - This pull request bumps the Guava libraries to a higher
    version (from 14.x to 16.x). This was required to support new functionality in the Cassandra
    upsert operator (because the Cassandra driver needed an update).
    A brief description of the new functionality supported by this operator is here:
    https://issues.apache.org/jira/browse/APEXMALHAR-2181

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ananthc/apex-malhar APEXMALHAR-2181.CassandraUpsertOperator

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #466
commit 60eea9d7f5606a71f07114b323aa2dfbd89ffd22
Author: ananthc <ananthg.apex@gmail.com>
Date:   2016-10-23T06:35:15Z

    APEXMALHAR-2181 Added Cassandra Upsert Operator with PreparedStatements and EXACTLY_ONCE semantics support


> Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator
> ----------------------------------------------------------------------------------------------
>                 Key: APEXMALHAR-2181
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ananth
>            Assignee: Ananth
>  An abstract operator that is used to mutate Cassandra rows using PreparedStatements for faster execution.
>   It accommodates EXACTLY_ONCE semantics if concrete implementations choose to provide a meaningful
>   implementation of an abstract method. As Cassandra is not a pure transactional database, this burden
>   falls on the concrete implementation of the operator ONLY during the reconciliation window (and not
>   for any other windows).
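
The division of labour described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual API: the names `ReconcilingWriterSketch`, `alreadyWritten` and `InMemoryWriter` are hypothetical, an in-memory list stands in for Cassandra, and the sketch assumes the framework knows the id of the last window that was fully written before a failure. Only the window right after that one (the reconciliation window) triggers the subclass check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the abstract operator; not the real Malhar class.
abstract class ReconcilingWriterSketch<T> {
    private final long lastCompletedWindowId; // assumption: known after recovery
    private long currentWindowId;

    ReconcilingWriterSketch(long lastCompletedWindowId) {
        this.lastCompletedWindowId = lastCompletedWindowId;
    }

    void beginWindow(long windowId) {
        currentWindowId = windowId;
    }

    // Subclass hook: was this tuple already persisted before the failure?
    // Consulted only while replaying the reconciliation window.
    protected abstract boolean alreadyWritten(T tuple, long windowId);

    protected abstract void write(T tuple);

    void process(T tuple) {
        if (currentWindowId <= lastCompletedWindowId) {
            return; // replayed window that was fully written earlier
        }
        boolean reconciling = currentWindowId == lastCompletedWindowId + 1;
        if (reconciling && alreadyWritten(tuple, currentWindowId)) {
            return; // duplicate from the partially written window
        }
        write(tuple);
    }
}

// Toy concrete implementation backed by a list instead of a Cassandra table.
class InMemoryWriter extends ReconcilingWriterSketch<String> {
    final List<String> store;

    InMemoryWriter(long lastCompletedWindowId, List<String> store) {
        super(lastCompletedWindowId);
        this.store = store;
    }

    @Override
    protected boolean alreadyWritten(String tuple, long windowId) {
        return store.contains(tuple);
    }

    @Override
    protected void write(String tuple) {
        store.add(tuple);
    }
}

class ReconcileDemo {
    public static void main(String[] args) {
        // "x" was written in window 4, "a" in window 5 just before the failure.
        List<String> store = new ArrayList<>(Arrays.asList("x", "a"));
        InMemoryWriter writer = new InMemoryWriter(4L, store);
        writer.beginWindow(4L); writer.process("x"); // skipped, window complete
        writer.beginWindow(5L); writer.process("a"); writer.process("b"); // "a" skipped
        writer.beginWindow(6L); writer.process("a"); // normal window, no check
        System.out.println(store);
    }
}
```

In a real concrete implementation the check would presumably be a read against Cassandra, which is why confining it to a single window keeps the cost of exactly-once handling low.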
>  ===========================================================
>   The typical execution flow is as follows :
>    1. Create a concrete implementation of this class by extending it and implementing a few methods.
>    2. Define the payload, i.e. the POJO that represents a Cassandra row. The payload is part of the
>       execution context {@link UpsertExecutionContext} and is a template parameter of this class.
>    3. The upstream operator that wants to write to Cassandra does the following:
>        a. Create an instance of {@link UpsertExecutionContext}
>        b. Set the payload (an instance of the POJO created in step 2 above)
>        c. Set additional execution context parameters such as the collection handling style, list
>           placement style, overriding TTLs, update only if primary keys exist, consistency levels, etc.
>    4. The concrete implementation then executes this context as a Cassandra row mutation.
>  ===========================================================
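
Steps 2 and 3 of the flow above might look roughly like the sketch below. The classes `UserRow`, `SketchUpsertContext` and `UpstreamSketch`, and every method on them, are hypothetical stand-ins written for this illustration; the real `UpsertExecutionContext` may expose different names and options.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical POJO that mirrors one Cassandra row (step 2).
class UserRow {
    final String userId;
    final List<String> recentLogins;

    UserRow(String userId, List<String> recentLogins) {
        this.userId = userId;
        this.recentLogins = recentLogins;
    }
}

// Hypothetical stand-in for UpsertExecutionContext; names are illustrative only.
class SketchUpsertContext<T> {
    enum ListPlacement { APPEND, PREPEND }

    private T payload;
    private ListPlacement listPlacement = ListPlacement.APPEND;
    private Integer ttlOverrideSeconds; // null: keep the connection-level default TTL
    private boolean updateOnlyIfExists; // false: plain upsert

    SketchUpsertContext<T> payload(T value) { this.payload = value; return this; }
    SketchUpsertContext<T> listPlacement(ListPlacement value) { this.listPlacement = value; return this; }
    SketchUpsertContext<T> ttlOverrideSeconds(int value) { this.ttlOverrideSeconds = value; return this; }
    SketchUpsertContext<T> updateOnlyIfExists(boolean value) { this.updateOnlyIfExists = value; return this; }

    T getPayload() { return payload; }
    ListPlacement getListPlacement() { return listPlacement; }
    Integer getTtlOverrideSeconds() { return ttlOverrideSeconds; }
    boolean isUpdateOnlyIfExists() { return updateOnlyIfExists; }
}

// What an upstream operator would do per tuple (steps 3a to 3c).
class UpstreamSketch {
    static SketchUpsertContext<UserRow> toTuple(UserRow row) {
        return new SketchUpsertContext<UserRow>()
            .payload(row)                                             // 3b
            .listPlacement(SketchUpsertContext.ListPlacement.PREPEND) // 3c
            .ttlOverrideSeconds(3600)                                 // 3c
            .updateOnlyIfExists(true);                                // 3c
    }

    public static void main(String[] args) {
        UserRow row = new UserRow("u1", Arrays.asList("2016-10-23T06:35:15Z"));
        SketchUpsertContext<UserRow> tuple = toTuple(row);
        System.out.println(tuple.getPayload().userId + " " + tuple.getListPlacement());
    }
}
```

Keeping the per-tuple options on the context object rather than on the operator is what lets each tuple override defaults such as TTL independently.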
>   This operator supports the following features
>   1. Highly customizable connection policies. This is achieved by specifying the ConnectionStateManager.
>      A good number of connection management aspects can be controlled via
>      {@link ConnectionStateManager}, such as consistency, load balancing, connection retries,
>      the table to use, the keyspace to use, etc. Please refer to the javadoc of {@link ConnectionStateManager}.
>   2. Support for collections: Maps, Lists and Sets are supported.
>      User defined types as part of collections are also supported.
>   3. Support exists for both adding to an existing collection and removing entries from an existing
>      collection. The POJO field that represents a collection holds the entries that are to be added
>      or removed. This avoids the pattern of reading and then writing the final value into the
>      Cassandra column, which helps low latency / high write applications because the read is skipped.
>   4. Supports list placements: the execution context can be used to specify where the new incoming
>      list is to be added (in case there is an existing list in the current column of the current row
>      being mutated). Supported options are APPEND or PREPEND to an existing list.
>   5. Support for user defined types. A POJO can have fields that represent Cassandra columns that
>      are custom user defined types. Concrete implementations of the operator provide a mapping of
>      the Cassandra column name to the TypeCodec that is to be used for that field inside Cassandra.
>      Please refer to the javadoc of {@link this.getCodecsForUserDefinedTypes() } for more details.
>   6. Support for custom mapping of POJO payload field names to Cassandra column names. Practically
>      speaking, POJO field names might not always match Cassandra column names, hence this support.
>      This also avoids writing a POJO just for the Cassandra operator, so an existing POJO can be
>      passed to this operator. Please refer to the javadoc of
>      {@link this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
>   7. TTL support: a default TTL can be set for the connection (via {@link ConnectionStateManager})
>      and then used for all mutations. This TTL can further be overridden at a tuple execution level
>      to accommodate use cases of setting custom column expirations, typically useful in wide row
>      implementations.
>   8. Support for counter column tables. Counter tables are also supported, with the values inside
>      the incoming POJO added to or subtracted from the counter column accordingly. Please note that
>      the value is not set as an absolute value; rather, it represents the value that needs to be
>      added to or subtracted from the current
>   9. Composite primary keys are also supported. All the POJO fields that map to the composite
>      primary key are used to resolve the primary key.
>   10. Support for conditional updates: this operator can be used as an update-only operator as
>       opposed to an upsert operator, i.e. update only IF EXISTS. This is achieved by setting the
>       appropriate boolean in the {@link UpsertExecutionContext} tuple that is passed from the
>       upstream operator.
>   11. Lenient mapping of POJO fields to Cassandra column names. By default, POJO field names are
>       matched to Cassandra column names case-insensitively. This can be further customized by
>       overriding mappings. Please refer to feature 6 above.
>   12. Defaults can be overridden at a tuple execution level for TTL and consistency.
>   13. Support for handling nulls, i.e. whether null values in the POJO are to be persisted as is or
>       ignored, so that the application need not perform a read to populate a POJO field if it is
>       not available in the context.
>   14. A few autometrics are provided for monitoring the latency aspects of the cassandra
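
Several of the features listed above (list placement, TTL overrides, update only IF EXISTS, and field-name overrides with lenient case handling) ultimately shape the CQL text that gets prepared. The sketch below only builds such CQL strings; it does not use the Cassandra driver. The class `UpsertCqlSketch`, its methods, and the table and column names are hypothetical, and the operator's actual statement generation may differ.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that builds CQL text only; not part of the operator.
class UpsertCqlSketch {
    enum ListPlacement { APPEND, PREPEND }

    // Explicit override wins; otherwise fall back to the lower-cased field name,
    // since Cassandra folds unquoted identifiers to lower case.
    static String columnFor(String pojoField, Map<String, String> overrides) {
        String override = overrides.get(pojoField);
        return override != null ? override : pojoField.toLowerCase(Locale.ROOT);
    }

    // Builds an UPDATE that adds bound list entries without reading the row first.
    static String listUpdateCql(String table, String listColumn, String keyColumn,
                                ListPlacement placement, Integer ttlSeconds, boolean ifExists) {
        StringBuilder cql = new StringBuilder("UPDATE ").append(table);
        if (ttlSeconds != null) {
            cql.append(" USING TTL ").append(ttlSeconds); // per-tuple TTL override
        }
        cql.append(" SET ").append(listColumn).append(" = ");
        if (placement == ListPlacement.PREPEND) {
            cql.append("? + ").append(listColumn);
        } else {
            cql.append(listColumn).append(" + ?");
        }
        cql.append(" WHERE ").append(keyColumn).append(" = ?");
        if (ifExists) {
            cql.append(" IF EXISTS"); // update-only instead of upsert
        }
        return cql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("recentLogins", "recent_logins");
        String listColumn = columnFor("recentLogins", overrides);
        String keyColumn = columnFor("userId", overrides.containsKey("userId")
            ? overrides : java.util.Collections.singletonMap("userId", "user_id"));
        System.out.println(listUpdateCql("ks.users", listColumn, keyColumn,
            ListPlacement.PREPEND, 3600, true));
    }
}
```

Because the number of distinct option combinations is small, each resulting CQL string can be prepared once and reused, which is presumably where the PreparedStatement speed-up comes from.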

This message was sent by Atlassian JIRA
