apex-dev mailing list archives

From "Ananth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2181) Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator
Date Sun, 23 Oct 2016 06:45:58 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15599194#comment-15599194 ]

Ananth commented on APEXMALHAR-2181:
------------------------------------

Functionality ready for review at https://github.com/apache/apex-malhar/pull/466 

> Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert) output Operator
> ----------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2181
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ananth
>            Assignee: Ananth
>
>  An abstract operator that is used to mutate Cassandra rows using PreparedStatements for faster
>   execution. It accommodates EXACTLY_ONCE semantics if concrete implementations choose to
>   implement an abstract method with a meaningful implementation. As Cassandra is not a pure
>   transactional database, this burden falls on the concrete implementation of the operator, and
>   ONLY during the reconciliation window (not for any other windows).
>  ===========================================================
>   The typical execution flow is as follows:
>    1. Create a concrete implementation of this class by extending it and implementing a few
>       methods.
>    2. Define the payload, i.e. the POJO that represents a Cassandra row. The payload is carried
>       as part of the execution context {@link UpsertExecutionContext}, and its type is a template
>       parameter of this class.
>    3. The upstream operator that wants to write to Cassandra does the following:
>        a. Creates an instance of {@link UpsertExecutionContext}.
>        b. Sets the payload (an instance of the POJO defined in step 2 above).
>        c. Sets additional execution context parameters such as the collection handling style,
>           list placement style, overriding TTL, update only if primary keys exist, and
>           consistency level.
>    4. The concrete implementation then executes this context as a Cassandra row mutation.
>  ===========================================================
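
The four steps above can be pictured with a small, self-contained sketch. Everything in it is a hypothetical stand-in written for illustration: `SketchContext`, `SketchUpsertOperator`, `tableName()`, `User` and the field names are assumptions, not the classes or signatures from the pull request, and `process` only records what it would do instead of talking to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; the real classes live in the pull request.
final class ExecutionFlowSketch {

  enum ListPlacement { APPEND, PREPEND }

  /** Stand-in for the tuple of step 3: a payload plus per-tuple overrides. */
  static final class SketchContext<T> {
    T payload;
    ListPlacement listPlacement = ListPlacement.APPEND;
    Integer overridingTtlSeconds;          // null means "use the connection default"
    boolean updateOnlyIfPrimaryKeyExists;  // false means a plain upsert
  }

  /** Stand-in for the abstract operator of step 1, generic in the payload type. */
  abstract static class SketchUpsertOperator<T> {
    final List<String> executed = new ArrayList<>();

    /** Hypothetical hook: concrete classes name the table they write to. */
    abstract String tableName();

    /** Step 4: a real operator would bind and execute a prepared statement here. */
    void process(SketchContext<T> ctx) {
      String mode = ctx.updateOnlyIfPrimaryKeyExists ? "update-only" : "upsert";
      String ttl = ctx.overridingTtlSeconds == null
          ? "default" : ctx.overridingTtlSeconds + "s";
      executed.add(mode + " " + tableName() + " list=" + ctx.listPlacement
          + " ttl=" + ttl + " payload=" + ctx.payload);
    }
  }

  /** Step 2: the POJO that represents one Cassandra row (hypothetical field). */
  static final class User {
    final String id;
    User(String id) { this.id = id; }
    @Override public String toString() { return "User(" + id + ")"; }
  }

  /** Step 1: the concrete implementation. */
  static final class UserUpsertOperator extends SketchUpsertOperator<User> {
    @Override String tableName() { return "ks.users"; }
  }
}
```

An upstream operator would then create a `SketchContext<User>`, set the payload and any overrides, and hand it to `UserUpsertOperator.process`. The point of the design is that per-tuple choices travel with the tuple, while connection-wide defaults stay with the operator.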
>   This operator supports the following features:
>   1. Highly customizable connection policies. This is achieved by specifying the
>      ConnectionStateManager. A good number of connection management aspects can be controlled
>      via {@link ConnectionStateManager}, such as consistency, load balancing, connection retries,
>      the table to use and the keyspace to use. Please refer to the javadoc of
>      {@link ConnectionStateManager}.
>   2. Support for collections: Map, List and Set are supported.
>      User defined types as part of collections are also supported.
>   3. Support for both adding to an existing collection and removing entries from an existing
>      collection. The POJO field that represents a collection holds the entries that are to be
>      added or removed. This avoids the pattern of reading the column and then writing the final
>      value back into it, which suits low latency / high write applications because no read is
>      needed in the process.
>   4. Support for list placement: the execution context can be used to specify where the new
>      incoming list is to be added (in case there is an existing list in the current column of
>      the row being mutated). Supported options are APPEND or PREPEND to an existing list.
>   5. Support for user defined types. A POJO can have fields that represent Cassandra columns of
>      custom user defined types. Concrete implementations of the operator provide a mapping from
>      the Cassandra column name to the TypeCodec that is to be used for that field inside
>      Cassandra. Please refer to the javadoc of {@link this.getCodecsForUserDefinedTypes()} for
>      more details.
>   6. Support for custom mapping of POJO payload field names to Cassandra column names.
>      Practically speaking, POJO field names might not always match Cassandra column names, hence
>      this support. It also avoids writing a POJO just for the Cassandra operator, so an existing
>      POJO can be passed to this operator. Please refer to the javadoc of
>      {@link this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
>   7. TTL support: a default TTL can be set for the connection (via {@link ConnectionStateManager})
>      and then used for all mutations. This TTL can be overridden at the tuple execution level to
>      accommodate use cases that set custom column expirations, typically useful in wide row
>      implementations.
>   8. Support for counter column tables. The values in the incoming POJO are added to or
>      subtracted from the counter column accordingly. Please note that the value is not set as
>      an absolute value; it represents the amount to be added to or subtracted from the current
>      counter.
>   9. Support for composite primary keys. All the POJO fields that map to the composite primary
>      key are used to resolve the primary key in the case of a composite primary key table.
>   10. Support for conditional updates: this operator can be used as an update-only operator as
>       opposed to an upsert operator, i.e. update only IF EXISTS. This is achieved by setting the
>       appropriate boolean in the {@link UpsertExecutionContext} tuple that is passed from the
>       upstream operator.
>   11. Lenient mapping of POJO fields to Cassandra column names. By default, POJO field names are
>       matched case-insensitively to Cassandra column names. This can be further customized by
>       overriding mappings. Please refer to feature 6 above.
>   12. Defaults can be overridden at the tuple execution level for TTL and consistency policies.
>   13. Support for handling nulls, i.e. whether null values in the POJO are to be persisted as is
>       or ignored, so that the application need not perform a read to populate a POJO field that
>       is not available in the context.
>   14. A few autometrics are provided for monitoring the latency aspects of the Cassandra cluster.
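
Several of the features listed above (list placement, counter columns, per-tuple TTL, composite primary keys, update only IF EXISTS) come down to differences in the CQL text that gets prepared. The sketch below shows those differences using standard CQL `UPDATE` syntax. `UpsertCqlSketch` and its methods are hypothetical names invented for this illustration and are not taken from the pull request. The helper only builds strings; it does not check that a combination of options is valid for a given table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apex Malhar: it only builds CQL text.
final class UpsertCqlSketch {

  enum ListPlacement { APPEND, PREPEND }

  /** SET fragment that adds the bound list to an existing list column. */
  static String listAssignment(String column, ListPlacement placement) {
    return placement == ListPlacement.APPEND
        ? column + " = " + column + " + ?"   // new elements go after the existing ones
        : column + " = ? + " + column;       // new elements go before the existing ones
  }

  /** SET fragment for a counter column; bind a negative value to subtract. */
  static String counterAssignment(String column) {
    return column + " = " + column + " + ?";
  }

  /** Assembles UPDATE ... [USING TTL ?] SET ... WHERE ... [IF EXISTS]. */
  static String buildUpdate(String table,
                            List<String> assignments,
                            List<String> primaryKeyColumns,
                            boolean useTtl,
                            boolean updateOnlyIfExists) {
    StringBuilder cql = new StringBuilder("UPDATE ").append(table);
    if (useTtl) {
      cql.append(" USING TTL ?");
    }
    cql.append(" SET ").append(String.join(", ", assignments));

    // Every column of a composite primary key contributes one predicate.
    List<String> predicates = new ArrayList<>();
    for (String pk : primaryKeyColumns) {
      predicates.add(pk + " = ?");
    }
    cql.append(" WHERE ").append(String.join(" AND ", predicates));

    if (updateOnlyIfExists) {
      cql.append(" IF EXISTS");
    }
    return cql.toString();
  }
}
```

Because each combination of options yields a different statement text, an operator along these lines would presumably prepare one statement per combination and reuse it across tuples, which is where the benefit of prepared statements comes from.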



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
