cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-8098) Allow CqlInputFormat to be restricted to more than one data-center
Date Wed, 02 Mar 2016 18:18:18 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko resolved CASSANDRA-8098.
------------------------------------------
    Resolution: Won't Fix

Hadoop input/output formats are likely to move off-tree soon, and as such we aren't going
to allocate any resources to new Hadoop-related functionality.

If you come up with a 3.x patch, however, feel free to reopen the ticket.

> Allow CqlInputFormat to be restricted to more than one data-center
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-8098
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8098
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: mck
>            Assignee: mck
>
> Today, using CqlInputFormat, it's only possible to 
>  - enforce data-locality to one specific data-center, or
>  - disable it by changing CL from LOCAL_ONE to ONE.
> We need a way to enforce data-locality to a specific set of *data-centers*, and would
> like to contribute a solution.
> Suggested ideas
>  - CqlInputFormat (gently) calls describeLocalRing against all the listed connection
>    addresses and merges the results into one masterRangeNodes list, or
>  - change the signature of describeLocalRing(..) to describeRings(String keyspace,
>    String[] dc) and have the job specify which DCs it will be running within.
> *Long description*
> A lot has changed since CASSANDRA-2388 that makes integrating C* and Hadoop much easier,
> for example: CqlInputFormat, CL.LOCAL_ONE, LimitedLocalNodeFirstLocalBalancingPolicy,
> vnodes, and describe_local_ring.
> When using CqlInputFormat, if you don't want to be stuck within datacenter-locality you
> can, for example, change the consistency level from LOCAL_ONE to ONE. That's great, but
> describe_local_ring + CL.LOCAL_ONE in its current implementation isn't enough for us. We
> have multiple datacenters for offline and multiple for online, because we still want the
> availability advantages that come from aligning virtual datacenters to physical
> datacenters for the offline stuff too. That is, using Hadoop for aggregation purposes on
> top of C* doesn't always imply one can settle for a CP solution.
> Some of our jobs have their own InputFormat implementation that uses describe_ring,
> LOCAL_ONE, and data with replicas only in the offline datacenters. This works very well,
> except the last point kinda sucks because we have online clients that want to read this
> data and then have to do so through nodes in the offline datacenters. Underlying
> performance improvements, e.g. cross_node_timeout and speculative requests, have helped,
> but there's still the need to separate online and offline. If we wanted to push replicas
> out onto the online nodes, I think the best approach is for us to filter out those
> splits/locations in getRangeMap(..).
> Back to this issue: we also have jobs using CqlInputFormat. Specifying multiple client
> input addresses doesn't help take advantage of the multiple offline datacenters, because
> the Cassandra.Client only makes one call to describe_local_ring, and
> StorageService.describeLocalRing(..) only checks against its own address. It would work
> to have either a) CqlInputFormat call describeLocalRing against all the listed connection
> addresses and merge the results into one masterRangeNodes list, or b) something along the
> lines of changing the signature of describeLocalRing(..) to describeRings(String keyspace,
> String[] dc) and having the job specify which DCs it will be running within.
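
For illustration only, the datacenter restriction described in option b) and the "filter out
those splits/locations" idea could look roughly like the sketch below. It is not code from
Cassandra or from any patch on this ticket. Replica, RangeNodes, and DatacenterRingFilter are
hypothetical stand-ins, and the sketch assumes the job already holds a full ring description
in which every replica is tagged with its datacenter name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one replica of a token range: a host plus its datacenter. */
final class Replica {
    final String host;
    final String datacenter;

    Replica(String host, String datacenter) {
        this.host = host;
        this.datacenter = datacenter;
    }
}

/** Hypothetical stand-in for a token range and the replicas that own it. */
final class RangeNodes {
    final String startToken;
    final String endToken;
    final List<Replica> replicas;

    RangeNodes(String startToken, String endToken, List<Replica> replicas) {
        this.startToken = startToken;
        this.endToken = endToken;
        this.replicas = replicas;
    }
}

final class DatacenterRingFilter {
    /**
     * Returns a copy of the ring in which each range keeps only the replicas
     * located in one of the given datacenters. The input is not modified.
     */
    static List<RangeNodes> restrictTo(List<RangeNodes> ring, String... datacenters) {
        Set<String> allowed = new HashSet<>(Arrays.asList(datacenters));
        List<RangeNodes> result = new ArrayList<>();
        for (RangeNodes range : ring) {
            List<Replica> kept = new ArrayList<>();
            for (Replica replica : range.replicas) {
                if (allowed.contains(replica.datacenter)) {
                    kept.add(replica);
                }
            }
            // A range is kept even when no replica survives, so the caller can
            // detect parts of the ring that the listed datacenters do not cover.
            result.add(new RangeNodes(range.startToken, range.endToken, kept));
        }
        return result;
    }
}
```

A job restricted to two offline datacenters would call something like
DatacenterRingFilter.restrictTo(ring, "offline1", "offline2") (the names are made up) and
build its splits from the surviving replicas. Whether a range left with no replicas should
fail the job or fall back to a non-local read is a design decision this sketch leaves open.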



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
