falcon-dev mailing list archives

From "John Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-511) Support for multiple sources to multiple targets, without partitions
Date Thu, 17 Jul 2014 19:45:06 GMT

    [ https://issues.apache.org/jira/browse/FALCON-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065442#comment-14065442 ]

John Yu commented on FALCON-511:
--------------------------------

I am thinking of the following logic:

if ( 2+ source clusters specified and no partition ) {
  if ( source cluster preference specified ) {   // config example a
    look for the file in order of source cluster preference
  } else {
    try to copy from a source in the same colo (based on cluster def)   // config example b
    otherwise search through all source clusters for the data and copy from any   // config example c
  }
}


--- config example a ---
   <clusters>
        <cluster name="colo1-etl" type="source"> .. </cluster>
        <cluster name="colo2-etl" type="source"> .. </cluster>
        <cluster name="colo1-adhoc" type="target" source="colo1-etl,colo2-etl"> .. </cluster>
        <cluster name="colo2-adhoc" type="target" source="colo2-etl,colo1-etl"> .. </cluster>
    </clusters>

--- config example b ---
   <clusters>
        <cluster name="colo1-etl" type="source"> .. </cluster>
        <cluster name="colo2-etl" type="source"> .. </cluster>
        <cluster name="colo1-adhoc" type="target"> .. </cluster>
        <cluster name="colo2-adhoc" type="target"> .. </cluster>
    </clusters>

--- config example c ---
   <clusters>
        <cluster name="colo1" type="source"> .. </cluster>
        <cluster name="colo2" type="source"> .. </cluster>
        <cluster name="colo3" type="target"> .. </cluster>
        <cluster name="colo4" type="target"> .. </cluster>
    </clusters>
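
As a rough sketch (not Falcon code), the selection logic above could look like this in Python. All names here are hypothetical: `pick_source`, the `colo_of` mapping from cluster name to colo, and the `has_data` callback that reports whether a source cluster holds the feed instance. The `preference` list stands in for the proposed `source="..."` attribute on a target cluster, and the fallback from same-colo sources to all other sources is one reading of the pseudocode.

```python
from typing import Callable, Dict, List, Optional


def pick_source(
    target: str,
    sources: List[str],
    colo_of: Dict[str, str],
    has_data: Callable[[str], bool],
    preference: Optional[List[str]] = None,
) -> Optional[str]:
    """Return the source cluster to copy from, or None if no source has the data."""
    if preference:
        # Config example a: follow the explicit order given on the target.
        candidates = list(preference)
    else:
        # Config example b: try sources in the target's own colo first.
        target_colo = colo_of.get(target)
        local = [
            s for s in sources
            if target_colo is not None and colo_of.get(s) == target_colo
        ]
        # Config example c: then fall back to every remaining source.
        remote = [s for s in sources if s not in local]
        candidates = local + remote

    for source in candidates:
        if has_data(source):
            return source
    return None


# Hypothetical cluster-to-colo mapping for config examples a and b.
colo_of = {
    "colo1-etl": "colo1",
    "colo2-etl": "colo2",
    "colo1-adhoc": "colo1",
    "colo2-adhoc": "colo2",
}
sources = ["colo1-etl", "colo2-etl"]

# Example b: no preference given, so colo2-adhoc pulls from its local ETL cluster.
local_choice = pick_source("colo2-adhoc", sources, colo_of, lambda s: True)

# Example a: an explicit preference overrides colo locality.
preferred_choice = pick_source(
    "colo1-adhoc", sources, colo_of, lambda s: True,
    preference=["colo2-etl", "colo1-etl"],
)
```

Returning `None` when no source has the data leaves the retry or failure policy to the caller, which the proposal does not specify.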


> Support for multiple sources to multiple targets, without partitions
> --------------------------------------------------------------------
>
>                 Key: FALCON-511
>                 URL: https://issues.apache.org/jira/browse/FALCON-511
>             Project: Falcon
>          Issue Type: New Feature
>            Reporter: John Yu
>
> We currently have the following use case:
> Colo1 has 1 ETL cluster (Colo1-ETL) and 1 adhoc cluster (Colo1-A)
> Colo2 has 1 ETL cluster (Colo2-ETL) and 1 adhoc cluster (Colo2-A)
> Due to the bandwidth constraint between the two colos, we are thinking of having the 2 ETL clusters perform the same computation to generate the same dataset, and have the 2 adhoc clusters pull from their respective colo-local ETL cluster.
> This can be done currently by specifying 2 different feeds.  However, a critical dataset might be computed on different colos simultaneously for both DR and load balancing purposes. In this scenario, we would like to ease data discovery for end users by having only 1 feed definition, so that end users know these pieces of data are logically the same data, and they are free to pick one to use.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
