falcon-dev mailing list archives

From John Yu <johnyu0...@gmail.com>
Subject Partitions in Feed definition
Date Thu, 24 Jul 2014 00:16:34 GMT
Hey all,

A few questions about partitions:

Partitions are declared in the feed XML like this:

    <partitions>
        <partition name="colo"/>
        <partition name="country"/>
    </partitions>


   1. I see these are partition keys. Do the partition key values
(say country=us or country=uk) need to be defined beforehand, or are
they unbounded?
   2. Does the storage location need to contain the partition keys,
as below (see the colo and country partition keys)?

    <location path="/data/${colo}/${country}/${YEAR}/${MONTH}/${DAY}" type="data"/>

   3. If the partition keys are not in the FileSystem path, how does
Falcon identify the physical location of a feed partition (and
how/where is it actually used)? I understand that for HCAT, the feed
definition has the partition key-values.
   4. Are these partition keys and values validated against the
FileSystem or HCAT locations?
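
To make questions 2 and 3 concrete, this is roughly how I picture the
location template being resolved. It is only a sketch of plain
placeholder substitution in Python, not Falcon's actual code: the
helper name expand_location and the colo value "west" are made up for
illustration, and I am assuming the partition keys are filled in the
same way as the date variables.

```python
from string import Template

# Location template copied from the feed definition in this message.
LOCATION = "/data/${colo}/${country}/${YEAR}/${MONTH}/${DAY}"


def expand_location(template: str, values: dict) -> str:
    """Fill every ${NAME} placeholder in the template.

    Hypothetical helper: raises KeyError if a placeholder has no value,
    which is the case I am asking about when a partition key is not
    known up front.
    """
    return Template(template).substitute(values)


if __name__ == "__main__":
    # "west" is an invented colo value; country=us is the example above.
    print(expand_location(LOCATION, {
        "colo": "west",
        "country": "us",
        "YEAR": "2014",
        "MONTH": "07",
        "DAY": "24",
    }))
```

If the path has no ${colo} or ${country} placeholders at all, nothing
in a substitution like this would tie a partition to a directory, which
is why I am asking how the keys are used in that case.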



Partition attribute in the Cluster reference:

Using the example from the documentation page
<http://falcon.incubator.apache.org/docs/FalconArchitecture.html#Replication>


   1. What does it mean to specify partitions in a source cluster?
   2. What does it mean in a target cluster? (Does it act like a
filter that pulls only a subset of data from the source? If so, how
does Falcon know which subset to read in a FileSystem feed?)
   3. What data is in sourceCluster1 and sourceCluster2, and at what
location?
   4. Which path does the replicated data end up at in the
backupCluster (target)?


That is a lot of questions. Hopefully there is just something
straightforward about partitions that I have missed.


Thanks for your answers,
John
