falcon-dev mailing list archives

From Srikanth Sundarrajan <srik...@hotmail.com>
Subject RE: Partitions in Feed definition
Date Thu, 24 Jul 2014 03:36:14 GMT
> are the partition keys values (say country=us or country=uk) need to be defined before-hand
> or unbounded?

Yes, the partition values themselves are unbounded.
> does the storage location need to have the partition key in them?

In most cases there are time partitions. Besides the time partition, there can be other
partitions, which are declared in the partitions section. These partitions ought to appear in
the path as variables. They can be left out of the path only if no consumer is interested in
filtering and selecting a section of the data through the dataIn(input, partitionSpec) function.
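To illustrate why the partition variables need to be in the path, here is a minimal sketch,
assuming a partition spec is a "/"-separated list of values given in the order the partitions
are declared, with "*" meaning "any value". The helper resolve_partition_glob is hypothetical;
it is not how Falcon implements dataIn(). It only shows that filtering works by substituting
partition values into the location template, so a partition that is absent from the template
cannot be used to select data.

```python
import re

# Hypothetical helper for illustration only; this is NOT Falcon's implementation.
# Assumption: a partition spec is "/"-separated values in declared-partition order,
# where "*" (or a missing trailing value) means "any value".
def resolve_partition_glob(path_template, partition_names, partition_spec, time_values):
    spec_values = partition_spec.split("/") if partition_spec else []
    if len(spec_values) > len(partition_names):
        raise ValueError("spec has more values than declared partitions")
    # Unspecified trailing partitions match any directory.
    spec_values += ["*"] * (len(partition_names) - len(spec_values))
    bindings = dict(zip(partition_names, spec_values))
    bindings.update(time_values)

    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no value for ${{{name}}}")
        return bindings[name]

    # Replace every ${NAME} variable in the location path.
    return re.sub(r"\$\{(\w+)\}", substitute, path_template)


template = "/data/${colo}/${country}/${YEAR}/${MONTH}/${DAY}"
print(resolve_partition_glob(template, ["colo", "country"], "*/us",
                             {"YEAR": "2014", "MONTH": "07", "DAY": "23"}))
# prints /data/*/us/2014/07/23
```

Selecting country=us across all colos yields a glob over the colo directory; if ${country}
were not in the template, the spec would have nowhere to apply.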
> if the partition keys are not in the FileSystem path, how does Falcon identify a feed
> partition physical location

If the partition keys aren't in the path, then Falcon can't use them for the file system version
of the input either. Falcon uses partitions in only two scenarios:
1) When data is partitioned across multiple clusters, it can be merged into a single location
using replication (single target, multiple sources). For this to work, each source must own
a partition exclusively.
2) Data can be consumed selectively by filtering on specific partitions through the dataIn() EL
expression.
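The exclusivity requirement in scenario 1 can be illustrated with a small check. This is a
minimal sketch, assuming each source cluster is assigned a "/"-separated partition spec whose
components are either literal values or "*"; partial globs such as "u*" are not handled. The
functions specs_overlap and check_exclusive_sources are hypothetical and are not part of Falcon.

```python
from itertools import combinations

# Hypothetical check for illustration only; NOT part of Falcon.
# Assumption: spec components are literal values or "*" (any value).
def specs_overlap(spec_a, spec_b):
    parts_a, parts_b = spec_a.split("/"), spec_b.split("/")
    # Pad the shorter spec with wildcards so both cover the same partitions.
    width = max(len(parts_a), len(parts_b))
    parts_a += ["*"] * (width - len(parts_a))
    parts_b += ["*"] * (width - len(parts_b))
    # Two specs can match the same directory unless some component
    # pins them to different literal values.
    return all(a == b or "*" in (a, b) for a, b in zip(parts_a, parts_b))


def check_exclusive_sources(source_partitions):
    """Return pairs of source clusters whose partition specs could overlap."""
    return [
        (c1, c2)
        for (c1, s1), (c2, s2) in combinations(sorted(source_partitions.items()), 2)
        if specs_overlap(s1, s2)
    ]


print(check_exclusive_sources({"sourceCluster1": "ua1", "sourceCluster2": "ua2"}))
# prints []
print(check_exclusive_sources({"sourceCluster1": "ua1/*", "sourceCluster2": "*/us"}))
# prints [('sourceCluster1', 'sourceCluster2')]
```

In the second case both sources could hold data for colo=ua1, country=us, so merging them
into one target location would be ambiguous.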

Regards,
Srikanth Sundarrajan

> From: johnyu0520@gmail.com
> Date: Wed, 23 Jul 2014 17:16:34 -0700
> Subject: Partitions in Feed definition
> To: dev@falcon.incubator.apache.org
> 
> Hey all,
> 
> Few questions about Partitions:
> 
> Partitions in the FEED xml like below:
> 
>     <partitions>
>         <partition name="colo"/>
>         <partition name="country"/>
>     </partitions>
> 
> 
>    1. I see these are partition keys; are the partition keys values
> (say country=us or country=uk) need to be defined before-hand or
> unbounded?
>    2. does the storage location need to have the partition key in
> them? Like below (see the colo and country partition keys)
> 
>    <location path="/data/${colo}/${country}/${YEAR}/${MONTH}/${DAY}"
> type="data"/>
> 
>    3. if the partition keys are not in the FileSystem path, how does
> Falcon identify a feed partition physical location (actually,
> how/where is it used)? I understand if it were HCAT, the Feed
> definition has the partition key-values.
> 
>    4. Are these partition keys and values validated against the
> FileSystem or HCAT locations?
> 
> 
> 
> Partition attribute in the Cluster reference:
> 
> Using the example from the documentation page
> <http://falcon.incubator.apache.org/docs/FalconArchitecture.html#Replication>
> 
> 
>    1. What does it mean to specify partitions in a source cluster ?
>    2. vs target cluster? (does it act like a filter to pull only a
> subset of data from source? -- if so how does Falcon know to read the
> subset in Filesystem feed?)
>    3. What data is in sourceCluster1, sourceCluster2 and what location?
>    4. Which path does the replicated data end up in the backupCluster (target)?
> 
> 
> A few questions.  Hopefully it's something straightforward about
> partitions that I have missed.
> 
> 
> Thanks for your answers,
> John