falcon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Satish Mittal" <satish.mit...@apache.org>
Subject Re: Review Request 18626: FALCON-284: Hcatalog based feed retention doesn't work when partition filter spans across multiple partition keys
Date Mon, 10 Mar 2014 11:07:00 GMT


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > common/src/main/java/org/apache/falcon/catalog/AbstractCatalogService.java, line
111
> > <https://reviews.apache.org/r/18626/diff/2/?file=507006#file507006line111>
> >
> >     Why do you need a separate method for this? Why not add this to getPartition
method itself. Seems wasted IMO.

The getPartitionColumns method fetches the names of all partition columns defined in the given
table name. This info can't be extracted from getPartition method, which returns details about
a specific partition within the table. Will rename the former method to getTablePartitionCols()
to be more precise.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > common/src/main/java/org/apache/falcon/catalog/HiveCatalogService.java, line 239
> > <https://reviews.apache.org/r/18626/diff/2/?file=507007#file507007line239>
> >
> >     Why do you need a separate method for this? Why not add this to getPartition
method itself. Seems wasted IMO.
> >     
> >     Also, it naturally belongs to CatalogStorage, no?

See earlier comments.

CatalogStorage class today contains the table details defined in a falcon feed xml, eg. URI,
db name, table name, partition string etc. It is not getting populated with any info fetched
from AbstractCatalogService. Have retained that abstraction.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 391
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line391>
> >
> >     nit: method can be map or fill rather than get

Will rename.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 502
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line502>
> >
> >     Can this method be on CatalogStorage?

See earlier comments.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 539
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line539>
> >
> >     nit: dropPartitionsForAnInstance

Will rename.


> On March 5, 2014, 7:46 p.m., Seetharam Venkatesh wrote:
> > retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java, line 553
> > <https://reviews.apache.org/r/18626/diff/2/?file=507009#file507009line553>
> >
> >     Why 2 booleans? dropped and deleted? rename dropped to deleted.

Will do.


- Satish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18626/#review36258
-----------------------------------------------------------


On Feb. 28, 2014, 2:56 p.m., Satish Mittal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18626/
> -----------------------------------------------------------
> 
> (Updated Feb. 28, 2014, 2:56 p.m.)
> 
> 
> Review request for Falcon and Srikanth Sundarrajan.
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> When an HCatalog based feed is scheduled in falcon, retention only looks at the first
partition key that satisfies either of date pattern: yyyy | MM | dd | HH | mm. As a result,
it calculates a partition filter that contains only one of these patterns. However if HCatalog
table is defined in such a way that date spans across multiple partition keys (year/month/day/hour/minute),
then feed retention doesn't delete any partitions that are granular than first level (year).
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/catalog/AbstractCatalogService.java fc9c3b1

>   common/src/main/java/org/apache/falcon/catalog/HiveCatalogService.java 3c3660e 
>   common/src/main/java/org/apache/falcon/entity/common/FeedDataPath.java 4031e14 
>   retention/src/main/java/org/apache/falcon/retention/FeedEvictor.java 13c447c 
>   webapp/src/test/java/org/apache/falcon/lifecycle/TableStorageFeedEvictorIT.java 770780e

> 
> Diff: https://reviews.apache.org/r/18626/diff/
> 
> 
> Testing
> -------
> 
> - Added new integration tests in TableStorageFeedEvictorIT.java to test retention for
an Hcatalog feed where date consists of multiple partitions columns (year/month/day).
> - Verified the retention behavior on a test cluster having an Hcatalog based feed partitioned
by year/month/day/hour/minute/country.
> 
> 
> Thanks,
> 
> Satish Mittal
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message