falcon-dev mailing list archives

From "Srikanth Sundarrajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-129) Disable Late data handling for hive tables
Date Wed, 16 Oct 2013 07:22:42 GMT

    [ https://issues.apache.org/jira/browse/FALCON-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796494#comment-13796494 ]

Srikanth Sundarrajan commented on FALCON-129:

A few minor nits.

I am trying to apply the following patches together as a single one so that I can review them
holistically in one pass (FALCON-94, FALCON-93, FALCON-90 & FALCON-129, in that order). Here are some observations:

# CatalogPartition class is not included in the patch.
# Possibly incorrect checkstyle warning suppression (RetryHandler, AbstractRerunHandler &
LateRerunHandler). Visibility doesn't seem to be modified; the number of arguments seems to have increased.
    //SUSPEND CHECKSTYLE CHECK VisibilityModifierCheck
    public abstract void handleRerun(String cluster, String entityType,
                                     String entityName, String nominalTime, String runId,
                                     String wfId,
                                     long msgReceivedTime, String feedStorageType);
    //RESUME CHECKSTYLE CHECK VisibilityModifierCheck
# Processes involving table storage shouldn't be considered for late handling, as it is
not implemented for them. ProcessEntityParser should include this in its validations.
# FeedCleanupHandler uses the FileStatus array for deletion. It might be good to check for
a null return value here.
            FileStatus[] paths = fs.globStatus(stagingPath);
            delete(cluster, feed, retention, paths);
# Would it help to have test cases added to FeedEvictor for the catalog storage type? It looks like
the existing test cases cover only the FS type.
# From the FeedEntityParser code, it looks like feed entities with a late arrival section are rejected,
but the sample config used in tests, common/src/test/resources/config/feed/hive-table-feed.xml, seems to contain one.
Is there a gap? Should FeedEntityParserTest::testParseFeedWithTable pass?
    private void validateLateData(Feed feed) throws FalconException {
        if (FeedHelper.getStorageType(feed) == Storage.TYPE.TABLE
                && feed.getLateArrival() != null) {
            throw new ValidationException("Late data handling is not supported for feeds with table storage! "
                    + feed.getName());
        }
    }
# Is there any specific reason for commenting this out in oozie-workflow-0.3.xsd?
            <!--<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0" maxOccurs="1"/>-->
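
To illustrate the null check suggested for FeedCleanupHandler above, here is a minimal sketch of the guard. It does not use the Hadoop classes: `GlobGuardSketch`, `Globber` and `pathsToDelete` are hypothetical stand-ins, and the assumption that the glob call may return null (rather than an empty array) when nothing matches is modelled by the `missing` lambda, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the cleanup step; not Falcon's FeedCleanupHandler. */
final class GlobGuardSketch {

    /** Stand-in for a glob call that may return null when nothing matches. */
    interface Globber {
        String[] glob(String pattern);
    }

    /** Returns the paths that would be deleted, treating a null result as "nothing to do". */
    static List<String> pathsToDelete(Globber fs, String stagingPath) {
        String[] paths = fs.glob(stagingPath);
        List<String> result = new ArrayList<>();
        if (paths == null) {
            // Guard: skip deletion instead of passing null further down.
            return result;
        }
        for (String path : paths) {
            result.add(path);
        }
        return result;
    }

    public static void main(String[] args) {
        Globber missing = pattern -> null;
        Globber present = pattern -> new String[] {"/staging/a", "/staging/b"};
        System.out.println(pathsToDelete(missing, "/staging/none").size());
        System.out.println(pathsToDelete(present, "/staging/*").size());
    }
}
```

In the handler itself the same idea would amount to checking `paths` for null before calling `delete(cluster, feed, retention, paths)`.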

This is indeed a very complex feature; the patch is very clean and the changes are fairly intuitive.

> Disable Late data handling for hive tables
> ------------------------------------------
>                 Key: FALCON-129
>                 URL: https://issues.apache.org/jira/browse/FALCON-129
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-129.patch, FALCON-129-r1.patch
> Neither the HCat nor the Hive APIs expose internal stats about a given partition. The only way to get
> the partition size is to get the location of the partition on HDFS and then use the globStatus
> and contentSummary APIs.
