falcon-dev mailing list archives

From "Srikanth Sundarrajan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-129) Disable Late data handling for hive tables
Date Wed, 16 Oct 2013 07:22:42 GMT

    [ https://issues.apache.org/jira/browse/FALCON-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796494#comment-13796494 ]

Srikanth Sundarrajan commented on FALCON-129:
---------------------------------------------

A few minor nits.

I am trying to apply the following patches together as a single one so that I can review them
holistically in one pass (FALCON-94, FALCON-93, FALCON-90 & FALCON-129, in that order). Here are some observations:

# CatalogPartition class is not included in the patch.
# Possibly incorrect checkstyle warning suppression (RetryHandler, AbstractRerunHandler &
LateRerunHandler). Visibility doesn't seem to have been modified; the number of arguments seems to
have increased instead, so the suppressed check should probably be the parameter-count one.
{code}
    //SUSPEND CHECKSTYLE CHECK VisibilityModifierCheck
    public abstract void handleRerun(String cluster, String entityType,
                                     String entityName, String nominalTime, String runId, String wfId,
                                     long msgReceivedTime, String feedStorageType);
    //RESUME CHECKSTYLE CHECK VisibilityModifierCheck
{code} 
# Processes whose inputs use table storage shouldn't be considered for late handling, since that
is not implemented. ProcessEntityParser should enforce this in its validations.
{code}
        validateLateInputs(process);
{code}
# FeedCleanupHandler uses the FileStatus array returned by globStatus for deletion. It might be
good to check for a null return value here.
{code}
            FileStatus[] paths = fs.globStatus(stagingPath);
            delete(cluster, feed, retention, paths);
{code}
# Would it help to add test cases to FeedEvictor for the catalog storage type? It looks like
the existing test cases only cover the FS type.
# From the FeedEntityParser code it looks like feed entities with a late-arrival section are rejected,
but the sample config used in tests, common/src/test/resources/config/feed/hive-table-feed.xml, seems to contain one.
Is there a gap? Should FeedEntityParserTest::testParseFeedWithTable pass?
{code}
    private void validateLateData(Feed feed) throws FalconException {
        if (FeedHelper.getStorageType(feed) == Storage.TYPE.TABLE
                && feed.getLateArrival() != null) {
            throw new ValidationException("Late data handling is not supported for feeds with table storage! "
                    + feed.getName());
        }
    }
{code}
# Is there any specific reason for commenting this out in oozie-workflow-0.3.xsd?
{code}
            <!--<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0" maxOccurs="1"/>-->
{code}
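
To illustrate the null-check point about FeedCleanupHandler: as far as I recall, Hadoop's FileSystem.globStatus can return null (rather than an empty array) when the path has no glob pattern and does not exist. Below is a minimal, self-contained sketch of the kind of guard meant. It does not use the Hadoop API: `globStatus` and `delete` here are hypothetical stand-ins working on plain strings, and the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GlobNullGuard {

    // Hypothetical stand-in for fs.globStatus(stagingPath): like the real call,
    // it may hand back null instead of an empty array when nothing is found.
    static String[] globStatus(String prefix, List<String> existingPaths) {
        List<String> matches = new ArrayList<>();
        for (String path : existingPaths) {
            if (path.startsWith(prefix)) {
                matches.add(path);
            }
        }
        return matches.isEmpty() ? null : matches.toArray(new String[0]);
    }

    // Stand-in for delete(cluster, feed, retention, paths): returns how many
    // paths were handled and treats null or empty input as "nothing to delete".
    static int delete(String[] paths) {
        if (paths == null || paths.length == 0) {
            return 0;
        }
        return paths.length;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("/staging/feed/a", "/staging/feed/b");
        System.out.println(delete(globStatus("/staging/feed", existing)));
        System.out.println(delete(globStatus("/staging/missing", existing)));
    }
}
```

Whether the guard belongs in the caller or inside delete itself is a judgment call; putting it in delete protects every caller at once.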

This is indeed a very complex feature; the patch is very clean and the changes are fairly intuitive.
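
As a postscript to the late-handling point for processes: a rough sketch of the sort of check ProcessEntityParser could apply, mirroring the feed-side validateLateData quoted above. Everything here is a simplified assumption for illustration: `StorageType`, `Input` and the class name are hypothetical stand-ins for the Falcon entity classes, and IllegalArgumentException stands in for ValidationException.

```java
import java.util.Arrays;
import java.util.List;

class LateInputValidationSketch {

    // Hypothetical stand-in for the feed storage type.
    enum StorageType { FILESYSTEM, TABLE }

    // Hypothetical, simplified view of a process input: the feed it reads,
    // that feed's storage type, and whether the process declares late
    // handling for this input.
    static final class Input {
        final String feedName;
        final StorageType storageType;
        final boolean lateHandled;

        Input(String feedName, StorageType storageType, boolean lateHandled) {
            this.feedName = feedName;
            this.storageType = storageType;
            this.lateHandled = lateHandled;
        }
    }

    // Rejects any input that asks for late handling on a table-backed feed.
    static void validateLateInputs(List<Input> inputs) {
        for (Input input : inputs) {
            if (input.lateHandled && input.storageType == StorageType.TABLE) {
                throw new IllegalArgumentException(
                        "Late data handling is not supported for feeds with table storage! "
                                + input.feedName);
            }
        }
    }

    public static void main(String[] args) {
        // Passes: late handling on a filesystem feed, none on the table feed.
        validateLateInputs(Arrays.asList(
                new Input("clicks-fs", StorageType.FILESYSTEM, true),
                new Input("clicks-table", StorageType.TABLE, false)));
        System.out.println("valid");
    }
}
```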


> Disable Late data handling for hive tables
> ------------------------------------------
>
>                 Key: FALCON-129
>                 URL: https://issues.apache.org/jira/browse/FALCON-129
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-129.patch, FALCON-129-r1.patch
>
>
> Neither HCat nor Hive APIs expose internal stats about a given partition. The only way to get
> the partition size is to get the location of the partition on HDFS and then use the globStatus
> and contentSummary APIs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
