falcon-dev mailing list archives

From "Venkatesh Seetharam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-102) Add integration tests for feed entity parser with table defined
Date Fri, 06 Sep 2013 16:29:51 GMT

    [ https://issues.apache.org/jira/browse/FALCON-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760329#comment-13760329 ]

Venkatesh Seetharam commented on FALCON-102:
--------------------------------------------

Thanks for your comments, Srikanth. For context, this is only required in feed
retention.

bq. Didn't understand the significance of these string replacements.
Instead of custom parsing, I construct a URI, and it's not friendly with ${}. Hence the kludge.
Initially, I had it as a URI but was not sure of the pattern. Now that we have fixed it,
a URI may make sense, though I'm not sure how much it buys us. In any case, this needs to be
repeated to parse the various parts into database, table & partitions.
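
As a rough illustration of why the replacement is needed: java.net.URI rejects the characters { and }, so a template containing ${...} cannot be parsed until the expressions are rewritten. The sketch below is not the code in the patch; the class name, the token strings and the sample template are assumptions made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not Falcon code: shows the "${...}" vs. URI problem.
class TemplateUriSketch {
    // Assumed stand-in tokens; any URI-legal strings that never occur in a
    // real template would do.
    static final String OPEN_TOKEN = "_EXPR_OPEN_";
    static final String CLOSE_TOKEN = "_EXPR_CLOSE_";

    // Rewrite "${" and "}" into URI-safe tokens.
    static String normalize(String template) {
        return template.replace("${", OPEN_TOKEN).replace("}", CLOSE_TOKEN);
    }

    // Restore the original expressions after the URI has been taken apart.
    static String denormalize(String text) {
        return text.replace(OPEN_TOKEN, "${").replace(CLOSE_TOKEN, "}");
    }

    static boolean parsesAsUri(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // Sample value only; the real template format is defined by the patch.
        String template = "catalog:mydb:mytable#ds=${YEAR}-${MONTH}-${DAY}";

        System.out.println("raw template parses: " + parsesAsUri(template));

        URI uri = new URI(normalize(template));
        System.out.println("scheme: " + uri.getScheme());
        System.out.println("database and table: " + uri.getSchemeSpecificPart());
        System.out.println("partitions: " + denormalize(uri.getFragment()));
    }
}
```

The normalize and denormalize steps have to be applied as a pair, which is the repetition mentioned above.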

bq. Following is a good candidate for a regex based validation enforced in xsd
Well, this is NOT from XSD. It is always generated using getUriTemplate() and then passed
around, and is eventually used to reconstruct the Storage object.

bq. Does it make sense to assume a convention to set different location types as properties
within catalog?
Definitely makes sense as an enhancement after the first iteration. This first iteration will
ignore the location type. Also, there is NO API today in Hive or HCatalog to add KV pairs
per partition. BTW, location type may not make sense in this context.

bq. You probably meant pathType.
Paths are defined as part of a Location and this is deducing the location type for a given
Path. Maybe rename it to getLocationTypeForPath?

bq. FileSystemStorage::getUriTemplate seems to be creating a composite string of all the paths.
Seems like the composite string is build with no further information about the path itself.
This is the current behavior in Falcon. I only refactored to reflect what exists today.
It's only used in one place in the entire codebase: org.apache.falcon.converter.OozieFeedMapper.RetentionOozieWorkflowMapper#getRetentionWorkflowAction
{code}
                String feedDataPath = storage.getUriTemplate();
                props.put("feedDataPath", feedDataPath.replaceAll("\\$\\{", "\\?\\{"));
{code}
org.apache.falcon.retention.FeedEvictor#evictFileSystemInstances
{code}
        String[] feedLocs = feedBasePath.split("#");
        for (String path : feedLocs) {
            evictor(path, retentionType, retentionLimit, timeZone, frequency);
        }
{code}
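
Put together, the two snippets form a small round trip: the composite template has its ${ rewritten to ?{ and is later split on "#" into individual paths. A self-contained sketch of that flow follows; the class name, method names and sample paths are made up for illustration, and only the replaceAll and split calls are taken from the snippets above.

```java
// Hypothetical stand-alone demo of the two quoted snippets.
class FeedPathSketch {
    // Same replacement as the mapper snippet: "${" becomes "?{".
    static String toFeedDataPath(String uriTemplate) {
        return uriTemplate.replaceAll("\\$\\{", "\\?\\{");
    }

    // Same split as the evictor snippet: one entry per "#"-separated path.
    static String[] splitLocations(String feedBasePath) {
        return feedBasePath.split("#");
    }

    public static void main(String[] args) {
        // Sample composite template; these paths are assumptions.
        String template = "/data/clicks/${YEAR}-${MONTH}#/stats/clicks/${YEAR}-${MONTH}";
        String feedDataPath = toFeedDataPath(template);
        for (String path : splitLocations(feedDataPath)) {
            // The real code would call evictor(...) here; this just prints.
            System.out.println(path);
        }
    }
}
```

Nothing in the split result says which path is data and which is stats, which is the limitation raised in the next comment.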

bq. given a "#" separated string, we can't tell one path type from another. The user of this
function, may find it challenge to decompose this.
That's correct. I try to deduce the type from the path itself.
Say: /data/foo is DATA, /stats/bar is STATS, etc. It defaults to DATA if the path has no type information.
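
One way such a deduction could look, as a minimal sketch only: the enum below is a local stand-in, the segment-matching rule is an assumption, and the method name is borrowed from the rename suggested above rather than from the patch.

```java
// Hypothetical sketch of deducing a location type from a path.
class LocationTypeSketch {
    // Local stand-in for the location types; not the Falcon class.
    enum LocationType { DATA, STATS, META }

    // Returns the first type whose name matches a path segment
    // (case-insensitive), or DATA when no segment matches.
    static LocationType getLocationTypeForPath(String path) {
        for (String segment : path.split("/")) {
            for (LocationType type : LocationType.values()) {
                if (type.name().equalsIgnoreCase(segment)) {
                    return type;
                }
            }
        }
        return LocationType.DATA;
    }

    public static void main(String[] args) {
        System.out.println(getLocationTypeForPath("/data/foo"));
        System.out.println(getLocationTypeForPath("/stats/bar"));
        System.out.println(getLocationTypeForPath("/projects/foo"));
    }
}
```

A rule like this is fragile: a path such as /data/stats/foo matches two types, and the first matching segment wins.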

I think I can send the type information as well and let storage parse it out and give me a
list of locations instead of doing it the current way. More refactoring.
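
A sketch of what carrying the type along could look like, assuming a made-up TYPE=path encoding joined with "#". Neither the encoding nor the names below come from the patch, and the scheme assumes paths contain neither "#" nor "=".

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: keep the location type next to each path.
class TypedLocationsSketch {
    // Local stand-in for the location types; not the Falcon class.
    enum LocationType { DATA, STATS, META }

    // Encode as "TYPE=path" entries joined by '#'.
    static String encode(Map<LocationType, String> locations) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<LocationType, String> entry : locations.entrySet()) {
            if (sb.length() > 0) {
                sb.append('#');
            }
            sb.append(entry.getKey().name()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }

    // Decode back into typed locations; an entry without "TYPE=" is
    // treated as DATA, mirroring the current default.
    static Map<LocationType, String> decode(String encoded) {
        Map<LocationType, String> result = new EnumMap<>(LocationType.class);
        for (String part : encoded.split("#")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                result.put(LocationType.DATA, part);
            } else {
                LocationType type = LocationType.valueOf(part.substring(0, eq));
                result.put(type, part.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<LocationType, String> locations = new EnumMap<>(LocationType.class);
        locations.put(LocationType.DATA, "/data/foo");
        locations.put(LocationType.STATS, "/stats/bar");
        String encoded = encode(locations);
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

With the type carried explicitly, the consumer no longer has to guess it from the path.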

Makes sense?
                
> Add integration tests for feed entity parser with table defined
> ---------------------------------------------------------------
>
>                 Key: FALCON-102
>                 URL: https://issues.apache.org/jira/browse/FALCON-102
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-102.patch
>
>
> Having issues to get webhcat up and running. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
