crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-513) HFileSource not calculating size correctly for nested pathes
Date Sat, 25 Apr 2015 18:15:38 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512638#comment-14512638 ]

Josh Wills commented on CRUNCH-513:
-----------------------------------

Hey [~anelson425], I don't quite get what's going on here. What do the child paths look like
such that they are picked up by neither a) the glob nor b) the isDir() check that processes
the paths found by the glob? Is there another check, like the isDir() one, that we could add
to pick them up?

> HFileSource not calculating size correctly for nested pathes
> ------------------------------------------------------------
>
>                 Key: CRUNCH-513
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-513
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Andy Nelson
>         Attachments: Crunch-513.patch
>
>
> The cause of this is that getInternalSize [1] does not traverse the child paths to determine
> the size.
> I have the fix in a patch that I will attach, but I have not been able to extend
> the integration tests to reproduce this failure. The issue appears to be a problem only
> when using DistributedFileSystem, but the tests for HFileSource use RawLocalFileSystem.
> I see there are additional tests that use the Hadoop mini cluster, but I was not able to
> implement one correctly.
> [1] https://github.com/apache/crunch/blob/apache-crunch-0.8.3/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HFileSource.java#L116
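
To illustrate the general idea behind the report (a size computation that looks only at the immediate children of a path undercounts when data sits in nested directories), here is a minimal sketch. It is an analogy only: it uses java.nio on the local filesystem rather than Hadoop's FileSystem API, it is not the code from HFileSource or from the attached patch, and it does not reproduce the DistributedFileSystem-specific behavior described above. The class and method names (NestedSizeSketch, shallowSize, recursiveSize) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not taken from Crunch or from the attached patch.
class NestedSizeSketch {

    // Sums only the regular files directly under dir.
    // Subdirectories are skipped, so nested data contributes nothing.
    static long shallowSize(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isRegularFile(child)) {
                    total += Files.size(child);
                }
            }
        }
        return total;
    }

    // Descends into every subdirectory and sums all regular files found.
    static long recursiveSize(Path path) throws IOException {
        if (Files.isRegularFile(path)) {
            return Files.size(path);
        }
        if (!Files.isDirectory(path)) {
            return 0; // neither a regular file nor a directory: ignore
        }
        long total = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
            for (Path child : children) {
                total += recursiveSize(child);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a small nested layout: root/top.bin and root/region/family/nested.bin
        Path root = Files.createTempDirectory("hfile-sketch");
        Path family = Files.createDirectories(root.resolve("region").resolve("family"));
        Files.write(root.resolve("top.bin"), new byte[5]);
        Files.write(family.resolve("nested.bin"), new byte[10]);

        System.out.println("shallow:   " + shallowSize(root));   // counts top.bin only
        System.out.println("recursive: " + recursiveSize(root)); // counts both files
    }
}
```

With the layout built in main, the shallow sum sees only the 5-byte file at the top level, while the recursive sum also reaches the 10-byte file two levels down. In Hadoop code the analogous traversal would go through the FileSystem API; that part is deliberately left out here.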



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
