crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-344) Full file glob syntax does not work correctly with Crunch
Date Sat, 15 Feb 2014 17:10:21 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated CRUNCH-344:
--------------------------------

    Attachment: CRUNCH-344.patch

Attached a patch that resolves the issue. Path information is now URL-encoded when it is
serialized into the Configuration. I chose URL encoding over base64 so that the Configuration
stays readable and is easier to debug if issues come up later.
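
As a rough illustration of the idea (not the code in CRUNCH-344.patch), the sketch below URL-encodes each path before joining the paths with a comma, and decodes after splitting. The class name UrlEncodedPathsSketch, its method names, and the single-comma layout are assumptions made for this example; the real CrunchInputs format carries more fields.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Crunch: shows why URL encoding keeps
// separator characters out of the serialized value.
final class UrlEncodedPathsSketch {

    // Encode one path so that ',', ';', '|', '{' and '}' become %XX escapes.
    static String encode(String path) {
        try {
            return URLEncoder.encode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Join encoded paths with ','; safe because no encoded path contains ','.
    static String serialize(List<String> paths) {
        List<String> encoded = new ArrayList<String>();
        for (String p : paths) {
            encoded.add(encode(p));
        }
        return String.join(",", encoded);
    }

    // Split on ',' first, then decode each field back to the original path.
    static List<String> deserialize(String serialized) {
        List<String> paths = new ArrayList<String>();
        for (String field : serialized.split(",")) {
            paths.add(decode(field));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/input/file{1,2,3}.txt", "/data/a;b|c.txt");
        String serialized = serialize(paths);
        System.out.println(serialized);
        System.out.println(deserialize(serialized));
    }
}
```

The encoded value is still plain ASCII that can be decoded by eye, which is the debugging advantage over base64 mentioned above.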

> Full file glob syntax does not work correctly with Crunch
> ---------------------------------------------------------
>
>                 Key: CRUNCH-344
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-344
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-344.patch
>
>
> Using an input path with some variants of Hadoop-supported glob syntax does not work.
> This is specifically an issue when commas are used in a glob path, for example, a path like
> "/input/file{1,2,3}.txt". The same underlying cause also makes it impossible to use (admittedly
> much less common) paths that contain semicolons or pipe symbols.
> The underlying cause is the encoding used in CrunchInputs, which builds a string structure
> using commas, pipes, and semicolons as field separators.
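
The failure described in the issue can be reproduced in miniature. The sketch below is a simplified stand-in for the CrunchInputs encoding, assuming a layout where paths are simply joined with commas; the class name SeparatorCollisionSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a separator-based encoding, not Crunch code.
final class SeparatorCollisionSketch {

    // Naive serialization: paths joined with ',' and no escaping.
    static String join(List<String> paths) {
        return String.join(",", paths);
    }

    // Naive parsing: every ',' is treated as a field separator.
    static List<String> split(String serialized) {
        return Arrays.asList(serialized.split(","));
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("/input/file{1,2,3}.txt", "/other");
        List<String> parsed = split(join(paths));
        // Two paths go in, but the commas inside the glob produce extra fields.
        System.out.println(paths.size() + " paths in, " + parsed.size() + " fields out");
        System.out.println(parsed);
    }
}
```

The glob is cut apart at its internal commas, so the parser can no longer recover the original path. The same happens with semicolons or pipes when those are used as separators.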



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
