crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-344) Full file glob syntax does not work correctly with Crunch
Date Sat, 15 Feb 2014 17:10:21 GMT


Gabriel Reid updated CRUNCH-344:

    Attachment: CRUNCH-344.patch

Patch to resolve the issue. URL encoding is used to serialize path information in the Configuration.
I went for URL encoding instead of base64 to make it easier to debug the Configuration if
issues do pop up at some point later.
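
As a rough illustration of the approach (not the code in the attached patch), the
sketch below URL-encodes a glob path so that it contains no commas, semicolons, or
pipes, then decodes it back. The class and method names are made up for this
example; only java.net.URLEncoder and java.net.URLDecoder are real APIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of Crunch: shows how URL encoding removes
// separator characters from a path while keeping the value readable.
public class PathEncodingSketch {

    public static String encodePath(String path) {
        try {
            // ',' becomes %2C, ';' becomes %3B, '|' becomes %7C
            return URLEncoder.encode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM
            throw new IllegalStateException(e);
        }
    }

    public static String decodePath(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String glob = "/input/file{1,2,3}.txt";
        String encoded = encodePath(glob);
        System.out.println(encoded);             // %2Finput%2Ffile%7B1%2C2%2C3%7D.txt
        System.out.println(decodePath(encoded)); // /input/file{1,2,3}.txt
    }
}
```

The encoded value is still recognizable when reading a dumped Configuration, which
is the debugging advantage over base64 mentioned above.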

> Full file glob syntax does not work correctly with Crunch
> ---------------------------------------------------------
>                 Key: CRUNCH-344
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-344.patch
> Using an input path with some variants of Hadoop-supported glob syntax does not work.
> This is specifically an issue when commas are used in a glob path, for example, a path like
> "/input/file{1,2,3}.txt". The same underlying cause also makes it impossible to use (admittedly
> much less common) paths that contain semicolons or pipe symbols.
> The underlying cause is the encoding used in CrunchInputs, which builds a string structure
> using commas, pipes, and semicolons as field separators.
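
To make the failure mode concrete, here is a minimal sketch of what happens when
paths are joined and split on a comma. It is a simplified stand-in and does not
reproduce the actual string layout used by CrunchInputs; the class and method
names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified serializer: joins paths with ',' the way a
// separator-based format might, then splits them again.
public class SeparatorCollisionSketch {

    public static String joinPaths(List<String> paths) {
        return String.join(",", paths);
    }

    public static List<String> splitPaths(String serialized) {
        return Arrays.asList(serialized.split(","));
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("/input/file{1,2,3}.txt", "/other/data");
        List<String> roundTripped = splitPaths(joinPaths(original));
        // Two paths went in, but the commas inside the glob are treated as
        // separators, so four fragments come out.
        System.out.println(roundTripped.size()); // 4
        System.out.println(roundTripped.get(0)); // /input/file{1
    }
}
```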

This message was sent by Atlassian JIRA
