hadoop-pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1237) Piggybank MutliStorage - specify field to write in output
Date Sun, 22 Aug 2010 13:23:17 GMT

    [ https://issues.apache.org/jira/browse/PIG-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901161#action_12901161 ]

Dmitriy V. Ryaboy commented on PIG-1237:
----------------------------------------

Gerrit,
Sorry this fell through the cracks! Just noticed this ticket.

The ability to specify just one column seems very limiting. Perhaps one could instead optionally
specify whether to materialize the splitField (that is, whether to write it to the output)? I think
this would accomplish the same thing in a more general manner.

Also, perhaps this warrants a second constructor, since adding new arguments to the existing
one would break backwards compatibility.
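
To make the two suggestions concrete, here is a minimal Java sketch. It is not the Piggybank code: the class name SplitStorageSketch, the keepSplitField flag, and the render helper are all hypothetical, and the real MultiStorage constructors and their argument types may differ. It only illustrates that an added overload leaves existing callers untouched while new callers can opt out of writing the split field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a storage class; NOT the Piggybank MultiStorage API. */
public class SplitStorageSketch {
    private final int splitFieldIndex;
    private final boolean keepSplitField; // assumed flag name, for illustration only

    /** Existing-style constructor: behaviour unchanged, the split field is written. */
    public SplitStorageSketch(int splitFieldIndex) {
        this(splitFieldIndex, true);
    }

    /** Added overload: callers opt in to dropping the split field from the output. */
    public SplitStorageSketch(int splitFieldIndex, boolean keepSplitField) {
        this.splitFieldIndex = splitFieldIndex;
        this.keepSplitField = keepSplitField;
    }

    /** Value that decides which output a record is routed to. */
    public String splitKey(List<String> tuple) {
        return tuple.get(splitFieldIndex);
    }

    /** Tab-joined record, with or without the split field. */
    public String render(List<String> tuple) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tuple.size(); i++) {
            if (keepSplitField || i != splitFieldIndex) {
                out.add(tuple.get(i));
            }
        }
        return String.join("\t", out);
    }

    public static void main(String[] args) {
        List<String> tuple = Arrays.asList("2010-08-22", "typeA", "payload");
        // Old call site: still compiles and still writes all three fields.
        System.out.println(new SplitStorageSketch(1).render(tuple));
        // New call site: routes on field 1 but leaves it out of the record.
        System.out.println(new SplitStorageSketch(1, false).render(tuple));
    }
}
```

Because the one-argument constructor delegates to the new one with the old default, scripts that use the existing signature would see no change in output.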

> Piggybank MutliStorage - specify field to write in output
> ---------------------------------------------------------
>
>                 Key: PIG-1237
>                 URL: https://issues.apache.org/jira/browse/PIG-1237
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>         Attachments: PIG-1237.patch
>
>
> I've modified the Piggybank MultiStorage class so that one can optionally specify the index
> of the field in each tuple to write to the output.
> This feature makes it possible to have records with metadata such as a sequence number or
> time of upload, and then to combine files from these records into one, but without the metadata.
> e.g.
> 1: date type seq1 data
> 2: date type seq2 data
> then write the output grouped by type and ordered by sequence:
> data
> data
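
The behaviour described in the quoted example can be sketched in plain Java, independent of Pig. The class name DataOnlyOutputSketch, the column positions (date, type, seq, data), and the numeric sequence values are assumptions made for illustration; the attached patch may work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; it does not use or mirror the Pig API. */
public class DataOnlyOutputSketch {
    // Assumed column layout: date, type, seq, data
    static final int TYPE = 1;
    static final int SEQ = 2;
    static final int DATA = 3;

    /** Groups records by type, orders each group by sequence, keeps only the data field. */
    static Map<String, List<String>> groupAndProject(List<String[]> records) {
        Map<String, List<String[]>> byType = new TreeMap<>();
        for (String[] r : records) {
            byType.computeIfAbsent(r[TYPE], k -> new ArrayList<>()).add(r);
        }
        Map<String, List<String>> out = new TreeMap<>();
        for (Map.Entry<String, List<String[]>> e : byType.entrySet()) {
            List<String[]> rows = e.getValue();
            // Order within a group by the numeric sequence number.
            rows.sort(Comparator.comparingLong((String[] r) -> Long.parseLong(r[SEQ])));
            List<String> data = new ArrayList<>();
            for (String[] r : rows) {
                data.add(r[DATA]); // metadata columns are dropped here
            }
            out.put(e.getKey(), data);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"2010-08-22", "a", "2", "second"},
            new String[] {"2010-08-22", "a", "1", "first"},
            new String[] {"2010-08-22", "b", "1", "other"});
        System.out.println(groupAndProject(records));
    }
}
```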

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

