pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-1914) Support load/store JSON data in Pig
Date Fri, 15 Jul 2011 14:19:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065961#comment-13065961 ]

Dmitriy V. Ryaboy commented on PIG-1914:
----------------------------------------

Very cool.

Some quick code review notes:

Tiny typo here:
"e = foreach d generate flatten(men#'value') as val;" -- that should read menu#'value'


Instead of: {code}
boolean notDone = in.nextKeyValue();
if (!notDone) {
    return null;
}
{code}

Better: {code}
if (!in.nextKeyValue()) {
    return null;
}
{code}

Parse exceptions: it's better to increment a counter and move on than to break on a bad input
string. Throwing an exception kills the whole job. So maybe something like 
{code}
t = null;
while (t == null && in.nextKeyValue()) {
 ...
}
return t;
{code}
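
To make the skip-and-count idea concrete, here is a minimal standalone sketch. It does not use the Pig or Hadoop APIs: a plain Iterator stands in for the record reader, a static long stands in for a job counter, and parse() is a hypothetical parser that returns null on bad input. All of these names are assumptions for illustration, not code from the patch.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SkipBadRecords {
    // Hypothetical stand-in for a job counter of unparseable records.
    static long badRecords = 0;

    // Hypothetical parser: returns the parsed value, or null if the line is malformed.
    static Integer parse(String line) {
        try {
            return Integer.valueOf(line.trim());
        } catch (NumberFormatException e) {
            badRecords++; // count the failure instead of rethrowing
            return null;
        }
    }

    // Same loop shape as in the review note: keep reading until a record
    // parses successfully or the input runs out.
    static Integer getNext(Iterator<String> in) {
        Integer t = null;
        while (t == null && in.hasNext()) {
            t = parse(in.next());
        }
        return t; // null only when the input is exhausted
    }

    public static void main(String[] args) {
        Iterator<String> in = Arrays.asList("1", "oops", "2").iterator();
        for (Integer t = getNext(in); t != null; t = getNext(in)) {
            System.out.println(t);
        }
        System.out.println("bad records: " + badRecords);
    }
}
```

With this shape, a null return still means end of input, so one bad line costs a counter increment rather than the whole job.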

In flatten_array, if the value is an array, you allocate a new bag, populate it recursively,
and add the contents of the new bag to the old bag. Why not skip the object allocation and
copy, and simply pass the original bag into the recursive call?
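
Roughly what I have in mind, as a standalone sketch. A java.util.List stands in for the bag and for JSON arrays here, and flatten() is a hypothetical name; the real flatten_array in the patch will look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenInPlace {
    // Appends every non-list element of 'value' to 'out', recursing into
    // nested lists. The caller's accumulator is passed down, so no
    // intermediate list is allocated and nothing is copied afterwards.
    static void flatten(Object value, List<Object> out) {
        if (value instanceof List<?>) {
            for (Object element : (List<?>) value) {
                flatten(element, out);
            }
        } else {
            out.add(value);
        }
    }

    public static void main(String[] args) {
        List<Object> inner = Arrays.<Object>asList(2, Arrays.<Object>asList(3, 4));
        List<Object> nested = Arrays.<Object>asList(1, inner, 5);
        List<Object> bag = new ArrayList<>();
        flatten(nested, bag);
        System.out.println(bag); // [1, 2, 3, 4, 5]
    }
}
```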

Also: are null values for keys just plain unsupported? You skip them.

setLocation: not that it really matters, but for consistency, you should use PigTextInputFormat
instead of PigFileInputFormat here.

schema: probably makes sense to implement getSchema?

> Support load/store JSON data in Pig
> -----------------------------------
>
>                 Key: PIG-1914
>                 URL: https://issues.apache.org/jira/browse/PIG-1914
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Chao Tian
>         Attachments: PIG-1914.patch
>
>
> JSON is a commonly used data storage format. It is popular for storing structured
> data, especially for JavaScript data exchange.
> Pig should have the ability to load/store JSON format data. I plan to write one for the
> piggy bank.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
