pig-dev mailing list archives

From "Nezih Yigitbasi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3627) Json storage : Doesn't work in cases , where other Store Functions (like PigStorage / AvroStorage) do work.
Date Mon, 03 Feb 2014 00:50:09 GMT

    [ https://issues.apache.org/jira/browse/PIG-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889134#comment-13889134 ]

Nezih Yigitbasi commented on PIG-3627:
--------------------------------------

Cheolsoo, I checked the link Jay posted in his previous comment, and after some digging it
seems the NULL schemas come from the STRSPLIT function, not from the LOAD function. Jay
uses FLATTEN(STRSPLIT(...)) in his script, and STRSPLIT returns a tuple without any inner
schema (the schemas of its inner elements are NULL).
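The quoted stack trace shows JsonStorage.prepareToWrite rebuilding its schema from a string via Utils.getSchemaFromString, so a field whose type is NULL has to survive a serialize-then-parse round trip. The Java sketch below is a simplified, hypothetical model of that round trip, not Pig's real code: SchemaRoundTrip, serialize and parse are invented names, and writing a NULL type as an empty string is an assumption made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model (not Pig's classes) of a store function that serializes
// its schema to a string and parses it back before writing any records.
final class SchemaRoundTrip {
    // The only type names this toy parser accepts: a small subset of Pig's types.
    static final Set<String> KNOWN_TYPES =
            Set.of("int", "long", "float", "double", "chararray", "bytearray");

    // Writes ordered fields as "name:type,name:type". A null type stands in for
    // Pig's NULL and is written as an empty string (an assumption of this model).
    static String serialize(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append(':');
            if (e.getValue() != null) sb.append(e.getValue());
        }
        return sb.toString();
    }

    // Parses the string back, rejecting any field whose type is missing or unknown.
    static Map<String, String> parse(String s) {
        Map<String, String> out = new LinkedHashMap<>();
        int col = 0; // 0-based offset of the current field within s
        for (String part : s.split(",", -1)) {
            int colon = part.indexOf(':');
            String type = colon < 0 ? "" : part.substring(colon + 1);
            if (!KNOWN_TYPES.contains(type)) {
                // Point at the first character where a type name was expected.
                int at = col + (colon < 0 ? part.length() : colon + 1);
                String near = at < s.length() ? String.valueOf(s.charAt(at)) : "<EOF>";
                throw new IllegalArgumentException("Failed to parse: column " + at
                        + ", unexpected symbol at or near '" + near + "'");
            }
            out.put(part.substring(0, colon), type);
            col += part.length() + 1; // +1 for the ',' separator
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("product", null);  // NULL type, as in the reported schema
        fields.put("count", "long");
        String s = serialize(fields);
        System.out.println(s);        // prints product:,count:long
        try {
            parse(s);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

In this model the failure is independent of the data itself: any field with an unknown type breaks the parse, which is consistent with the problem originating upstream in STRSPLIT rather than in LOAD.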

> Json storage : Doesn't work in cases , where other Store Functions (like PigStorage / AvroStorage) do work.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3627
>                 URL: https://issues.apache.org/jira/browse/PIG-3627
>             Project: Pig
>          Issue Type: Bug
>            Reporter: jay vyas
>
> The following query 
> {code:title=Bar.java|borderStyle=solid}
>         pigServer.registerQuery(
>                 "uniqcnt  = foreach transactionsG {"+
>                                "sym = transactions.product ;"+
>                                "dsym = distinct sym  ;"+
>                                "generate flatten(dsym.product) as product, COUNT(dsym) as count ;" +
>                                "};");
> {code} 
> Results in the schema:
> {code} 
>    Schema : {product: NULL,count: long}
> {code}
> This schema is storable using AvroStorage or PigStorage, but storing it with JsonStorage fails:
> {code}
> Failed to parse: <line 1, column 8>  Syntax error, unexpected symbol at or near ','
> 	at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:94)
> 	at org.apache.pig.parser.QueryParserDriver.parseSchema(QueryParserDriver.java:108)
> 	at org.apache.pig.impl.util.Utils.parseSchema(Utils.java:208)
> 	at org.apache.pig.impl.util.Utils.getSchemaFromString(Utils.java:182)
> 	at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:140)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:125)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:86)
> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:553)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> {code}
> It appears that JsonStorage is therefore less robust than the other storage formats. Can
> we confirm whether certain data structures do or do not work with JsonStorage?
> So, I suggest:
> 1) Ideally, JsonStorage should support the same data that other store functions support.
> 2) As the next best thing, a wiki page of examples that do or do not work with JsonStorage,
> and/or a better error message, would be sufficient to resolve this "bug".
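Suggestion 2 could look roughly like the following: a check run before the schema is serialized that names the offending fields instead of surfacing a parser error about a stray ','. This is a hypothetical sketch, not Pig's API. JsonSchemaCheck and its methods are invented, fields are modelled as a name-to-type map in which null stands for Pig's NULL type, and the bytearray fallback is one possible design choice, not established JsonStorage behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-write check for a JSON store function (not Pig's API).
final class JsonSchemaCheck {

    // Returns the names of fields whose type is unknown (null), in declaration order.
    static List<String> untypedFields(Map<String, String> fields) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getValue() == null) bad.add(e.getKey());
        }
        return bad;
    }

    // Option 1: fail early with a message that names the problem fields.
    static void requireTypes(Map<String, String> fields) {
        List<String> bad = untypedFields(fields);
        if (!bad.isEmpty()) {
            throw new IllegalArgumentException(
                    "Cannot store fields with an unknown (NULL) type: " + bad
                    + ". Declare their types explicitly in the script, e.g. with a typed AS clause.");
        }
    }

    // Option 2: substitute bytearray, Pig's default type for untyped fields.
    static Map<String, String> withDefaults(Map<String, String> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.put(e.getKey(), e.getValue() == null ? "bytearray" : e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("product", null);  // NULL type, as in the reported schema
        fields.put("count", "long");
        System.out.println(untypedFields(fields));  // prints [product]
        System.out.println(withDefaults(fields));   // prints {product=bytearray, count=long}
        try {
            requireTypes(fields);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Failing fast keeps the stored output faithful to the declared schema. The bytearray fallback instead trades type information for the kind of tolerance the reporter observed with PigStorage and AvroStorage.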



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
