pig-user mailing list archives

From Eli Finkelshteyn <iefin...@gmail.com>
Subject Re: Loading LZOs With Some JSON
Date Tue, 13 Sep 2011 16:31:50 GMT
Sweet! Just got this working! For anyone with the same problem in the 
future: apparently JsonStringToMap() *does not* like bytearrays. If you 
simply cast your json as a chararray when you're loading, the error 
disappears!
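
For reference, here is a minimal sketch of the working version, reusing the file and column names from the workflow quoted below. It assumes the jars named in this thread are already registered, and that declaring the field as a chararray in the LOAD schema is equivalent to the cast described above. The alias "mapped" is just an illustrative name.

```pig
-- Assumes the jars named in this thread (hadoop-lzo, elephant-bird, guava,
-- piggybank, json-simple) have already been REGISTERed.

-- Declare json_data as a chararray in the LOAD schema so that the UDF
-- receives a String instead of a DataByteArray.
initial = LOAD 'some_file.lzo'
    USING com.twitter.elephantbird.pig.store.LzoPigStorage('\\t')
    AS (col1, col2, col3, json_data:chararray);

-- Parse the JSON string into a Pig map.
mapped = FOREACH initial GENERATE
    com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data)
    AS mapped_json_data;

-- Cast the looked-up value to chararray, as Dmitriy suggested, in case
-- it comes back as a bytearray.
extracted = FOREACH mapped GENERATE
    (chararray) mapped_json_data#'type' AS type;

DUMP extracted;
```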

Eli

On 9/13/11 11:51 AM, Eli Finkelshteyn wrote:
> Correction: I forgot to run the JsonStringToMap function when writing 
> my last email, when I run that, I get the same error as before 
> (*org.apache.pig.data.DataByteArray cannot be cast to java.lang.String*).
>
> My full workflow is as follows:
>
> initial = LOAD 'some_file.lzo' USING 
> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, 
> col2, col3, json_data);
> map = FOREACH initial GENERATE 
> com.twitter.elephantbird.pig.piggybank.JsonStringToMap(json_data) AS 
> mapped_json_data;
> extracted = FOREACH map GENERATE (chararray) mapped_json_data#'type' 
> AS type;
> dump extracted;
>
> Any ideas?
>
> Eli
>
> On 9/13/11 11:20 AM, Eli Finkelshteyn wrote:
>> Well, it's not throwing me errors anymore. Now it's just discarding 
>> the field. When I run it on two records where I've verified a field 
>> exists in the json, I get:
>>
>> Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 2 time(s).
>>
>> More specifically, my json is of the following form:
>>
>> {"foo":0,"bar":"hi"}
>>
>> On that, I'm running:
>>
>> initial = LOAD 'some_file.lzo' USING 
>> com.twitter.elephantbird.pig.store.LzoPigStorage('\\t') AS (col1, 
>> col2, col3, json_data);
>> extracted = FOREACH initial GENERATE (chararray) json_data#'type' AS 
>> type;
>> dump extracted;
>>
>> Which gives me the above warning along with:
>>
>> ()
>> ()
>>
>> I also tried it without the cast to chararray, but received the same 
>> results. Should I be casting json_data as some other data type when I 
>> load it initially? Seems by default it's cast to a bytearray when I 
>> describe initial. Would that be a problem?
>>
>> Thanks for all the help so far!
>>
>> Eli
>>
>>
>>
>> On 9/12/11 9:26 PM, Dmitriy Ryaboy wrote:
>>> Ah yeah that's my favorite thing about Pig maps (prior to pig 0.9,
>>> theoretically).
>>> The values are bytearrays. You are probably trying to treat them as 
>>> strings.
>>>   You have to do stuff like this:
>>>
>>> x = foreach myrelation generate
>>>    (chararray) mymap#'foo' as foo,
>>>    (chararray) mymap#'bar' as bar;
>>>
>>>
>>> On Mon, Sep 12, 2011 at 11:54 AM, Eli Finkelshteyn<eli@tumblr.com>  
>>> wrote:
>>>
>>>> Hmmm, now it gets past my mention of the function, but when I run a dump
>>>> on generated information, I get:
>>>>
>>>> 2011-09-12 14:48:12,814 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>> ERROR 2997: Unable to recreate exception from backed error:
>>>> java.lang.ClassCastException: *org.apache.pig.data.DataByteArray cannot
>>>> be cast to java.lang.String*
>>>>
>>>> Thanks for all the help so far!
>>>>
>>>> Eli
>>>>
>>>>
>>>> On 9/12/11 2:42 PM, Dmitriy Ryaboy wrote:
>>>>
>>>>> You also want json-simple-1.1.jar
>>>>>
>>>>>
>>>>> On Mon, Sep 12, 2011 at 10:46 AM, Eli Finkelshteyn <iefinkel@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hmm, I'm loading up hadoop-lzo.*.jar, elephant-bird.*.jar, guava-*.jar,
>>>>>> and piggybank.jar, and then trying to use that UDF, but getting the
>>>>>> following error:
>>>>>>
>>>>>> ERROR 2998: Unhandled internal error. org/json/simple/parser/ParseException
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>         at java.lang.Class.forName(Class.java:247)
>>>>>>         at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
>>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
>>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:508)
>>>>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:531)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:5462)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5291)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4747)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>>>>         etc...
>>>>>>
>>>>>> Any ideas? I've verified that it recognizes the function itself, and that
>>>>>> the data it's running on is valid json. Not sure what else I can check.
>>>>>>
>>>>>> Eli
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 9/9/11 7:13 PM, Dmitriy Ryaboy wrote:
>>>>>>
>>>>>>> They derive from the same classes as far as lzo handling goes, so I
>>>>>>> suspect something's up with your environment or inputs if you get
>>>>>>> LzoTokenizedLoader to work, but LzoJsonStorage does not.
>>>>>>>
>>>>>>> Note that LzoTokenizedLoader is deprecated -- just use LzoPigStorage.
>>>>>>>
>>>>>>> JsonLoader wouldn't work for you because it expects the complete input
>>>>>>> line to be json, not part of it. You want to load with LzoPigStorage, and
>>>>>>> then apply the JsonStringToMap udf to the third field.
>>>>>>>
>>>>>>> -D
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 9, 2011 at 3:49 PM, Eli Finkelshteyn <iefinkel@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm currently working on trying to load lzos that contain some JSON
>>>>>>>> elements. This is of the form:
>>>>>>>>
>>>>>>>> item1    item2    {'thing1':'1','thing2':'2'}
>>>>>>>> item3    item4    {'thing3':'1','thing27':'2'}
>>>>>>>> item5    item6    {'thing5':'1','thing19':'2'}
>>>>>>>>
>>>>>>>> I was thinking I could use LzoJsonLoader for this, but it keeps
>>>>>>>> throwing me errors like:
>>>>>>>> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
>>>>>>>> without native-hadoop
>>>>>>>>
>>>>>>>> This is despite the fact that I can load normal lzos just fine using
>>>>>>>> LzoTokenizedLoader('\\t'). So, now I'm at a bit of a standstill. What
>>>>>>>> should I do to go about loading these files? Does anyone have any ideas?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Eli
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>

