hive-user mailing list archives

From Stephen Sprague <sprag...@gmail.com>
Subject Re: need help with an error - script used to work and now it does not :-(
Date Fri, 17 May 2013 18:36:40 GMT
ok. so it sounds like you are doing A/B testing then.   so if it works in
your sandbox but doesn't in prod then you can slowly transform your
sandbox - one component at a time - to look like your prod system until it
breaks.  The last component you add is then the area of interest.

CTAS is short for "CREATE TABLE <blah> AS SELECT ..."
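For example, a minimal CTAS that snapshots one stage of the pipeline as its own table so each piece can be queried and verified on its own (the table name and date literal here are made up for illustration; the source table and columns are from the script later in this thread):

```sql
-- Hypothetical sketch: materialize just the left side of the join.
CREATE TABLE tmp_left_side AS
SELECT header_id, header_date
FROM   product_impressions_hive_only
WHERE  header_date = '2013-05-16';
```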


On Fri, May 17, 2013 at 11:25 AM, Sanjay Subramanian <
Sanjay.Subramanian@wizecommerce.com> wrote:

>  Hi
> I actually did all of the following
> - tested all UDFs…they return values correctly
> - tested left side of LEFT OUTER JOIN
> - tested right side of LEFT OUTER JOIN
>
>  But when I add that ON statement
>       sh.date_seller = h.header_date
>
>  I start getting this error…and this script has had no change for 3
> weeks….used to run fine in production and we did 15 days of aggregations
> using this script.
> Two days back we installed LZO compression on the production
> servers….Circumstantial…but the script is failing after that LZO jar
> install…Maybe totally unrelated
>
>  As we speak I am testing this script on my sandbox which I am fairly
> sure will work since I don't have LZO compression on my sandbox but I want
> to verify
>
>  What are CTAS semantics ? I don't know so please tell me… But even if I
> create intermediate tables, I will eventually need to join them…
>
>  Thanks
> sanjay
>
>   From: Stephen Sprague <spragues@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Friday, May 17, 2013 11:18 AM
>
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: need help with an error - script used to work and now it
> does not :-(
>
>   in the meantime why don't you breakup your single query into a series
> of queries (using CTAS semantics to create intermediate tables  ).
>
> The idea is to narrow the problem down to a minimal size that _isolates
> the problem_.  what you have there is overly complex to expect someone to
> troubleshoot for you.  try to minimize the failure case. take out your
> UDFs. Does it work then or fail?   strip it down to the bare necessities!
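A stripped-down version of the failing statement along these lines would keep only the join and the suspect ON clause, with every UDF and computed column removed (table and column names taken from the script later in this thread; the date literal is illustrative):

```sql
-- Minimal repro sketch: if this fails the same way, the UDFs are
-- exonerated and the join/ON clause is the area of interest.
SELECT h.header_id, h.header_date, sh.date_seller
FROM   product_impressions_hive_only h
LEFT OUTER JOIN prodimpr_seller_hidden sh
  ON  h.header_id    = sh.header_id
  AND sh.date_seller = h.header_date
WHERE  h.header_date = '2013-05-16'
LIMIT 10;
```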
>
>
> On Fri, May 17, 2013 at 10:56 AM, Sanjay Subramanian <
> Sanjay.Subramanian@wizecommerce.com> wrote:
>
>>  I am using Hive 0.9.0+155  that is bundled in Cloudera Manager version
>> 4.1.2
>> Still getting the errors  listed below :-(
>> Any clues will be cool !!!
>> Thanks
>>
>>  sanjay
>>
>>
>>   From: Sanjay Subramanian <sanjay.subramanian@wizecommerce.com>
>> Date: Thursday, May 16, 2013 9:42 PM
>>
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: need help with an error - script used to work and now it
>> does not :-(
>>
>>   :-( Still facing problems in large datasets
>> Were u able to solve this Edward ?
>> Thanks
>> sanjay
>>
>>   From: Sanjay Subramanian <sanjay.subramanian@wizecommerce.com>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>> Date: Thursday, May 16, 2013 8:25 PM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: need help with an error - script used to work and now it
>> does not :-(
>>
>>   Thanks Edward…I just checked all instances of guava jars…except the
>> two under /usr/share/cmf, all seem to be the same version
>>
>>  /usr/lib/hadoop/client/guava-11.0.2.jar
>> /usr/lib/hadoop/client-0.20/guava-11.0.2.jar
>> /usr/lib/hadoop/lib/guava-11.0.2.jar
>> /usr/lib/hadoop-httpfs/webapps/webhdfs/WEB-INF/lib/guava-11.0.2.jar
>> /usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar
>> /usr/lib/oozie/libtools/guava-11.0.2.jar
>> /usr/lib/hive/lib/guava-11.0.2.jar
>> /usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar
>> /usr/lib/hbase/lib/guava-11.0.2.jar
>> /usr/lib/flume-ng/lib/guava-11.0.2.jar
>> /usr/share/cmf/lib/cdh3/guava-r09-jarjar.jar
>> /usr/share/cmf/lib/guava-12.0.1.jar
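A quick way to audit this on each node is to reduce the jar list to its distinct version strings, so any odd one out is obvious. A sketch (the search roots are illustrative; point them at your actual install directories):

```shell
# List every guava jar, strip each path down to its version token, and
# print the distinct versions found. More than one line of output means
# mismatched copies are on the box.
find /usr/lib /usr/share/cmf -name 'guava*.jar' 2>/dev/null \
  | sed -E 's#.*/guava-?(r?[0-9][0-9.]*).*\.jar#\1#' \
  | sort -u
```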
>>
>>  But I made a small change in my query (I just removed the text marked
>> in blue) that seemed to solve it at least for the test data set that I
>> had….Now I need to run it in production for a day's worth of data
>>
>>  Will keep u guys posted
>>
>>
>> ------------------------------------------------------------------------------------------------------------
>>  SELECT
>>     h.header_date_donotquery as date_,
>>     h.header_id as impression_id,
>>     h.header_searchsessionid as search_session_id,
>>     h.cached_visitid as visit_id,
>>     split(h.server_name_donotquery,'[\.]')[0] as server,
>>     h.cached_ip ip,
>>     h.header_adnodeid ad_nodes,
>> ------------------------------------------------------------------------------------------------------------
>>
>>  Thanks
>>
>>  sanjay
>>
>>
>>   From: Edward Capriolo <edlinuxguru@gmail.com>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>> Date: Thursday, May 16, 2013 7:51 PM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: need help with an error - script used to work and now it
>> does not :-(
>>
>>   Ironically I just got a misleading error like this today. What
>> happened was I upgraded to hive 0.10. One of my programs was linked to
>> guava 15 but hive provides guava 09 on the classpath, confusing things. I
>> also had a similar issue with mismatched slf4j and commons-logging.
>>
>>
>> On Thu, May 16, 2013 at 10:34 PM, Sanjay Subramanian <
>> Sanjay.Subramanian@wizecommerce.com> wrote:
>>
>>>   2013-05-16 18:57:21,094 FATAL [IPC Server handler 19 on 40222] org.apache.hadoop.mapred.TaskAttemptListenerImpl:
Task: attempt_1368666339740_0135_m_000104_1 - exited : java.lang.RuntimeException: Error in
configuring object
>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
>>> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
>>> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:395)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
>>> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
>>> 	... 9 more
>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
>>> 	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
>>> 	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>>> 	... 14 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>>> 	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
>>> 	... 17 more
>>> Caused by: java.lang.RuntimeException: Map operator initialization failed
>>> 	at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
>>> 	... 22 more
Caused by: java.lang.RuntimeException: cannot find field header_date
from [org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@2add5681,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@295a4523,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6571120a,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@6257828d,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@5f3c296b,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@66c360a5,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@24fe2558,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@2945c761,
org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector$MyField@2424c672]
>>> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:345)
>>> 	at org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldRef(UnionStructObjectInspector.java:100)
>>> 	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
>>> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>>> 	at org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:78)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>>> 	at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:166)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>> 	at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
>>> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>>> 	at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
>>> 	... 22 more
>>>
>>>  MY SCRIPT is given below
>>> =====================
>>>  hive -hiveconf hive.root.logger=INFO,console -hiveconf
>>> mapred.job.priority=VERY_HIGH -e "
>>> SET hive.exec.compress.output=true;
>>> SET mapred.reduce.tasks=16;
>>> SET
>>> mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>> add jar ${JAR_NAME_AND_PATH};
>>> create temporary function collect  as
>>> 'com.wizecommerce.utils.hive.udf.GenericUDAFCollect';
>>> create temporary function isnextagip  as
>>> 'com.wizecommerce.utils.hive.udf.IsNextagIP';
>>> create temporary function isfrombot  as
>>> 'com.wizecommerce.utils.hive.udf.IsFromBot';
>>> create temporary function processblankkeyword  as
>>> 'com.wizecommerce.utils.hive.udf.ProcessBlankKeyword';
>>> create temporary function getSellersProdImpr as
>>> 'com.wizecommerce.utils.hive.udf.GetSellersWithValidSellerIdsProdImpr';
>>> create temporary function getProgramCode as
>>> 'com.wizecommerce.utils.hive.udf.GetProgramCodeFromSellerClickContext';
>>> INSERT OVERWRITE DIRECTORY
>>> '/user/beeswax/warehouse/${HIVE_OUTPUT_TBL}/${DATE_STR}'
>>> SELECT
>>>     h.header_date_donotquery as date_,
>>>     h.header_id as impression_id,
>>>     h.header_searchsessionid as search_session_id,
>>>     h.cached_visitid as visit_id ,
>>>     split(h.server_name_donotquery,'[\.]')[0] as server,
>>>     h.cached_ip ip,
>>>     h.header_adnodeid ad_nodes,
>>>     if(concat_ws(',' , getSellersProdImpr(collect_set(concat_ws('|',
>>>                                        if(h.seller_sellerid is null,
>>> 'null',cast(h.seller_sellerid as STRING)),
>>>                                        if(h.seller_tagid is
>>> null,'null',cast(h.seller_tagid as STRING)),
>>>                                        cast(IF(h.seller_subtotal IS
>>> NULL, -1, h.seller_subtotal)  as STRING),
>>>                                        cast(IF(h.seller_pricetier IS
>>> NULL, -1, h.seller_pricetier) as STRING),
>>>                                        cast(IF(h.seller_pricerank
>>> IS  NULL, -1, h.seller_pricerank) as STRING),
>>>                                        cast(IF(h.seller_cpc IS NULL, -1,
>>> h.seller_cpc) as STRING),
>>>                                        h.program_code_notnull)))) = '',
>>> NULL, concat_ws(',' , getSellersProdImpr(collect_set(concat_ws('|',
>>>                                        if(h.seller_sellerid is null,
>>> 'null',cast(h.seller_sellerid as STRING)),
>>>                                        if(h.seller_tagid is
>>> null,'null',cast(h.seller_tagid as STRING)),
>>>                                        cast(IF(h.seller_subtotal IS
>>> NULL, -1, h.seller_subtotal)  as STRING),
>>>                                        cast(IF(h.seller_pricetier IS
>>> NULL, -1, h.seller_pricetier) as STRING),
>>>                                        cast(IF(h.seller_pricerank
>>> IS  NULL, -1, h.seller_pricerank) as STRING),
>>>                                        cast(IF(h.seller_cpc IS NULL, -1,
>>> h.seller_cpc) as STRING),
>>>                                        h.program_code_notnull))))) as
>>> visible_sellers,
>>>
>>>      if(concat_ws(',' , getSellersProdImpr(collect_set(concat_ws('|',
>>>                                        if(sh.seller_id is
>>> null,'null',cast(sh.seller_id as STRING)),
>>>                                        if(sh.tag_id is null, 'null',
>>> cast(sh.tag_id as STRING)),
>>>                                        '-1.0',
>>>                                        cast(IF(sh.price_tier IS NULL,
>>> -1, sh.price_tier) as STRING),
>>>                                        '-1',
>>>                                        cast(IF(sh.price_tier IS NULL,
>>> -1.0, sh.price_tier*1.0) as STRING),
>>>                                        h.program_code_null)))) = '',
>>> NULL, concat_ws(',' , getSellersProdImpr(collect_set(concat_ws('|',
>>>                                        if(sh.seller_id is
>>> null,'null',cast(sh.seller_id as STRING)),
>>>                                        if(sh.tag_id is null, 'null',
>>> cast(sh.tag_id as STRING)),
>>>                                        '-1.0',
>>>                                        cast(IF(sh.price_tier IS NULL,
>>> -1, sh.price_tier) as STRING),
>>>                                        '-1',
>>>                                        cast(IF(sh.price_tier IS NULL,
>>> -1.0, sh.price_tier*1.0) as STRING),
>>>                                        h.program_code_null))))) as
>>> invisible_sellers
>>> FROM
>>>      (SELECT
>>>           header_id,
>>>           header_date,
>>>           header_date_donotquery,
>>>           header_searchsessionid,
>>>           cached_visitid,
>>>           cached_ip,
>>>           header_adnodeid,
>>>           server_name_donotquery,
>>>           seller_sellerid,
>>>           seller_tagid,
>>>           cast (regexp_replace(seller_subtotal,',','.') as DOUBLE) as
>>> seller_subtotal,
>>>           seller_pricetier,
>>>           seller_pricerank,
>>>           CAST(CAST(seller_cpc as INT) as DOUBLE) as seller_cpc,
>>>           cast(getProgramCode('${THISHOST}',
>>> '${REST_API_SERVER_NAME}',seller_clickcontext) as STRING) as
>>> program_code_notnull,
>>>           cast(getProgramCode('${THISHOST}', '${REST_API_SERVER_NAME}',
>>> '') as STRING) as program_code_null
>>>       FROM
>>>           product_impressions_hive_only
>>>       WHERE
>>>          header_date='${DATE_STR}'
>>>       AND
>>>          cached_recordid IS NOT NULL
>>>       AND
>>>          isnextagip(cached_ip) = FALSE
>>>       AND
>>>          isfrombot(cached_visitid) = FALSE
>>>       AND
>>>          header_skipsellerloggingflag = 0
>>>      ) h
>>>
>>>  LEFT OUTER JOIN
>>>      (SELECT
>>>            *
>>>       FROM
>>>            prodimpr_seller_hidden
>>>       WHERE
>>>            date_seller = '${DATE_STR}'
>>>      ) sh
>>> ON
>>>      h.header_id = sh.header_id
>>> AND
>>>      sh.date_seller=h.header_date
>>> GROUP BY
>>>      h.header_date_donotquery,
>>>      h.header_id,
>>>      h.header_searchsessionid,
>>>      h.cached_visitid,
>>>      h.server_name_donotquery,
>>>      h.cached_ip,
>>>      h.header_adnodeid
>>> ;
>>> "
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> ======================
>>> This email message and any attachments are for the exclusive use of the
>>> intended recipient(s) and may contain confidential and privileged
>>> information. Any unauthorized review, use, disclosure or distribution is
>>> prohibited. If you are not the intended recipient, please contact the
>>> sender by reply email and destroy all copies of the original message along
>>> with any attachments, from your computer system. If you are the intended
>>> recipient, please be advised that the content of this message is subject to
>>> access, review and disclosure by the sender's Email System Administrator.
>>>
>>
>>
>>
>
>
>
