Subject: Re: Array index support non-constant expresssion
From: Navis류승우
To: user@hive.apache.org
Date: Thu, 13 Dec 2012 09:06:25 +0900

Could you try it with CP/PPD (column pruning / predicate pushdown) disabled?

set hive.optimize.cp=false;
set hive.optimize.ppd=false;

2012/12/13 java8964 java8964 :
> Hi,
>
> I played with my query further, and found it very puzzling to explain the
> following behaviors:
>
> 1) The following query works:
>
> select c_poi.provider_str, c_poi.name from (select darray(search_results,
> c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
> clickTable as c) a
>
> I get all the results from the above query without any problem.
>
> 2) The following query does NOT work:
>
> select c_poi.provider_str, c_poi.name from (select darray(search_results,
> c.rank) as c_poi from nulf_search lateral view explode(search_clicks)
> clickTable as c) a where c_poi.provider_str = 'POI'
>
> As soon as I add the where criteria on provider_str, or even add another
> level of sub query like the following:
>
> select
>     ps, name
> from
>     (select c_poi.provider_str as ps, c_poi.name as name from (select
>     darray(search_results, c.rank) as c_poi from nulf_search lateral view
>     explode(search_clicks) clickTable as c) a) b
> where ps = 'POI'
>
> any kind of criteria I try to add on provider_str makes the hive MR jobs
> fail with the same error shown below.
>
> Any idea why this happens? Is it related to the data? But provider_str is
> just a simple String type.
>
> Thanks
>
> Yong
>
> ________________________________
> From: java8964@hotmail.com
> To: user@hive.apache.org
> Subject: RE: Array index support non-constant expresssion
> Date: Wed, 12 Dec 2012 12:15:27 -0500
>
> OK.
>
> I followed the hive source code of
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the
> UDF. It is quite simple.
>
> It works fine as I expected for simple cases, but when I try to run it under
> some complex queries, the hive MR jobs fail with some strange errors. What I
> mean is that they fail in the HIVE code base; from the stack trace, I cannot
> see that this failure has anything to do with my custom code.
>
> I would appreciate it if someone could tell me what went wrong.
>
> For example, I created this UDF called darray, standing for dynamic array,
> which supports a non-constant value as the index location of the array.
>
> The following query works fine as I expected:
>
> hive> select c_poi.provider_str as provider_str, c_poi.name as name from
> (select darray(search_results, c.index_loc) as c_poi from search_table
> lateral view explode(search_clicks) clickTable as c) a limit 5;
> POI xxxx
> ADDRESS some address
> POI xxxx
> POI xxxx
> ADDRESS some address
>
> Of course, in this case, I only want the rows with provider_str = 'POI'
> returned, and want to filter out any rows with provider_str != 'POI'. So,
> simple enough, I changed the query to the following:
>
> hive> select c_poi.provider_str as provider_str, c_poi.name as name from
> (select darray(search_results, c.rank) as c_poi from search_table lateral
> view explode(search_clicks) clickTable as c) a where c_poi.provider_str =
> 'POI' limit 5;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Cannot run job locally: Input Size (= 178314025) is larger than
> hive.exec.mode.local.auto.inputbytes.max (= 134217728)
> Starting Job = job_201212031001_0100, Tracking URL =
> http://blevine-desktop:50030/jobdetails.jsp?jobid=job_201212031001_0100
> Kill Command = /home/yzhang/hadoop/bin/hadoop job
> -Dmapred.job.tracker=blevine-desktop:8021 -kill job_201212031001_0100
> 2012-12-12 11:45:24,090 Stage-1 map = 0%, reduce = 0%
> 2012-12-12 11:45:43,173 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201212031001_0100 with errors
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> I am only adding a Where limitation, but to my surprise, the MR jobs
> generated by HIVE failed. I am testing this in my local standalone cluster,
> which is running the CDH3U3 release. When I checked the hadoop userlog, here
> is what I got:
>
> 2012-12-12 11:40:22,421 INFO org.apache.hadoop.hive.ql.exec.SelectOperator:
> SELECT struct<_col0:bigint,_col1:string,_col2:string,_col3:string,
> _col4:string,_col5:string,_col6:boolean,_col7:boolean,_col8:boolean,
> _col9:boolean,_col10:boolean,_col11:boolean,_col12:string,_col13:string,
> _col14:struct,categories_id:array,categories_name:array,lang_raw:string,
> lang_rose:string,lang:string,viewport:struct>,_col15:struct>>,
> _col16:array>,_col17:array>,_col18:string,_col19:struct>
> 2012-12-12 11:40:22,440 WARN org.apache.hadoop.mapred.Child: Error running
> child
> java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>         at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>         ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>         ... 17 more
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>         at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
>         ... 22 more
> Caused by: java.lang.RuntimeException: cannot find field results from
> [0:_col0, 1:_col1, 2:_col2, 3:_col3, 4:_col4, 5:_col5, 6:_col6, 7:_col7,
> 8:_col8, 9:_col9, 10:_col10, 11:_col11, 12:_col12, 13:_col13, 14:_col14,
> 15:_col15, 16:_col16, 17:_col17, 18:_col18, 19:_col19]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:77)
>         at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
>         at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374)
>         at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.initializeOp(LateralViewJoinOperator.java:109)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374)
>         at org.apache.hadoop.hive.ql.exec.UDTFOperator.initializeOp(UDTFOperator.java:85)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:374)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
>         at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
>         ... 22 more
> 2012-12-12 11:40:22,443 INFO org.apache.hadoop.mapred.Task: Running cleanup
> for the task
>
> Of course, this whole table is built on a very complex nested (struct +
> array ....., you can see the struct in the top part of the stack trace)
> structure with my own custom InputFormat/SerDe class. But everything so far
> works fine; I can query this table any way I want without seeing errors.
> This is the first problem I have met when trying to query it the way I
> want, and I am not sure of the reason for the error. If you need more
> information, I will be happy to provide it, but right now I am stuck with
> the above error, and not sure whether it is a HIVE bug or I did something
> wrong.
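[For later readers of the archive: the "cannot find field results" failure above comes out of a name-based struct field lookup (ObjectInspectorUtils.getStandardStructFieldRef in the trace). The rows reaching the operator carry only internal _colN names, so a lookup for an original field name like "results" finds nothing, which is consistent with the suggestion in the reply to disable hive.optimize.cp/hive.optimize.ppd. A minimal plain-Java model of that kind of lookup (hypothetical code, not Hive's actual implementation) fails the same way:]

```java
import java.util.Arrays;
import java.util.List;

public class FieldLookupModel {
    // Model of a name-based struct field lookup: returns the position of
    // `field` in `schema`, or throws in the style of Hive's
    // getStandardStructFieldRef when the name is absent.
    static int fieldIndex(List<String> schema, String field) {
        int i = schema.indexOf(field.toLowerCase());
        if (i < 0) {
            throw new RuntimeException(
                "cannot find field " + field + " from " + schema);
        }
        return i;
    }

    public static void main(String[] args) {
        // After column pruning, the operator sees internal names only.
        List<String> pruned = Arrays.asList("_col0", "_col1", "_col2");

        System.out.println(fieldIndex(pruned, "_col1"));  // resolves fine

        try {
            fieldIndex(pruned, "results");  // original name is gone
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

[The model only illustrates the failure mode: once the optimizer has renamed the intermediate columns, any evaluator still holding the original field name will throw during operator initialization, before a single row is processed.]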
>
> Thanks
>
> Yong
>
> ________________________________
> From: java8964@hotmail.com
> To: user@hive.apache.org
> Subject: Array index support non-constant expresssion
> Date: Tue, 11 Dec 2012 17:24:52 -0500
>
> Hi, in our project using HIVE on the CDH3U4 release (Hive 0.7.1), I have a
> hive table like the following:
>
> Table foo (
>     search_results array<struct<...>>,
>     search_clicks array<struct<...>>
> )
>
> As you can see, the 2nd column, which represents the list of search results
> clicked, contains the index location of which result was clicked (starting
> from 0). Now I need to find out the click count of some IDs, but these IDs
> only exist in the search_results struct.
>
> When I tried to write a query for this, my first attempt was the following:
>
> select search_results[c.index_loc] from foo lateral view
> explode(search_clicks) clickTable as c;
>
> But it won't work in hive, as the following error message is returned:
>
> FAILED: Error in semantic analysis: Line 1:7 Non-constant expressions for
> array indexes not supported rank.
>
> It looks like currently Hive (at least 0.7) does NOT support non-constant
> expressions as array indexes.
>
> I searched on google and found the following HIVE jira ticket:
>
> https://issues.apache.org/jira/browse/HIVE-1955
>
> It looks like someone has made the same request, but it is not supported yet.
>
> But there is a comment in the above ticket that it can be done in a UDF. My
> question is, can anyone share some ideas about how to achieve this in a
> UDF, as it may be the only option for me right now?
>
> Thanks
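[For the archives: the UDF route discussed in this thread amounts to writing a function that receives the array and a runtime-computed index and does the element fetch itself, since the [] operator only accepts constants in Hive 0.7. Setting the Hive plumbing (GenericUDF, ObjectInspectors) aside, the core semantics such a darray-style UDF needs can be sketched in plain Java; the names and the null-on-out-of-range behavior below are illustrative assumptions, not the thread author's actual code:]

```java
import java.util.Arrays;
import java.util.List;

public class DynamicArrayIndex {
    // Core semantics of a darray(arr, idx)-style UDF: return arr[idx] for a
    // runtime-computed idx, and return null (rather than throwing) when idx
    // is out of range or either argument is null, in SQL's null-tolerant style.
    static <T> T darray(List<T> arr, Integer idx) {
        if (arr == null || idx == null) return null;
        if (idx < 0 || idx >= arr.size()) return null;
        return arr.get(idx);
    }

    public static void main(String[] args) {
        // Stand-in for one row's search_results array.
        List<String> results = Arrays.asList("POI", "ADDRESS", "POI");

        System.out.println(darray(results, 1));   // prints ADDRESS
        System.out.println(darray(results, 7));   // prints null
    }
}
```

[In the real UDF, the same logic would live in evaluate(), with the element pulled out via the list ObjectInspector; returning null for bad indexes keeps a single malformed click record from failing the whole MR job.]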