Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 11402100BD for ; Sat, 28 Dec 2013 09:00:21 +0000 (UTC) Received: (qmail 10334 invoked by uid 500); 28 Dec 2013 09:00:14 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 10276 invoked by uid 500); 28 Dec 2013 09:00:12 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 10263 invoked by uid 99); 28 Dec 2013 09:00:11 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 28 Dec 2013 09:00:11 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of azotov@griddynamics.com designates 209.85.128.171 as permitted sender) Received: from [209.85.128.171] (HELO mail-ve0-f171.google.com) (209.85.128.171) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 28 Dec 2013 09:00:05 +0000 Received: by mail-ve0-f171.google.com with SMTP id pa12so5299328veb.16 for ; Sat, 28 Dec 2013 00:59:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=griddynamics.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=5xdsBjndSkLY6bS9yU7riGm77yFMydsc9ui7PTRVZF4=; b=vytDC/SO92PerKkV1+nuuVsgGQMF9qOCT+ZfOxMZ0i32TM6G3dV8aN1l1IkUonF4NL EnwHR70s5yY9MvmiVIeewiPNl3CGzMt57fTMD8L1ZEWhkBwAk2TvQjBpDLNc14IRtvOK 9lu5XDesNPk9awOWcRk3+bTyfw0wh8b1ZdvhU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=5xdsBjndSkLY6bS9yU7riGm77yFMydsc9ui7PTRVZF4=; b=dyogGc8IrvPLUAvz2LqwwJuGNFdHEase1jCkX9RL6yRn6XtcklXEoy8wU/ZleU4Rcl da9JYsnJ7Zk9JhfLFUEr37XlRbw7fFjpBHJCkE81ilfj8Qqj8QFzT4YF0wZHuMsveRzJ VVqu+ncKfN4mJKAeskrsz5I+Xjrpoh7Om2Tv562pTfJud3WhtGRiwy0MfgnwFRg8x12M FEy5MBFfAHcN2ftikCtqi0sa4W0YKrHv/gX7exfgSdCiKs0Gwbmkly5OJxhOg6x9T3cW nr6lHl27NAJsRSFNQ7EBTnkQBg3S4bo0we3hZ3Clo8zrRnYr8gUm26gHiRPb8kHtu4eG ZjFQ== X-Gm-Message-State: ALoCoQkB1hGXFl2cTfWQjCV8yUlIu7DRVbCByI7wVXGPxpLiYX7JiM6y7pfwy2SsyLW7UWmm2+KI MIME-Version: 1.0 X-Received: by 10.58.155.162 with SMTP id vx2mr8742567veb.46.1388221183746; Sat, 28 Dec 2013 00:59:43 -0800 (PST) Received: by 10.58.55.226 with HTTP; Sat, 28 Dec 2013 00:59:43 -0800 (PST) In-Reply-To: References: Date: Sat, 28 Dec 2013 12:59:43 +0400 Message-ID: Subject: Re: concat_ws() UDF From: Alexey Zotov To: user@hive.apache.org Content-Type: multipart/alternative; boundary=047d7b66f2f78702bd04ee946e56 X-Virus-Checked: Checked by ClamAV on apache.org --047d7b66f2f78702bd04ee946e56 Content-Type: text/plain; charset=ISO-8859-1 Ouhcc.. It was a very stupid question from my side :( Thanks a lot Ritesh! On Sat, Dec 28, 2013 at 12:32 PM, Ritesh Agrawal wrote: > The first string in concat_ws is used as a separator. Since the third > value in the last concat_ws is null, it skipped that and simply > concactenated second and fourth value by **. Hence you got abc****8675309 > > Ritesh > > > On Fri, Dec 27, 2013 at 12:13 AM, Alexey Zotov wrote: > >> Sorry, but in my previous email there was some misleading due to text >> formatting. >> >> So, there are tests for concat_ws() UDF: >> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out. >> One of tests (78 line) contains the following query: >> >>> SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3), >>> concat_ws(',', dest1.c1, dest1.c2, dest1.c3), >>> concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3), >>> concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1 >> >> and expects the following results: >> >>> xyzabc8675309 abc,xyz,8675309 NULL abc****8675309 >> >> >> I'm confused by the result of the last concat_ws. Why it returns abc**** >> 8675309 instead of abc8675309? It looks like NULL VALUE has not been >> skipped. >> >> Thanks. >> >> >> On Thu, Dec 26, 2013 at 3:36 PM, Alexey Zotov wrote: >> >>> Hello Guys, >>> >>> historically concat_ws() UDF was added in the scope of >>> https://issues.apache.org/jira/browse/HIVE-682 ticket. I have a simple >>> question about its implementation. According to above ticket's description >>> it should have behavior like MySQL implementation ( >>> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws). >>> MySQL documentation says: "CONCAT_WS() does >>> not skip empty strings. However, it does skip any NULL values after the >>> separator argument.". I have performed a simple test: >>> >>>> create table test (col1 string); >>>> select concat_ws(',', '10', NULL, '4', '', 'a') from test limit 1; >>> >>> >>> as a result I have got: >>> >>>> 10,4,,a >>> >>> which looks good for me. >>> >>> But in tests ( >>> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out : >>> 78 line) I see the following lines: >>> >>> >>>> >>>> >>>> POSTHOOK: query: SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3), >>>> concat_ws(',', dest1.c1, dest1.c2, dest1.c3), >>>> concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3), >>>> concat_ws('**', dest1.c1, *NULL*, dest1.c3) FROM dest1 >>>> >>>> >>>> >>>> POSTHOOK: type: QUERY >>>> POSTHOOK: Input: default@dest1 >>>> #### A masked pattern was here #### >>>> POSTHOOK: Lineage: dest1.c1 SIMPLE [] >>>> POSTHOOK: Lineage: dest1.c2 SIMPLE [] >>>> POSTHOOK: Lineage: dest1.c3 SIMPLE [] >>>> >>>> >>>> >>>> xyzabc8675309 abc,xyz,8675309 NULL abc****8675309 >>> >>> which looks like NULLs are not skipped. >>> >>> What have I missed? >>> >>> Thanks, Alexey. >>> >> >> >> >> -- >> >> Best regards >> >> Zotov Alexey >> Grid Dynamics >> Skype: azotcsit >> > > -- Best regards Zotov Alexey Grid Dynamics Skype: azotcsit --047d7b66f2f78702bd04ee946e56 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
Ouhcc.. It was a very stupid question from my side :( Than= ks a lot Ritesh!


On Sat, Dec 28, 2013 at 12:32 PM, Ritesh Agrawal <ragrawa= l@netflix.com> wrote:
The first string in concat_= ws is used as a separator. Since the third value in the last concat_ws is n= ull, it skipped that and simply concactenated second and fourth value by **= . Hence you got=A0=A0abc**8675309<= /span>

Ritesh


On Fri, Dec 27, 2013 at 12:1= 3 AM, Alexey Zotov <azotov@griddynamics.com> wrote:
Sorry, but in my previous e= mail there was some misleading due to text formatting.

S= o, there are tests for concat_ws() UDF:=A0ht= tps://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/= udf_concat_ws.q.out. One of tests (78 line) contains the following quer= y:
SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),=
=A0=A0=A0=A0=A0=A0=A0concat_ws(',', dest1.c1, des= t1.c2, dest1.c3),
=A0=A0=A0=A0=A0=A0=A0concat_ws(NULL, dest1.c1, dest1.c2, = dest1.c3),
=A0=A0=A0=A0=A0=A0=A0concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1
and expects the following results:
xyzabc8675309=A0=A0=A0=A0=A0=A0=A0=A0abc,xyz,8675= 309=A0=A0=A0=A0=A0=A0=A0=A0NULL=A0=A0=A0=A0=A0=A0=A0=A0abc= **8675309

I'm confused by the result of the last concat_ws. W= hy it returns=A0abc**8675309=A0instead of=A0abc8675309? It looks l= ike NULL VALUE has not been skipped.=A0

Thanks.


On Thu, Dec 26, 2013 at 3:36 PM, Alexey= Zotov <azotov@griddynamics.com> wrote:
Hello Guys,
=
historically concat_ws() UDF was added in the scope of=A0h= ttps://issues.apache.org/jira/browse/HIVE-682=A0ticket. I have a simple= question about its implementation. According to above ticket's descrip= tion it should have behavior like MySQL implementation (http://dev.mysql.com/doc/refman/5.0/en/string-functions.htm= l#function_concat-ws). MySQL documentation says: "CONCAT_WS()=A0does not skip empty stri= ngs. However, it does skip any=A0NULL=A0values after the separator argume= nt.". I have performed a simple test:
create table test (col1 string);
select concat_ws= (',', '10', NULL, '4', '', 'a') fro= m test limit 1;

as a result I have got:
10,4,,a
which looks good for me.

But in tests (https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientp= ositive/udf_concat_ws.q.out=A0: 78 line) I see the following lines:
POSTHOOK: query: SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
=A0=A0= =A0=A0=A0=A0=A0concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
=A0= =A0=A0=A0=A0=A0=A0concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
=A0=A0= =A0=A0=A0=A0=A0concat_ws('**', dest1.c1, NULL, dest1.c3) FRO= M dest1
POSTHOOK: type: QUERY
POSTHOOK: Input: default@dest1
#### A masked pa= ttern was here ####
POSTHOOK: Lineage: dest1.c1 SIMPLE []
POSTHOOK: L= ineage: dest1.c2 SIMPLE []
POSTHOOK: Lineage: dest1.c3 SIMPLE []
xyzabc8675309=A0=A0=A0=A0=A0=A0=A0=A0abc,xyz,8675309=A0=A0=A0=A0=A0=A0=A0= =A0NULL=A0=A0=A0=A0=A0=A0=A0=A0abc**8675309
=
which looks like NULLs are not skipped.

What = have I missed?

Thanks, Alexey.



<= font color=3D"#888888">--

Best regards

Zotov Alexey

Grid Dynamics
Skype: azotcsit




--

Best regards

Zotov Alexey

Grid Dynamics<= /font>
Skype: azotcsit
--047d7b66f2f78702bd04ee946e56--