Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 2DA90200B99 for ; Wed, 21 Sep 2016 06:58:11 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 2C270160AC9; Wed, 21 Sep 2016 04:58:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id C9498160AC5 for ; Wed, 21 Sep 2016 06:58:09 +0200 (CEST) Received: (qmail 3925 invoked by uid 500); 21 Sep 2016 04:58:08 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 3910 invoked by uid 99); 21 Sep 2016 04:58:08 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Sep 2016 04:58:08 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 110151A7312 for ; Wed, 21 Sep 2016 04:58:08 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.4 X-Spam-Level: ** X-Spam-Status: No, score=2.4 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, KAM_LINEPADDING=1.2, NORMAL_HTTP_TO_IP=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id Nxospp_utKaf for ; Wed, 21 Sep 2016 04:58:05 +0000 (UTC) Received: from mail-oi0-f44.google.com (mail-oi0-f44.google.com [209.85.218.44]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 8CBDD5F484 for ; Wed, 21 Sep 2016 04:58:05 +0000 (UTC) Received: by mail-oi0-f44.google.com with SMTP id a62so47440743oib.1 for ; Tue, 20 Sep 2016 21:58:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=ssYLUG0x74rFJjj0k+L/+h3gXJh4zzunOZf19qhS+gw=; b=aUEleEp8JUzhRxYX3NnnvW7l1I6+qIngGs81wQg+chuiDhVjTKf29AROtp3UXhkoLE vZQT79rICkpyclVB7VILYfnv+RPC9nXs59i42b4Hjo4cNw699/r6vcNd0MtJ+YMJJZp0 LyFedG+NRkVpjVnar4uJlk7wOwhSRa7I9TXZTdjvId1Q19OT5T7Q/vCZHWIwyZlmCOCB M6I5BJjjiLodqkhHr26ZMU+CvjHfJRwYqHvvm8oQIsYTQeOT+B8KCvh7W1CoOHLdvhXa VNSLji8Mj2Ybcw3p6fv8OvrssfxvPf4d0Ahhq3jPdVeoc3epJgDDYp+f0aN1/SwxKtj9 +Bww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=ssYLUG0x74rFJjj0k+L/+h3gXJh4zzunOZf19qhS+gw=; b=elboKsWiolXF0rFRY7pVpv47x1eVqVGBzBrNLCrfYx+28wP5B+aa/ynAhZvPYOO0QP jS7d4mR/zdqPcwPPU4GpOdhdZZTFHePTAsqewD2ZLwyAu8S19Cx/N7A3G9ss67+2SQ3X vf3Z5tfZRIGoEeLXj5Gaj0oLd7angvrQbyeGR8kIPjlVtLrJWiIrjFQI/i9ys0ONFdOF r9yhdprjxy7FveoD7HN6btdBgaabjpLK2sIPtfU9NsQ8x2VOS6hghBP6nE9/52CHIL5+ AGNdM/pf5mnSFFaEkwUWXE1GRRhR5gpRU7RK+IsVRrHRPpVFTFDH9o5iT8EPSo1V4Xqf qgOg== X-Gm-Message-State: AE9vXwO4F3h4RDmkgp4dIHowbVcnNxLwStb2UJ3uUhb8Fl/V075idx8SVcaRjSUpxVsQSnFqT1IAbRvEoYD+pA== X-Received: by 10.202.63.67 with SMTP id m64mr45832458oia.30.1474433878332; Tue, 20 Sep 2016 21:57:58 -0700 (PDT) MIME-Version: 1.0 Received: by 10.202.188.84 with HTTP; Tue, 20 Sep 2016 21:57:57 -0700 (PDT) In-Reply-To: References: From: Manish Rangari Date: Wed, 21 Sep 2016 10:27:57 +0530 Message-ID: Subject: Re: ELB Log processing To: user@hive.apache.org Content-Type: multipart/alternative; boundary=001a113ddb5c901afc053cfd639a archived-at: Wed, 21 Sep 2016 04:58:11 -0000 --001a113ddb5c901afc053cfd639a Content-Type: text/plain; charset=UTF-8 Thanks Dudu, both the queries worked like a charm. I personally liked second query as it is quite easy to remember. --Manish On Tue, Sep 20, 2016 at 8:41 PM, Markovitz, Dudu wrote: > Or > > > > create view elb_raw_log_detailed > > as > > select request_date, elbname, requestip, requestport, backendip, > backendport, requestprocessingtime, backendprocessingtime, > clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, > sentbytes, requestverb, url, parse_url(url, 'QUERY','aid') as aid, > parse_url(url, 'QUERY','tid') as tid, parse_url(url, 'QUERY','eid') as eid, > parse_url(url, 'QUERY','did') as did, protocol, useragent, ssl_cipher, > ssl_protocol > > from elblog; > > > > Dudu > > > > *From:* Markovitz, Dudu [mailto:dmarkovitz@paypal.com] > *Sent:* Tuesday, September 20, 2016 6:06 PM > *To:* user@hive.apache.org > *Subject:* RE: ELB Log processing > > > > create view elb_raw_log_detailed > > as > > select request_date, elbname, requestip, requestport, backendip, > backendport, requestprocessingtime, backendprocessingtime, > clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, > sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protocol, > useragent, ssl_cipher, ssl_protocol > > from elblog > > LATERAL VIEW parse_url_tuple(url,'QUERY:eid','QUERY:tid','QUERY:aid','QUERY:did') > u as eid,tid,aid,did > > ; > > > > Dudu > > > > *From:* Manish Rangari [mailto:linuxtricksfordevops@gmail.com > ] > *Sent:* Tuesday, September 20, 2016 4:09 PM > *To:* user@hive.apache.org > *Subject:* Re: ELB Log processing > > > > Guys, > > > > I am struggling to create this view. I am keep getting the error in bold. > I found that I need to use lateral view but still I am not able to get the > syntax right. > > > > hive> create view elb_raw_log_detailed as select request_date, elbname, > requestip, requestport, backendip, backendport, requestprocessingtime, > backendprocessingtime, clientresponsetime, elbresponsecode, > backendresponsecode, receivedbytes, sentbytes, requestverb, url, > parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid') > as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url, > 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from > elblogz; > > > > *FAILED: SemanticException [Error 10081]: UDTF's are not supported outside > the SELECT clause, nor nested in expressions* > > > > On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari < > linuxtricksfordevops@gmail.com> wrote: > > Yes views looks like a way to go > > > > On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol > wrote: > > The royal way to do that is a view IMHO. > > > > 2016-09-20 12:14 GMT+02:00 Manish Rangari >: > > Thanks for the reply Damien. The suggestion you gave is really useful. > Currently I am achieving my desired output by performing below steps. But I > want to achieve the desired result in one step instead of two. Do we have > any way so that I can get the aid, did etc in create table statement? If > not I will have to look for the option that you mentioned > > > > 1. > > CREATE TABLE elblog ( > > Request_date STRING, > > ELBName STRING, > > RequestIP STRING, > > RequestPort INT, > > BackendIP STRING, > > BackendPort INT, > > RequestProcessingTime DOUBLE, > > BackendProcessingTime DOUBLE, > > ClientResponseTime DOUBLE, > > ELBResponseCode STRING, > > BackendResponseCode STRING, > > ReceivedBytes BIGINT, > > SentBytes BIGINT, > > RequestVerb STRING, > > URL STRING, > > Protocol STRING, > > Useragent STRING, > > ssl_cipher STRING, > > ssl_protocol STRING > > ) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' > > WITH SERDEPROPERTIES ( > > "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ > ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) > ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" > > ) > > STORED AS TEXTFILE; > > > > 2. > > create table elb_raw_log as select request_date, elbname, requestip, > requestport, backendip, backendport, requestprocessingtime, > backendprocessingtime, clientresponsetime, elbresponsecode, > backendresponsecode, receivedbytes, sentbytes, requestverb, url, > regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid, > regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid, > regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid, > regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol, > useragent, ssl_cipher, ssl_protocol from elblog; > > > > On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol > wrote: > > see the udf *parse_url_tuple* > > SELECT b.* > > FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', > 'QUERY:id') b as host, path, query, query_id LIMIT 1; > > > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF# > LanguageManualUDF-parse_url_tuple > > > > 2016-09-20 11:22 GMT+02:00 Manish Rangari >: > > Guys, > > > > I want to get the field of elb logs. A sample elb log is given below and I > am using below create table definition. It is working fine. I am getting > what I wanted but now I want the bold part as well. For example eid, tid, > aid. Can anyone help me how can I match them as well. > > > > NOTE: The position of aid, eid, tid is not fixed and it may change. > > > > 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.000021 > 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek? > *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT > 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 > Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 > > > > > > CREATE TABLE elblog ( > > Request_date STRING, > > ELBName STRING, > > RequestIP STRING, > > RequestPort INT, > > BackendIP STRING, > > BackendPort INT, > > RequestProcessingTime DOUBLE, > > BackendProcessingTime DOUBLE, > > ClientResponseTime DOUBLE, > > ELBResponseCode STRING, > > BackendResponseCode STRING, > > ReceivedBytes BIGINT, > > SentBytes BIGINT, > > RequestVerb STRING, > > URL STRING, > > Protocol STRING, > > Useragent STRING, > > ssl_cipher STRING, > > ssl_protocol STRING > > ) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' > > WITH SERDEPROPERTIES ( > > "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ > ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) > ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$" > > ) > > STORED AS TEXTFILE; > > > > > > > > > > > --001a113ddb5c901afc053cfd639a Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Thanks Dudu, both the queries worked like a charm. I perso= nally liked second query as it is quite easy to remember.

--Manish=C2=A0

On Tue, Sep 20, 2016 at 8:41 PM, Markovitz, Dudu <dmarkovit= z@paypal.com> wrote:

Or

=C2=A0

create view elb_raw_log_detailed

as

select request_date, elbname, requestip, req= uestport, backendip, backendport, requestprocessingtime, backendprocessingt= ime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, parse_url(url, 'QUERY'= ,'aid') as aid, parse_url(url, 'QUERY','tid') as ti= d, parse_url(url, 'QUERY','eid') as eid, parse_url(url, = 9;QUERY','did') as did, protocol, useragent, ssl_cipher, ssl_pr= otocol

from elblog;

=C2=A0

Dudu

=C2=A0

=C2=A0

create view elb_raw_log_detailed

as

select request_date, elbname, requestip, req= uestport, backendip, backendport, requestprocessingtime, backendprocessingt= ime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, pro= tocol, useragent, ssl_cipher, ssl_protocol

from elblog

LATERAL VIEW parse_url_tuple(url,'QUERY:= eid','QUERY:tid','QUERY:aid','QUERY:did&#= 39;) u as eid,tid,aid,did

;

=C2=A0

Dudu

=C2=A0

From: Manish Rangari [mailto:linuxtricksfor= devops@gmail.com]
Sent: Tuesday, September 20, 2016 4:09 PM
To: user@h= ive.apache.org
Subject: Re: ELB Log processing

=C2=A0

Guys,

=C2=A0

I am struggling to create this view. I am keep getti= ng the error in bold. I found that I need to use lateral view but still I a= m not able to get the syntax right.

=C2=A0

hive> create view elb_raw_log_detailed as select = request_date, elbname, requestip, requestport, backendip, backendport, requ= estprocessingtime, backendprocessingtime, clientresponsetime, elbresponseco= de, backendresponsecode, receivedbytes, sentbytes, requestverb, url, parse_url_tuple(url, 'QUERY:aid') as = aid, parse_url_tuple(url, 'QUERY:tid') as tid, parse_url_tuple(url,= 'QUERY:eid') as eid, parse_url_tuple(url, 'QUERY:did') as = did, protocol, useragent, ssl_cipher, ssl_protocol from elblogz;<= /u>

=C2=A0

FAILED: SemanticException [Error 10081]: UDTF'= ;s are not supported outside the SELECT clause, nor nested in expressions

=C2=A0

On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <= linuxtr= icksfordevops@gmail.com> wrote:

Yes views looks like a way to go

=C2=A0

On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.carol@gmai= l.com> wrote:

The royal way to do that is a view IMHO.

=C2=A0

2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfo= rdevops@gmail.com>:

Thanks for the reply Damien. The suggestion you gave= is really useful. Currently I am achieving my desired output by performing= below steps. But I want to achieve the desired result in one step instead = of two. Do we have any way so that I can get the aid, did etc in create table statement? If not I will have t= o look for the option that you mentioned

=C2=A0

1.=C2=A0

CREATE TABLE elblog = (

Request_date STRING,=

=C2=A0 =C2=A0 =C2=A0= ELBName STRING,

=C2=A0 =C2=A0 =C2=A0= RequestIP STRING,

=C2=A0 =C2=A0 =C2=A0= RequestPort INT,

=C2=A0 =C2=A0 =C2=A0= BackendIP STRING,

=C2=A0 =C2=A0 =C2=A0= BackendPort INT,

=C2=A0 =C2=A0 =C2=A0= RequestProcessingTime DOUBLE,

=C2=A0 =C2=A0 =C2=A0= BackendProcessingTime DOUBLE,

=C2=A0 =C2=A0 =C2=A0= ClientResponseTime DOUBLE,

=C2=A0 =C2=A0 =C2=A0= ELBResponseCode STRING,

=C2=A0 =C2=A0 =C2=A0= BackendResponseCode STRING,

=C2=A0 =C2=A0 =C2=A0= ReceivedBytes BIGINT,

=C2=A0 =C2=A0 =C2=A0= SentBytes BIGINT,

=C2=A0 =C2=A0 =C2=A0= RequestVerb STRING,

=C2=A0 =C2=A0 =C2=A0= URL STRING,

=C2=A0 =C2=A0 =C2=A0= Protocol STRING,

Useragent STRING,=

ssl_cipher STRING,

ssl_protocol STRING<= u>

)

ROW FORMAT SERDE = 9;org.apache.hadoop.hive.serde2.RegexSerDe'

WITH SERDEPROPERTIES= (

=C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 "input.regex" =3D "([^ ]*) ([^ ]*) ([^ ]*):([= 0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*)= ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\&= quot; (.*) (.*)$"

)

STORED AS TEXTFILE;<= u>

=C2=A0

2.

create table elb_raw_log as select request_date, elb= name, requestip, requestport, backendip, backendport, requestprocessingtime= , backendprocessingtime, clientresponsetime, elbresponsecode, backendrespon= secode, receivedbytes, sentbytes, requestverb, url, regexp_extract(url, '.*aid=3D([a-zA-Z0-9]+).*', = 1) as aid, regexp_extract(url, '.*tid=3D([a-zA-Z0-9]+).*', 1) as ti= d, regexp_extract(url, '.*eid=3D([a-zA-Z0-9]+).*', 1) as eid, regex= p_extract(url, '.*did=3D([a-zA-Z0-9]+).*', 1) as did, protocol, useragent, ssl_cipher, ssl_protocol from elblog;=C2=A0

=C2=A0

On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.carol@gmai= l.com> wrote:

see the udf = parse_url_tuple

SELECT b.*

FROM src LATE= RAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUER= Y', 'QUERY:id') b as host, path, query, query_id LIMIT 1;



https://cwiki.apac= he.org/confluence/display/Hive/LanguageManual+UDF#LanguageMa= nualUDF-parse_url_tuple

=C2=A0

2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfo= rdevops@gmail.com>:

Guys,=C2=A0

=C2=A0

I want to get the field of elb logs. A sample elb lo= g is given below and I am using below create table definition. It is workin= g fine. I am getting what I wanted but now I want the bold part as well. Fo= r example eid, tid, aid. Can anyone help me how can I match them as well.=C2=A0

=C2=A0

NOTE: The position of aid, eid, tid is not fixed and= it may change.=C2=A0

=C2=A0

2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.1= 68.1.5:80 0.000021 0.000596 0.00002 200 200 0 43 "GET https://si= te1.example.com:443/peek?eid=3Daw123&tid=3Dfskc235n&aid=3D2ADSFGSDG HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1) App= leWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2

=C2=A0

=C2=A0

CREATE TABLE elblog (

Request_date STRING,

=C2=A0 =C2=A0 =C2=A0 ELBName STRING,

=C2=A0 =C2=A0 =C2=A0 RequestIP STRING,=

=C2=A0 =C2=A0 =C2=A0 RequestPort INT,<= /p>

=C2=A0 =C2=A0 =C2=A0 BackendIP STRING,=

=C2=A0 =C2=A0 =C2=A0 BackendPort INT,<= /p>

=C2=A0 =C2=A0 =C2=A0 RequestProcessingTime DOUBLE,

=C2=A0 =C2=A0 =C2=A0 BackendProcessingTime DOUBLE,

=C2=A0 =C2=A0 =C2=A0 ClientResponseTime DOUBLE,

=C2=A0 =C2=A0 =C2=A0 ELBResponseCode STRING,<= u>

=C2=A0 =C2=A0 =C2=A0 BackendResponseCode STRING,<= /u>

=C2=A0 =C2=A0 =C2=A0 ReceivedBytes BIGINT,=

=C2=A0 =C2=A0 =C2=A0 SentBytes BIGINT,=

=C2=A0 =C2=A0 =C2=A0 RequestVerb STRING,

=C2=A0 =C2=A0 =C2=A0 URL STRING,

=C2=A0 =C2=A0 =C2=A0 Protocol STRING,<= /p>

Useragent STRING,

ssl_cipher STRING,

ssl_protocol STRING

)

ROW FORMAT SERDE 'org.apache.hadoop.hive.se= rde2.RegexSerDe'

WITH SERDEPROPERTIES (

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 "input.regex= " =3D "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*= ) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \"([^ = ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"=

)

STORED AS TEXTFILE;

=C2=A0

=C2=A0

=C2=A0

=C2=A0

=C2=A0


--001a113ddb5c901afc053cfd639a--