hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Manish Rangari <linuxtricksfordev...@gmail.com>
Subject Re: ELB Log processing
Date Tue, 20 Sep 2016 13:09:20 GMT
Guys,

I am struggling to create this view. I am keep getting the error in bold. I
found that I need to use lateral view but still I am not able to get the
syntax right.

hive> create view elb_raw_log_detailed as select request_date, elbname,
requestip, requestport, backendip, backendport, requestprocessingtime,
backendprocessingtime, clientresponsetime, elbresponsecode,
backendresponsecode, receivedbytes, sentbytes, requestverb, url,
parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid')
as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url,
'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from
elblogz;

*FAILED: SemanticException [Error 10081]: UDTF's are not supported outside
the SELECT clause, nor nested in expressions*

On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <
linuxtricksfordevops@gmail.com> wrote:

> Yes views looks like a way to go
>
> On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.carol@gmail.com>
> wrote:
>
>> The royal way to do that is a view IMHO.
>>
>> 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.com
>> >:
>>
>>> Thanks for the reply Damien. The suggestion you gave is really useful.
>>> Currently I am achieving my desired output by performing below steps. But I
>>> want to achieve the desired result in one step instead of two. Do we have
>>> any way so that I can get the aid, did etc in create table statement? If
>>> not I will have to look for the option that you mentioned
>>>
>>> 1.
>>> CREATE TABLE elblog (
>>> Request_date STRING,
>>>       ELBName STRING,
>>>       RequestIP STRING,
>>>       RequestPort INT,
>>>       BackendIP STRING,
>>>       BackendPort INT,
>>>       RequestProcessingTime DOUBLE,
>>>       BackendProcessingTime DOUBLE,
>>>       ClientResponseTime DOUBLE,
>>>       ELBResponseCode STRING,
>>>       BackendResponseCode STRING,
>>>       ReceivedBytes BIGINT,
>>>       SentBytes BIGINT,
>>>       RequestVerb STRING,
>>>       URL STRING,
>>>       Protocol STRING,
>>> Useragent STRING,
>>> ssl_cipher STRING,
>>> ssl_protocol STRING
>>> )
>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>> WITH SERDEPROPERTIES (
>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>> )
>>> STORED AS TEXTFILE;
>>>
>>> 2.
>>> create table elb_raw_log as select request_date, elbname, requestip,
>>> requestport, backendip, backendport, requestprocessingtime,
>>> backendprocessingtime, clientresponsetime, elbresponsecode,
>>> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
>>> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
>>> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
>>> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
>>> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
>>> useragent, ssl_cipher, ssl_protocol from elblog;
>>>
>>> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.carol@gmail.com>
>>> wrote:
>>>
>>>> see the udf
>>>> *parse_url_tuple*
>>>> SELECT b.*
>>>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
>>>> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan
>>>> ual+UDF#LanguageManualUDF-parse_url_tuple
>>>>
>>>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <
>>>> linuxtricksfordevops@gmail.com>:
>>>>
>>>>> Guys,
>>>>>
>>>>> I want to get the field of elb logs. A sample elb log is given below
>>>>> and I am using below create table definition. It is working fine. I am
>>>>> getting what I wanted but now I want the bold part as well. For example
>>>>> eid, tid, aid. Can anyone help me how can I match them as well.
>>>>>
>>>>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>>>>
>>>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>>>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET
>>>>> https://site1.example.com:443/peek?
>>>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0
>>>>> (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
>>>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>>>>>
>>>>>
>>>>> CREATE TABLE elblog (
>>>>> Request_date STRING,
>>>>>       ELBName STRING,
>>>>>       RequestIP STRING,
>>>>>       RequestPort INT,
>>>>>       BackendIP STRING,
>>>>>       BackendPort INT,
>>>>>       RequestProcessingTime DOUBLE,
>>>>>       BackendProcessingTime DOUBLE,
>>>>>       ClientResponseTime DOUBLE,
>>>>>       ELBResponseCode STRING,
>>>>>       BackendResponseCode STRING,
>>>>>       ReceivedBytes BIGINT,
>>>>>       SentBytes BIGINT,
>>>>>       RequestVerb STRING,
>>>>>       URL STRING,
>>>>>       Protocol STRING,
>>>>> Useragent STRING,
>>>>> ssl_cipher STRING,
>>>>> ssl_protocol STRING
>>>>> )
>>>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>>>> WITH SERDEPROPERTIES (
>>>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>>>> )
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message