hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Manish Rangari <linuxtricksfordev...@gmail.com>
Subject Re: ELB Log processing
Date Tue, 20 Sep 2016 10:26:25 GMT
Yes views looks like a way to go

On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <damien.carol@gmail.com>
wrote:

> The royal way to do that is a view IMHO.
>
> 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.com>
> :
>
>> Thanks for the reply Damien. The suggestion you gave is really useful.
>> Currently I am achieving my desired output by performing below steps. But I
>> want to achieve the desired result in one step instead of two. Do we have
>> any way so that I can get the aid, did etc in create table statement? If
>> not I will have to look for the option that you mentioned
>>
>> 1.
>> CREATE TABLE elblog (
>> Request_date STRING,
>>       ELBName STRING,
>>       RequestIP STRING,
>>       RequestPort INT,
>>       BackendIP STRING,
>>       BackendPort INT,
>>       RequestProcessingTime DOUBLE,
>>       BackendProcessingTime DOUBLE,
>>       ClientResponseTime DOUBLE,
>>       ELBResponseCode STRING,
>>       BackendResponseCode STRING,
>>       ReceivedBytes BIGINT,
>>       SentBytes BIGINT,
>>       RequestVerb STRING,
>>       URL STRING,
>>       Protocol STRING,
>> Useragent STRING,
>> ssl_cipher STRING,
>> ssl_protocol STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES (
>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>> )
>> STORED AS TEXTFILE;
>>
>> 2.
>> create table elb_raw_log as select request_date, elbname, requestip,
>> requestport, backendip, backendport, requestprocessingtime,
>> backendprocessingtime, clientresponsetime, elbresponsecode,
>> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
>> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
>> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
>> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
>> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
>> useragent, ssl_cipher, ssl_protocol from elblog;
>>
>> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.carol@gmail.com>
>> wrote:
>>
>>> see the udf
>>> *parse_url_tuple*
>>> SELECT b.*
>>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
>>> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan
>>> ual+UDF#LanguageManualUDF-parse_url_tuple
>>>
>>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.co
>>> m>:
>>>
>>>> Guys,
>>>>
>>>> I want to get the field of elb logs. A sample elb log is given below
>>>> and I am using below create table definition. It is working fine. I am
>>>> getting what I wanted but now I want the bold part as well. For example
>>>> eid, tid, aid. Can anyone help me how can I match them as well.
>>>>
>>>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>>>
>>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET
>>>> https://site1.example.com:443/peek?
>>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows
>>>> NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
>>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>>>>
>>>>
>>>> CREATE TABLE elblog (
>>>> Request_date STRING,
>>>>       ELBName STRING,
>>>>       RequestIP STRING,
>>>>       RequestPort INT,
>>>>       BackendIP STRING,
>>>>       BackendPort INT,
>>>>       RequestProcessingTime DOUBLE,
>>>>       BackendProcessingTime DOUBLE,
>>>>       ClientResponseTime DOUBLE,
>>>>       ELBResponseCode STRING,
>>>>       BackendResponseCode STRING,
>>>>       ReceivedBytes BIGINT,
>>>>       SentBytes BIGINT,
>>>>       RequestVerb STRING,
>>>>       URL STRING,
>>>>       Protocol STRING,
>>>> Useragent STRING,
>>>> ssl_cipher STRING,
>>>> ssl_protocol STRING
>>>> )
>>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>>> WITH SERDEPROPERTIES (
>>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>>> )
>>>> STORED AS TEXTFILE;
>>>>
>>>
>>>
>>
>

Mime
View raw message