hadoop-common-user mailing list archives

From liad livnat <liadliv...@gmail.com>
Subject use the request column in apache access.log as the source of the Hadoop table
Date Tue, 23 Nov 2010 08:49:29 GMT
Hi All

I'm facing a problem and need your help.

*I would like to use the request column of the Apache access.log as the source
of a Hadoop table.*

I was able to insert the entire log into a table, but I would like to insert
each *specific request into its specific table*. *The question is*: is this
possible without an additional script? If so, how?

The following example should demonstrate what we are looking for:

1. Suppose we have the following log file:

a. XXX.16.3.221 - - [22/Nov/2010:23:57:09 -0800] "GET /includes/Entity1.ent?ClientID=1189272&DayOfWeek=2&Sent=OK&WeekStart=31%2000:00:00 HTTP/1.1" 200 1150 "-" "-"

2. And the following corresponding table:

CREATE TABLE Entity1 (
    Id INT,
    DayOfWeek INT,
    Sent STRING,
    WeekStart INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

3. The query "select * from Entity1" should then return:
   1189272,2,OK,31
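
A minimal Python sketch of the mapping in steps 1 to 3, assuming the Apache combined log format and Python 3's standard library. The names `parse_request`, `to_entity1_row` and `run` are hypothetical. Keeping only the `31` of the decoded `WeekStart` value (`31 00:00:00`) is also an assumption, made so the value fits the INT column.

```python
import re
import sys
from urllib.parse import parse_qs, urlsplit

# Assumed Apache "combined" format: host ident user [time] "request" status bytes ...
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def parse_request(line):
    """Split the request column into (path, query parameters); None if unparsable."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    parts = match.group("request").split()
    if len(parts) != 3:  # expected: METHOD TARGET PROTOCOL
        return None
    target = urlsplit(parts[1])
    # parse_qs percent-decodes values, e.g. "31%2000:00:00" -> "31 00:00:00"
    return target.path, parse_qs(target.query)


def to_entity1_row(line):
    """Map one log line to an (Id, DayOfWeek, Sent, WeekStart) tuple, or None."""
    parsed = parse_request(line)
    if parsed is None:
        return None
    path, params = parsed
    if not path.endswith("/Entity1.ent"):  # only Entity1 requests go to this table
        return None

    def first(name):
        return params.get(name, [""])[0]

    # Assumption: keep only the leading integer of "31 00:00:00" for the INT column.
    week_start = first("WeekStart").split(" ")[0]
    return (first("ClientID"), first("DayOfWeek"), first("Sent"), week_start)


def run(lines, out=sys.stdout):
    """Read raw log lines, write tab-separated Entity1 rows."""
    for raw in lines:
        row = to_entity1_row(raw.rstrip("\n"))
        if row is not None:
            out.write("\t".join(row) + "\n")
```

A script ending with `run(sys.stdin)` could be plugged into Hive's `SELECT TRANSFORM(...) USING '...'` clause or used as a Hadoop Streaming mapper. By default, Hive's TRANSFORM exchanges tab-separated rows with the script, which is why `run` writes tabs rather than commas. Other request types could be routed to their own tables with one such function per entity, or with a dispatch on the request path.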

1. Have you done something like this before?

2. Suppose the request string was encoded with base64: is there a way to
decode it, or do we need to use a Python script for that?

3. One last question: can you give us an example of how you use Python, i.e.
what do you use it for?
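
For question 2, here is a minimal sketch of the decoding step in Python that uses only the standard `base64` and `urllib.parse` modules. It makes a hypothetical assumption about the encoding: the whole query string after `?` is base64, in either the standard or the URL-safe alphabet, and its `=` padding may be percent-encoded or stripped. `decode_query` is an illustrative name, not an existing API.

```python
import base64
from urllib.parse import parse_qs, unquote


def decode_query(encoded):
    """Decode a base64-encoded query string into a {name: value} dict, or None.

    Hypothetical encoding: the whole query string is base64 (standard or
    URL-safe alphabet), with '=' padding possibly percent-encoded or stripped.
    """
    text = unquote(encoded)          # e.g. "%3D" -> "="
    text += "=" * (-len(text) % 4)   # restore stripped padding
    try:
        # urlsafe_b64decode maps '-'/'_' to '+'/'/', so both alphabets decode
        raw = base64.urlsafe_b64decode(text)
        return {k: v[0] for k, v in parse_qs(raw.decode("utf-8")).items()}
    except ValueError:  # binascii.Error and UnicodeDecodeError subclass ValueError
        return None
```

The decoded dictionary could then feed the same kind of per-table row mapping as a plain, unencoded query string.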

Thanks in advance,

Liad.