hadoop-common-dev mailing list archives

From "Edward Yoon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2480) [Hbase Shell] Log Analysis Examples
Date Sun, 30 Dec 2007 11:44:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554993 ]

Edward Yoon commented on HADOOP-2480:
-------------------------------------

{code}
hql > jar ./build/contrib/hbase/hadoop-0.16.0-dev-hbase-examples.jar logfetcher udanax/logs access_log;
1. Creating the access_log table.

   hql > CREATE TABLE access_log ('http', 'url' MAX_LENGTH:10000);
   Please wait ... creating
   1 table was created (0.1 sec)

   hql > SHOW TABLES;
   ...

2. Fetching the access_log files using map/reduce

07/12/27 09:40:45 INFO mapred.FileInputFormat: Total input paths to process : 1
07/12/27 09:40:45 INFO mapred.JobClient: Running job: job_200712270938_0001
07/12/27 09:40:46 INFO mapred.JobClient:  map 0% reduce 0%
07/12/27 09:41:17 INFO mapred.JobClient:  map 1% reduce 0%

   hql > SELECT url FROM access_log LIMIT=10;
   ...

3. ...
{code}
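Step 2 runs the logfetcher job over the raw logs. The per-line work of such a fetcher can be sketched as below, assuming the input is in Apache's "combined" log format and that cells follow the schema in the issue description (row = client IP, the http family for request metadata, and url:<URL> holding the referrer). This is a hypothetical Python illustration of the mapping step only: the regex and the function name to_cells are assumptions, it is not the code in the attached patches, and it does not write anything to HBase.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed input: Apache "combined" format
#   host ident authuser [date] "method url protocol" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def to_cells(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Hypothetical mapper: turn one log line into (row key, {column: value}).

    Row key is the client IP; request metadata goes into the http family,
    and the requested URL becomes a url:<URL> column whose value is the referrer.
    """
    m = COMBINED.match(line.strip())
    if m is None:
        return None  # not a combined-format line; a real job might count these
    f = m.groupdict()
    cells = {
        "http:agent": f["agent"],
        "http:protocol": f["protocol"],
        "http:method": f["method"],
        "http:code": f["code"],
        "http:bytesize": f["bytesize"],
        "url:" + f["url"]: f["referrer"],
    }
    return f["ip"], cells
```

For the sample entry in the issue description, to_cells returns the row key 208.177.157.164 with http:method = GET and url:http://www.hadoop.co.kr/ = "-" (no referrer). In a real map/reduce fetcher this mapping would run inside the map task and the cells would be committed as one row update.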

> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
>                 Key: HADOOP-2480
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2480
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>    Affects Versions: 0.16.0
>         Environment: All
>            Reporter: Edward Yoon
>            Assignee: Edward Yoon
>            Priority: Trivial
>             Fix For: 0.16.0
>
>         Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run map/reduce jobs on HBase tables at large scale.
> - 5 terabytes of logs will be used. You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client authenticated; typically blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800]|Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client, typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request; 200 means it was handled successfully|
> |-|Number of bytes transferred to the client in response to this request ("-" means none)|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"|User agent identifier of the client making the request|
> *Table schema*
> * The url family is a history vector of each client's page moves.
> * Rows by url columns form a user-by-document matrix.
> ** A cell can hold a numeric document-visit frequency or an incoming value (such as the referrer) from a specified web page.
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
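The user-by-document matrix described in the schema can be sketched in memory as a sparse map from client IP (the row) to per-URL visit counts (the url:<URL> cells holding a visit frequency). This is a minimal Python sketch, assuming (ip, url) pairs have already been extracted from the log; visit_matrix is a hypothetical name, and a Counter per row stands in for the HBase table.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def visit_matrix(visits: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Hypothetical helper: build a sparse user-by-document matrix.

    matrix[ip][url] is the number of times client `ip` requested `url`;
    absent (ip, url) pairs read as 0, matching empty cells in a sparse table.
    """
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for ip, url in visits:
        matrix[ip][url] += 1
    return dict(matrix)
```

Keeping the matrix sparse mirrors how HBase stores only the cells that exist, so a client who never visited a page simply has no url:<URL> column for it.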
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis
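One simple way to approach the two applications listed above, assuming only the referrer and request URL fields are used: treat each referrer → URL pair as a directed edge in a page network, and recommend the pages most often requested from a given page. This is a first-order frequency heuristic chosen for illustration, not necessarily the model in the attached patches; transition_counts and recommend_next are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def transition_counts(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Hypothetical helper: count referrer -> requested-page edges.

    Entries whose referrer is "-" (no referrer recorded) are skipped,
    since they carry no page-to-page transition.
    """
    edges: Dict[str, Counter] = defaultdict(Counter)
    for referrer, url in pairs:
        if referrer != "-":
            edges[referrer][url] += 1
    return dict(edges)


def recommend_next(edges: Dict[str, Counter], page: str, k: int = 3) -> List[str]:
    """Return up to k pages most often requested right after `page`."""
    return [url for url, _ in edges.get(page, Counter()).most_common(k)]
```

For example, with edges /a → /b (twice), /a → /c (once), and /b → /c (once), recommend_next(edges, "/a") returns ["/b", "/c"]. The same edge counts double as a weighted page graph for network analysis.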

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

