hbase-dev mailing list archives

From "Edward Yoon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-38) Log Analysis Examples
Date Tue, 05 Feb 2008 08:05:10 GMT

     [ https://issues.apache.org/jira/browse/HBASE-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Yoon updated HBASE-38:

    Summary: Log Analysis Examples  (was: [Hbase Shell] Log Analysis Examples)

> Log Analysis Examples
> ---------------------
>                 Key: HBASE-38
>                 URL: https://issues.apache.org/jira/browse/HBASE-38
>             Project: Hadoop HBase
>          Issue Type: New Feature
>         Environment: All
>            Reporter: Edward Yoon
>            Priority: Trivial
>         Attachments: v01.patch, v02.patch
> I made an Apache log fetcher, a log analyzer, and a social network analyzer that run as
map/reduce jobs on an HBase table for large-scale log data.
> * 5 terabytes of logs will be used.
> * You can see it here: http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> | |IP address of the client requesting the web page|
> |-|Identity of the client; typically blank for modern browsers, which hide this information|
> |-|User name with which the client was authenticated; typically blank unless authentication
is required to access the page|
> |[15/Aug/2004:10:59:38 -0800] |Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client. Typically
in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1
in this example)|
> |200|Status code for the request. 200 means it was successfully handled|
> |-|Number of bytes transferred to the client in response to this request|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page)
from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" |User agent identifier of the client
making the request|
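The fields in the table above can be pulled out of a raw log line with a regular expression. The following is only a minimal sketch, assuming the logs use the Apache combined log format; `parse_access_log_line`, the group names, and the client address `192.0.2.1` in the sample line are illustrative assumptions and are not taken from the attached patches.

```python
import re

# Assumed layout (Apache combined log format):
# ip identity user [time] "method url protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytesize>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Sample line built from the table values; the IP address is a placeholder.
SAMPLE_LINE = (
    '192.0.2.1 - - [15/Aug/2004:10:59:38 -0800] '
    '"GET http://www.hadoop.co.kr/ HTTP/1.1" 200 - "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"'
)

entry = parse_access_log_line(SAMPLE_LINE)
```

Fields that the server logs as `-` (identity, user, byte count, referrer) are kept as the literal string `-` here, so the caller decides how to treat missing values.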
> *Table schema*
> * The url family is a historical page-move vector for each client.
> * Rows by url columns form a user-by-document matrix.
> ** A cell can be a numeric value of document visit frequency or an incoming value from specified
> * ... etc.
> {code}
> ip <row>    http                            url               
> -------------------------------------------------------------------
> ip          http:agent     <agent>          url:URL   <referrer>
>             http:protocol  <protocol>       ...
>             http:method    <method>         
>             http:code      <response code>
>             http:bytesize  <bytesize>           
> {code}
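To make the layout above concrete, here is a rough sketch of how one parsed log entry might be turned into a row key and a set of `family:qualifier` cells. It uses plain Python dictionaries rather than any HBase client API; `to_row`, the field names of the input dict, and the sample address `192.0.2.1` are assumptions made for illustration only.

```python
def to_row(entry):
    """Map a parsed log entry to (row_key, {column: value}) per the schema.

    The row key is the client IP. The http family stores request metadata,
    and the url family stores the requested URL as the column qualifier
    with the referrer as the cell value.
    """
    row_key = entry["ip"]
    columns = {
        "http:agent": entry["agent"],
        "http:protocol": entry["protocol"],
        "http:method": entry["method"],
        "http:code": entry["code"],
        "http:bytesize": entry["bytesize"],
        "url:" + entry["url"]: entry["referrer"],
    }
    return row_key, columns


# Hypothetical parsed entry; the values mirror the access_log example.
sample_entry = {
    "ip": "192.0.2.1",
    "agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
    "protocol": "HTTP/1.1",
    "method": "GET",
    "code": "200",
    "bytesize": "-",
    "url": "http://www.hadoop.co.kr/",
    "referrer": "-",
}

row_key, columns = to_row(sample_entry)
```

Because the requested URL becomes part of the column name, each client row grows one url column per distinct page visited, which is what gives the user-by-document view described above.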
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis
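
The issue names these applications without describing a method. One simple possibility for next page recommendation, shown here purely as an illustrative sketch, is to count referrer-to-URL transitions and suggest the most frequent follow-up pages. The function names and the sample page paths are hypothetical and are not taken from the attached patches.

```python
from collections import Counter, defaultdict


def build_transitions(visits):
    """Count page moves from (referrer, url) pairs.

    Pairs with a missing referrer ("-" or empty) are skipped because they
    carry no information about the previous page.
    """
    transitions = defaultdict(Counter)
    for referrer, url in visits:
        if referrer and referrer != "-":
            transitions[referrer][url] += 1
    return transitions


def recommend_next(transitions, page, k=1):
    """Return up to k pages most often visited right after `page`."""
    counts = transitions.get(page, Counter())
    return [url for url, _ in counts.most_common(k)]


# Hypothetical (referrer, url) pairs, e.g. read from the url family.
sample_visits = [
    ("/index", "/docs"),
    ("/index", "/docs"),
    ("/index", "/download"),
    ("-", "/index"),
    ("/docs", "/download"),
]

transitions = build_transitions(sample_visits)
top_after_index = recommend_next(transitions, "/index")
```

The same transition counts can also be read as a weighted directed graph over pages, which is a natural starting point for the page network analysis item.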

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
