hadoop-common-user mailing list archives

From John Martyniak <j...@beforedawnsolutions.com>
Subject Re: Web Analytics Use case?
Date Tue, 03 Nov 2009 14:09:20 GMT
Benjamin,

That is pretty much the exact use case for Hadoop.

Hadoop is a system built for handling very large datasets and
delivering processed results. HBase is built for ad hoc data, so
instead of having complicated table joins etc., you have very wide
rows (many columns) holding aggregate data, and then use HBase to
return results from those rows.
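
To make the "wide rows with aggregate data" idea concrete, here is a minimal sketch in plain Python, not actual Hadoop or HBase code. The session fields, the keyword|city row key, and the column names are all assumptions made up for illustration. It only mimics the shape of a map step and a reduce step that would pre-aggregate sessions into one row per key.

```python
from collections import defaultdict

# Hypothetical session records; every field name here is an assumption.
sessions = [
    {"keyword": "shoes", "city": "Berlin", "bought": True,
     "pages": ["/home", "/shoes", "/checkout"]},
    {"keyword": "shoes", "city": "Berlin", "bought": False,
     "pages": ["/home"]},
    {"keyword": "shoes", "city": "Berlin", "bought": True,
     "pages": ["/home", "/checkout"]},
    {"keyword": "boots", "city": "Hamburg", "bought": False,
     "pages": ["/boots"]},
]


def map_session(session):
    """Map step: emit one (row_key, partial_columns) pair per session."""
    row_key = f"{session['keyword']}|{session['city']}"
    columns = {"visits": 1, "purchases": 1 if session["bought"] else 0}
    if session["bought"]:
        # One column per distinct page seen by a buying session.
        for page in set(session["pages"]):
            columns[f"buyer_page:{page}"] = 1
    yield row_key, columns


def reduce_rows(pairs):
    """Reduce step: sum the partial columns that share a row key."""
    rows = defaultdict(lambda: defaultdict(int))
    for row_key, columns in pairs:
        for column, count in columns.items():
            rows[row_key][column] += count
    return {key: dict(cols) for key, cols in rows.items()}


def build_wide_rows(records):
    """Run the map step over all records, then reduce into wide rows."""
    pairs = (pair for record in records for pair in map_session(record))
    return reduce_rows(pairs)


wide_rows = build_wide_rows(sessions)
# A front end can now answer "keyword + city -> purchases" with a single
# row lookup instead of a multi-table join.
print(wide_rows["shoes|Berlin"])
```

The point of the design is that the expensive work happens once in the batch job, and the query side only reads one pre-built row per key. The trade-off is that any question not anticipated by the row layout needs another batch run.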

We currently use Hadoop/HBase to collect and process lots of data,
then take the results of the processing to populate a Solr index
and a MySQL database, which are then used to feed the front ends. It
seems to work pretty well in that it greatly reduces the number of
rows and the size of the queries in the DB/index.

We are exploring using HBase to feed the front ends in place of the
MySQL DBs. So far the jury is out on the performance, but it does look
promising.

-John



On Nov 3, 2009, at 8:28 AM, Benjamin Dageroth wrote:

> Hi,
>
> I am currently evaluating whether Hadoop might be an alternative to
> our current system. We are providing a web analytics solution for  
> very large websites and run every analysis on all collected data -  
> we do not aggregate the data. This results in very large amounts of  
> data that are processed for each query and currently we are using an  
> in-memory database by Exasol with a very large amount of RAM, so that it
> does not take longer than a few seconds, and for more complicated
> queries not longer than a minute, to deliver the results.
>
> The solution however is quite expensive and given the growth of data  
> I'd like to explore alternatives. I have read about NoSQL Datastores  
> and about Hadoop, but I am not sure whether it is actually a choice  
> for our web analytics solution. We are collecting data via a  
> tracking pixel, which sends data to a tracking server, which writes it
> to disk once the session of a visitor is done. Our current solution
> has a large number of tables, and the queries run against the data can be
> quite complex:
>
> How many users who came in via that keyword and were from that city
> actually bought the advertised product? Of these users, what other
> pages did they look at? Etc.
>
> Would this be a good case for HBase, Hadoop, Map/Reduce and perhaps
> Mahout?
>
> Thanks for any thoughts,
> Benjamin
>
> _______________________________________
> Benjamin Dageroth, Business Development Manager
> Webtrekk GmbH
> Boxhagener Str. 76-78, 10245 Berlin
> fon 030 - 755 415 - 360
> fax 030 - 755 415 - 100
> benjamin.dageroth@webtrekk.com
> http://www.webtrekk.com<http://www.webtrekk.de/>
> Amtsgericht Berlin, HRB 93435 B
> Geschäftsführer Christian Sauer
>
>
> _______________________________________
>
>

