From: Ted Yu
Date: Wed, 1 Oct 2014 11:05:12 -0700
To: user@hbase.apache.org
Subject: Re: Planning to propose Hadoop initiative to company. Need some inputs please.

Adding hbase user.

On Wed, Oct 1, 2014 at 11:02 AM, Wilm Schumacher <wilm.schumacher@cawoom.com> wrote:
Hi,

First: I think HBase is what you are looking for. If I understand correctly, you want to show customers their data very quickly and let them manipulate that data, so you need something like a data warehouse system. For that, HBase is the method of choice (and I think that for your kind of data, HBase is a better choice than Cassandra or MongoDB). But of course you need a running Hadoop system to run HBase, so it's not an either/or ;)

(My answers are about HBase, as I think that's what you are looking for. If you are not interested, just ignore the following text. Sorry to everyone for writing about HBase on this list ;).)

On 01.10.2014 at 17:24, mani kandan wrote:
> 1) How much web usage data will a typical website like ours collect on a
> daily basis? (I know I can ask our IT department, but I would like to
> gather some background idea before talking to them.)
Well, if you have the option to ask your IT department, you should do that, because everyone here would only be guessing, and you would have to explain in great detail what you plan to do before we could even guess. If, for example, you want to track what each user has clicked on, perhaps to serve personalized ads, then you have to store more data. So ask the people who already have the data instead of guessing.

> 3) How many clusters/nodes would I need to run a web usage analytics
> system?
The book "HBase in Action" gives some recommendations in its case studies (part IV, "deploying HBase"), including thoughts on the number of nodes and how to use them, depending on the size of your data.

> 4) What are the ways for me to use our data? (One use case I'm thinking
> of is to analyze the error messages log for each page on quote process
> to redesign the UI. Is this possible?)
Sure, and this should be very easy. I would pump the error log into an HBase table. That way you could read the messages directly from the HBase shell (if there are few enough of them), or you could use Hive to query your log in a more SQL-like way and compute statistics very easily.
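
For illustration only, here is a minimal sketch of what "pumping the error log into an HBase table" could look like with the plain Java client. The table name "page_error_log", the "log" column family, and the row-key layout are made-up assumptions for this example, not an existing schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ErrorLogSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed table (would first be created, e.g. in the shell): 'page_error_log' with family 'log'.
    HTable table = new HTable(conf, "page_error_log");
    try {
      // Row key: page id plus reversed timestamp, so the newest errors of a
      // page come back first when scanning.
      String rowKey = "quote-step-2!" + (Long.MAX_VALUE - System.currentTimeMillis());
      Put put = new Put(Bytes.toBytes(rowKey));
      put.add(Bytes.toBytes("log"), Bytes.toBytes("message"),
              Bytes.toBytes("validation failed: missing birth date"));
      table.put(put);

      // Read the messages for one page straight back, much like a shell scan.
      Scan scan = new Scan(Bytes.toBytes("quote-step-2!"), Bytes.toBytes("quote-step-2~"));
      ResultScanner scanner = table.getScanner(scan);
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()) + " => "
            + Bytes.toString(r.getValue(Bytes.toBytes("log"), Bytes.toBytes("message"))));
      }
      scanner.close();
    } finally {
      table.close();
    }
  }
}

On top of such a table you could later define a Hive external table (via the HBase storage handler) to get the SQL-like queries mentioned above.
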
> 5) How long would it take for me to set up and start such a system?
For a novice doing this for the first time: perhaps two hours for a standalone HBase setup, perhaps a day for a fully distributed test cluster, and a little longer for the real production system with all the security features ;).

> I'm sorry if some/all of these questions are unanswerable. I just want
> to discuss my thoughts, and get an idea of what things can I achieve by
> going the way of Hadoop.
Well, I think (though I could be wrong) that you are imagining you can just change the "database backend" from SQL to HBase/Hadoop and everything will run right away. It will not be that easy. You would have to change the code of your web application in a fundamental way and rethink all the table designs, so this could be more complicated than you think right now.
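
To make the "rethink the table designs" point a little more concrete, here is a rough, purely hypothetical sketch. Where a relational schema would normalize into customers, quotes and so on, an HBase design usually denormalizes around the read pattern, for example one wide row per customer. The table and family names below are invented for the example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CustomerTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // One wide row per customer instead of joined, normalized tables.
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("customer"));
      desc.addFamily(new HColumnDescriptor("profile")); // name, address, ...
      desc.addFamily(new HColumnDescriptor("quotes"));  // one column per quote id
      admin.createTable(desc);
    } finally {
      admin.close();
    }
  }
}

The important part is the row key and family layout, which you design around how you read the data, not around normal forms.
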

However, HBase/Hadoop has some advantages that are very interesting for you. First, it is distributed, which lets your company grow almost without limit, or collect more data about your customers so you can extract more information (and sell more stuff). And MapReduce is a wonderful tool for computing really fancy "statistics", which is very interesting for an insurance company. Your mathematical economists will REALLY love it ;).
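
As a taste of the kind of "statistics" MapReduce can compute, here is a bare-bones sketch that counts error-log lines per page. The input layout (one tab-separated line per error, page id then message, sitting on HDFS) is an assumption made up for this example:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ErrorsPerPage {

  // Emits (page id, 1) for every error line of the form "pageId<TAB>message".
  public static class PageMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t", 2);
      if (fields.length == 2) {
        ctx.write(new Text(fields[0]), ONE);
      }
    }
  }

  // Sums the counts per page; also usable as a combiner.
  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text page, Iterable<LongWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable c : counts) {
        sum += c.get();
      }
      ctx.write(page, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "errors-per-page");
    job.setJarByClass(ErrorsPerPage.class);
    job.setMapperClass(PageMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
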

Hope this helped.

best wishes

Wilm


