From: Ted Yu
Date: Wed, 1 Oct 2014 11:05:12 -0700
To: user@hbase.apache.org
Subject: Re: Planning to propose Hadoop initiative to company. Need some inputs please.

Adding hbase user.

On Wed, Oct 1, 2014 at 11:02 AM, Wilm Schumacher <wilm.schumacher@cawoom.com> wrote:
Hi,

First: I think HBase is what you are looking for. If I understand correctly, you want to show customers their data very quickly and let them manipulate that data, so you need something like a data warehouse system. For that, HBase is the method of choice (and I think that for your kind of data, HBase is a better choice than Cassandra or MongoDB). But of course you need a running Hadoop system to run HBase, so it's not an either/or ;)

(My answers are about HBase, as I think that's what you are looking for. If you are not interested, just ignore the following text. Sorry to everyone for writing about HBase on this list ;).)

On 01.10.2014 at 17:24, mani kandan wrote:
> 1) How much web usage data will a typical website like ours collect on a
> daily basis? (I know I can ask our IT department, but I would like to
> gather some background idea before talking to them.)
Well, if you have the option to ask your IT department, you should do that, because everyone here would only be guessing, and you would have to explain in great detail what you plan to do before we could even guess. If, for example, you want to track what each user has clicked on, perhaps to serve personalized ads, then you have to store more data. So ask the people who already have the data instead of guessing.

> 3) How many clusters/nodes would I need to run a web usage analytics
> system?
The book "HBase in Action" gives some recommendations in its case studies (part IV, "deploying HBase"), including thoughts on the number of nodes and how to use them, depending on the size of your data.

> 4) What are the ways for me to use our data? (One use case I'm thinking
> of is to analyze the error messages log for each page on quote process
> to redesign the UI. Is this possible?)
Sure, and this should be very easy. I would pump the error log into an HBase table. That way you could read the messages directly from the HBase shell (if there are few enough of them), or you could use Hive to query your log in a more SQL-like way and compute statistics very easily.
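
For illustration only, here is a minimal sketch of what "pumping the error log into an HBase table" could look like with the plain Java client. The table name "page_error_log", the "log" column family, and the row-key layout are made-up assumptions for this example, not an existing schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ErrorLogSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed table (would first be created, e.g. in the shell): 'page_error_log' with family 'log'.
    HTable table = new HTable(conf, "page_error_log");
    try {
      // Row key: page id plus reversed timestamp, so the newest errors of a
      // page come back first when scanning.
      String rowKey = "quote-step-2!" + (Long.MAX_VALUE - System.currentTimeMillis());
      Put put = new Put(Bytes.toBytes(rowKey));
      put.add(Bytes.toBytes("log"), Bytes.toBytes("message"),
              Bytes.toBytes("validation failed: missing birth date"));
      table.put(put);

      // Read the messages for one page straight back, much like a shell scan.
      Scan scan = new Scan(Bytes.toBytes("quote-step-2!"), Bytes.toBytes("quote-step-2~"));
      ResultScanner scanner = table.getScanner(scan);
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()) + " => "
            + Bytes.toString(r.getValue(Bytes.toBytes("log"), Bytes.toBytes("message"))));
      }
      scanner.close();
    } finally {
      table.close();
    }
  }
}

On top of such a table you could later define a Hive external table (via the HBase storage handler) to get the SQL-like queries mentioned above.
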
> 5) How long would it take for me to set up and start such a system?
For a novice doing this for the first time: perhaps two hours for a standalone HBase setup, perhaps a day for a fully distributed test cluster, and a little longer for the real production system with all the security features ;).

> I'm sorry if some/all of these questions are unanswerable. I just want
> to discuss my thoughts, and get an idea of what things can I achieve by
> going the way of Hadoop.
Well, I think (though I could be wrong) that you are imagining you can just change the "database backend" from SQL to HBase/Hadoop and everything will run right away. It will not be that easy. You would have to change the code of your web application in a fundamental way and rethink all the table designs, so this could be more complicated than you think right now.
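
To make the "rethink the table designs" point a little more concrete, here is a rough, purely hypothetical sketch. Where a relational schema would normalize into customers, quotes and so on, an HBase design usually denormalizes around the read pattern, for example one wide row per customer. The table and family names below are invented for the example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CustomerTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // One wide row per customer instead of joined, normalized tables.
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("customer"));
      desc.addFamily(new HColumnDescriptor("profile")); // name, address, ...
      desc.addFamily(new HColumnDescriptor("quotes"));  // one column per quote id
      admin.createTable(desc);
    } finally {
      admin.close();
    }
  }
}

The important part is the row key and family layout, which you design around how you read the data, not around normal forms.
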

However, HBase/Hadoop has some advantages that are very interesting for you. First, it is distributed, which lets your company grow almost without limit, or collect more data about your customers so you can extract more information (and sell more stuff). And MapReduce is a wonderful tool for computing really fancy "statistics", which is very interesting for an insurance company. Your mathematical economists will REALLY love it ;).
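
As a taste of the kind of "statistics" MapReduce can compute, here is a bare-bones sketch that counts error-log lines per page. The input layout (one tab-separated line per error, page id then message, sitting on HDFS) is an assumption made up for this example:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ErrorsPerPage {

  // Emits (page id, 1) for every error line of the form "pageId<TAB>message".
  public static class PageMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t", 2);
      if (fields.length == 2) {
        ctx.write(new Text(fields[0]), ONE);
      }
    }
  }

  // Sums the counts per page; also usable as a combiner.
  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text page, Iterable<LongWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable c : counts) {
        sum += c.get();
      }
      ctx.write(page, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "errors-per-page");
    job.setJarByClass(ErrorsPerPage.class);
    job.setMapperClass(PageMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
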

Hope this helped.

best wishes

Wilm


