Subject: Re: Planning to propose Hadoop initiative to company. Need some inputs please.
From: Demai Ni
To: user@hadoop.apache.org
Date: Wed, 1 Oct 2014 11:46:42 -0700
hi,

glad to see another person moving from the mainframe world to the 'big data' one. I was in the same boat a few years back, after working on mainframes for 10+ years.

Wilm has covered the pointers already; I'd just like to chime in a bit from the mainframe side.

The website-usage example is a very good fit for big data compared to the mainframe, because a mainframe is very expensive for what it provides: reliability for mission-critical workloads. One approach is to look at the applications currently running on the mainframe, or the ones you are considering implementing there. For a website-usage case, the cost to implement and run it on hadoop/hbase would be only about 1/10 of the mainframe cost, and the mainframe probably could not scale up once the data grows into the terabyte range.

Second, be careful: Hadoop is not for all of your use cases. I am pretty sure your IT department handles some mission-critical workloads, like payroll, employee info, and customer payments. Leave those workloads on the mainframe: 1) hbase/hadoop are not designed for such RDBMS workloads, and 2) migrating from one database to another is way too much risk unless the top boss forces you to do it...
:-)

Demai

On Wed, Oct 1, 2014 at 11:02 AM, Wilm Schumacher wrote:
> Hi,
>
> first: I think hbase is what you are looking for. If I understand
> correctly, you want to show customers their data very fast and
> let them manipulate it, so you need something like a data
> warehouse system. Thus hbase is the method of choice for you (and I
> think for your kind of data, hbase is a better choice than cassandra or
> mongoDB). But of course you need a running hadoop system to run hbase,
> so it's not an either/or ;)
>
> (my answers are for hbase, as I think it's what you are looking for. If
> you are not interested, just ignore the following text. Sorry @all for
> writing about hbase on this list ;).)
>
> On 01.10.2014 at 17:24, mani kandan wrote:
> > 1) How much web usage data will a typical website like ours collect on a
> > daily basis? (I know I can ask our IT department, but I would like to
> > gather some background idea before talking to them.)
> well, if you have the option to ask your IT department, you should do
> that, because everyone here would have to guess, and you would have to
> explain in great detail what you plan to do before we could even guess. If
> you want to track what each user has clicked on, for example to serve
> personalized ads, then you have to save more data. So ask
> the people who already have the data rather than guessing.
>
> > 3) How many clusters/nodes would I need to run a web usage analytics
> > system?
> the book "HBase in Action" has recommendations for some
> case studies (part IV, "Deploying HBase"), including thoughts on
> the number of nodes and how to use them, depending on the size of your
> data.
>
> > 4) What are the ways for me to use our data? (One use case I'm thinking
> > of is to analyze the error messages log for each page of the quote process
> > to redesign the UI. Is this possible?)
> sure, and this should be very easy.
> I would pump the error log into an
> hbase table. That way you could read the messages directly from
> the hbase shell (if there are few enough of them), or use hive to query
> your log in a more SQL-like way and produce statistics very easily.
>
> > 5) How long would it take for me to set up and start such a system?
> for a novice doing it for the first time: for a standalone
> hbase system, perhaps 2 hours. For a fully distributed test cluster,
> perhaps a day. For the real production system, with all the security
> features ... a little longer ;).
>
> > I'm sorry if some/all of these questions are unanswerable. I just want
> > to discuss my thoughts, and get an idea of what things I can achieve by
> > going the Hadoop way.
> well, I think (but I could err) that you imagine you can just swap the
> "database backend" from "SQL" to
> "hbase/hadoop" and everything will run right away. It will not be
> that easy. You would have to change the code of your web application in
> a very fundamental way: you have to rethink all the table designs etc.,
> so this could be more complicated than you think right now.
>
> However, hbase/hadoop has some advantages which are very interesting for
> you. First, it is distributed, which lets your company grow
> almost without limit, or collect more data about your customers so you
> can extract more information (and sell more stuff). And map reduce is a
> wonderful tool for computing really fancy statistics, which is very
> interesting for an insurance company. Your mathematical economists will
> REALLY love it ;).
>
> Hope this helped.
>
> best wishes
>
> Wilm
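[Editor's note: the "error log per page" analysis Wilm sketches (log into an HBase table, statistics via Hive) can be prototyped before any cluster exists. The following is a minimal local stand-in in plain Python, not code from the thread; the `page<TAB>message` log format and all names in it are hypothetical. At scale, the same aggregation would be a Hive `SELECT page, COUNT(*) ... GROUP BY page` over the HBase-backed table.]

```python
from collections import Counter

def error_counts_per_page(log_lines):
    """Count error messages per page, like a GROUP BY page in Hive.

    Assumes each line looks like "page\tmessage"; a real error log
    would need its own parsing. This mimics the statistic computed
    over the HBase table, but runs against a plain list of lines.
    """
    counts = Counter()
    for line in log_lines:
        page, _sep, _message = line.partition("\t")
        counts[page] += 1
    return counts

# Hypothetical sample log for an insurance quote flow:
log = [
    "quote/step1\tValidationError: missing ZIP code",
    "quote/step1\tValidationError: bad date format",
    "quote/step3\tTimeoutError: rating service",
]
print(error_counts_per_page(log))
# -> Counter({'quote/step1': 2, 'quote/step3': 1})
```

The per-page counts point straight at the UI pages worth redesigning, which is exactly the use case from question 4.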