Subject: Re: Planning to propose Hadoop initiative to company. Need some inputs please.
From: Demai Ni
To: user@hadoop.apache.org
Date: Wed, 1 Oct 2014 11:46:42 -0700
hi,

glad to see another person moving from the mainframe world to the 'big data' one. I was in the same boat a few years back, after working on mainframes for 10+ years.

Wilm has covered the pointers already; I'd just like to chime in a bit from the mainframe side.

The website-usage example is a very good fit for big data compared to the mainframe, because a mainframe is very expensive for what it provides: reliability for mission-critical workloads. One approach is to look at the applications currently running on the mainframe, or the ones you are considering implementing there. For a website-usage case, the cost to implement and run it on hadoop/hbase would be only about 1/10 of the mainframe cost, and the mainframe probably could not scale up once the data grows into the terabyte range.

Second, be careful: Hadoop is not for all of your use cases. I am pretty sure your IT department handles some mission-critical workloads, like payroll, employee info, and customer payments. Leave those workloads on the mainframe: 1) hbase/hadoop are not designed for such RDBMS workloads, and 2) migrating from one database to another is way too much risk unless the top boss forces you to do it...
:-)

Demai

On Wed, Oct 1, 2014 at 11:02 AM, Wilm Schumacher wrote:
> Hi,
>
> first: I think hbase is what you are looking for. If I understand
> correctly, you want to show customers their data very fast and
> let them manipulate it, so you need something like a data
> warehouse system. Thus hbase is the method of choice for you (and I
> think for your kind of data, hbase is a better choice than cassandra or
> mongoDB). But of course you need a running hadoop system to run hbase,
> so it's not an either/or ;)
>
> (my answers are for hbase, as I think it's what you are looking for. If
> you are not interested, just ignore the following text. Sorry @all for
> writing about hbase on this list ;).)
>
> On 01.10.2014 at 17:24, mani kandan wrote:
> > 1) How much web usage data will a typical website like ours collect on a
> > daily basis? (I know I can ask our IT department, but I would like to
> > gather some background idea before talking to them.)
> well, if you have the option to ask your IT department, you should do
> that, because everyone here would have to guess, and you would have to
> explain in great detail what you plan to do before we could even guess. If
> you want to track what each user has clicked on, for example to serve
> personalized ads, then you have to save more data. So ask
> the people who already have the data rather than guessing.
>
> > 3) How many clusters/nodes would I need to run a web usage analytics
> > system?
> the book "HBase in Action" has recommendations for some
> case studies (part IV, "Deploying HBase"), including thoughts on
> the number of nodes and how to use them, depending on the size of your
> data.
>
> > 4) What are the ways for me to use our data? (One use case I'm thinking
> > of is to analyze the error messages log for each page of the quote process
> > to redesign the UI. Is this possible?)
> sure, and this should be very easy.
> I would pump the error log into an
> hbase table. That way you could read the messages directly from
> the hbase shell (if there are few enough of them), or use hive to query
> your log in a more SQL-like way and produce statistics very easily.
>
> > 5) How long would it take for me to set up and start such a system?
> for a novice doing it for the first time: for a standalone
> hbase system, perhaps 2 hours. For a fully distributed test cluster,
> perhaps a day. For the real production system, with all the security
> features ... a little longer ;).
>
> > I'm sorry if some/all of these questions are unanswerable. I just want
> > to discuss my thoughts, and get an idea of what things I can achieve by
> > going the Hadoop way.
> well, I think (but I could err) that you imagine you can just swap the
> "database backend" from "SQL" to
> "hbase/hadoop" and everything will run right away. It will not be
> that easy. You would have to change the code of your web application in
> a very fundamental way: you have to rethink all the table designs etc.,
> so this could be more complicated than you think right now.
>
> However, hbase/hadoop has some advantages which are very interesting for
> you. First, it is distributed, which lets your company grow
> almost without limit, or collect more data about your customers so you
> can extract more information (and sell more stuff). And map reduce is a
> wonderful tool for computing really fancy statistics, which is very
> interesting for an insurance company. Your mathematical economists will
> REALLY love it ;).
>
> Hope this helped.
>
> best wishes
>
> Wilm
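[Editor's note: the "error log per page" analysis Wilm sketches (log into an HBase table, statistics via Hive) can be prototyped before any cluster exists. The following is a minimal local stand-in in plain Python, not code from the thread; the `page<TAB>message` log format and all names in it are hypothetical. At scale, the same aggregation would be a Hive `SELECT page, COUNT(*) ... GROUP BY page` over the HBase-backed table.]

```python
from collections import Counter

def error_counts_per_page(log_lines):
    """Count error messages per page, like a GROUP BY page in Hive.

    Assumes each line looks like "page\tmessage"; a real error log
    would need its own parsing. This mimics the statistic computed
    over the HBase table, but runs against a plain list of lines.
    """
    counts = Counter()
    for line in log_lines:
        page, _sep, _message = line.partition("\t")
        counts[page] += 1
    return counts

# Hypothetical sample log for an insurance quote flow:
log = [
    "quote/step1\tValidationError: missing ZIP code",
    "quote/step1\tValidationError: bad date format",
    "quote/step3\tTimeoutError: rating service",
]
print(error_counts_per_page(log))
# -> Counter({'quote/step1': 2, 'quote/step3': 1})
```

The per-page counts point straight at the UI pages worth redesigning, which is exactly the use case from question 4.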