hadoop-hdfs-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Planning to propose Hadoop initiative to company. Need some inputs please.
Date Wed, 01 Oct 2014 18:05:12 GMT
Adding hbase user.

On Wed, Oct 1, 2014 at 11:02 AM, Wilm Schumacher <wilm.schumacher@cawoom.com> wrote:

> Hi,
> first: I think hbase is what you are looking for. If I understand
> correctly you want to show the customer his or her data very fast and
> let them manipulate their data. So you need something like a data
> warehouse system. Thus, hbase is the method of choice for you (and I
> think for your kind of data, hbase is a better choice than cassandra or
> MongoDB). But of course you need a running hadoop system to run hbase.
> So it's not an either/or ;)
> (my answers are for hbase, as I think it's what you are looking for. If
> you are not interested, just ignore the following text. Sorry to all for
> writing about hbase on this list ;).)
> Am 01.10.2014 um 17:24 schrieb mani kandan:
> > 1) How much web usage data will a typical website like ours collect on a
> > daily basis? (I know I can ask our IT department, but I would like to
> > gather some background idea before talking to them.)
> well, if you have the option to ask your IT department you should do
> that, because everyone here would have to guess. You would have to
> explain in great detail what you want to do before we could even guess.
> If you e.g. want to track what the user has clicked, perhaps to make
> personalized ads, then you have to save more data. So, you should ask
> the people who have the data right away instead of guessing.
> > 3) How many clusters/nodes would I need to run a web usage analytics
> > system?
> in the book "HBase in Action" there are some recommendations for some
> "case studies" (part IV "deploying hbase"). There are some thoughts on
> the number of nodes, and how to use them, depending on the size of your
> data.
> > 4) What are the ways for me to use our data? (One use case I'm thinking
> > of is to analyze the error messages log for each page on quote process
> > to redesign the UI. Is this possible?)
> sure. And this should be very easy. I would pump the error log into a
> hbase table. That way you could read the messages directly from
> the hbase shell (if they are few enough). Or you could use hive to query
> your log a little more "sql like" and compute statistics very easily.
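A minimal sketch of what "pumping the error log into a hbase table" could look like at the row-design level. Everything here is an assumption made for illustration: the tab-separated log format, the `page|reversed-timestamp` row key, the `e:` column family, and the sample lines. It uses plain Python only and does not talk to a real HBase cluster; the per-page count stands in for the kind of GROUP BY query one might later run through hive.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical upper bound used to reverse timestamps (13-digit milliseconds).
MAX_MILLIS = 10**13

def parse_line(line):
    """Split 'ISO-timestamp<TAB>page<TAB>message' into its three fields."""
    ts, page, message = line.rstrip("\n").split("\t", 2)
    # Assumption: log timestamps are in UTC.
    return datetime.fromisoformat(ts).replace(tzinfo=timezone.utc), page, message

def to_row(ts, page, message):
    """Build a (row key, columns) pair; the newest errors of a page sort first."""
    millis = int(ts.timestamp() * 1000)
    row_key = f"{page}|{MAX_MILLIS - millis:013d}"
    return row_key, {"e:msg": message, "e:ts": ts.isoformat()}

def errors_per_page(lines):
    """Count errors per page, the kind of statistic a GROUP BY would give."""
    return Counter(parse_line(line)[1] for line in lines)

# Made-up sample lines, not real data.
sample = [
    "2014-10-01T17:00:00\t/quote/step1\tinvalid zip code",
    "2014-10-01T17:05:00\t/quote/step2\tmissing birth date",
    "2014-10-01T17:09:00\t/quote/step1\tinvalid zip code",
]

# Sorting by row key mimics how a key-ordered store would lay out the rows.
rows = sorted(to_row(*parse_line(line)) for line in sample)
counts = errors_per_page(sample)

for key, columns in rows:
    print(key, columns["e:msg"])
print(counts.most_common())
```

Because rows are sorted by key, putting the page first groups all errors of one page together, and the reversed timestamp lets a scan over a page prefix return the most recent errors first.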
> > 5) How long would it take for me to set up and start such a system?
> for a novice who has to do it for the first time: for the standalone
> hbase system perhaps 2 hours. For a complete distributed test cluster
> ... perhaps a day. For the real production system, with all security
> features ... a little longer ;).
> > I'm sorry if some/all of these questions are unanswerable. I just want
> > to discuss my thoughts, and get an idea of what things can I achieve by
> > going the way of Hadoop.
> well, I think, but I could err, that you think of hadoop (or hbase) in a
> way that you can just change the "database backend" from "SQL" to
> "hbase/hadoop" and everything would run right away. This will not be
> that easy. You would have to change the code of your web application in
> a very fundamental way. You have to rethink all the table designs etc.,
> so this could be more complicated than you think right now.
> However, hbase/hadoop has some advantages which are very interesting for
> you. Well first, it is distributed, which enables your company to grow
> almost without limit, or to collect more data about your customers so you
> can get more information (and sell more stuff). And map reduce is a
> wonderful tool for making real fancy "statistics", which is very
> interesting for an insurance company. Your mathematical economist will
> REALLY love it ;).
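To make the map reduce remark a bit more concrete, here is a small in-process sketch of the map, shuffle and reduce phases, counting errors per page. It is only an illustration of the programming model in plain Python, not the Hadoop API; the function names and the sample records are hypothetical.

```python
from collections import defaultdict

def map_error(record):
    """Map step: emit (page, 1) for one (page, message) error record."""
    page, _message = record
    yield page, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_count(page, values):
    """Reduce step: sum the values collected for one key."""
    return page, sum(values)

def run_job(records):
    """Run map, shuffle and reduce in a single process."""
    pairs = (pair for record in records for pair in map_error(record))
    return dict(reduce_count(key, values) for key, values in shuffle(pairs).items())

# Made-up sample records, not real data.
records = [
    ("/quote/step1", "invalid zip code"),
    ("/quote/step2", "missing birth date"),
    ("/quote/step1", "invalid zip code"),
]

totals = run_job(records)
print(totals)
```

On a real cluster the map and reduce functions would run in parallel on different nodes and the framework would do the shuffle; the shape of the computation stays the same.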
> Hope this helped.
> best wishes
> Wilm
