hadoop-general mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Hadoop Case Studies with interactive applications...an antagonism?
Date Fri, 19 Aug 2011 15:35:08 GMT

Hadoop is best for batch processing because it is optimized for that use case.  It is not
that it cannot handle small jobs, but those jobs tend to be somewhat slower than on other systems
and less consistent in their processing time than some use cases really need.  You can
get around this somewhat by over-provisioning your grid.

If you want to do monitoring of sensor data, Hadoop should be able to handle it, so long as
your SLAs are not extremely tight.  This is especially true as the size of your data grows.
You might want to look at HBase.  It can be very fast and interactive, and because it stores
the data in HDFS you can process it with Map/Reduce if you need to.  There are also a number of
interactive/fast processing solutions on top of HDFS that are either available now or
should be soon, once MRV2 stabilizes some more.  Look at Spark, which is part of the Mesos project
at Berkeley (www.mesosproject.org).  Hive and Pig are also worth a look if you want to
be able to query the data with a higher-level language.

Another solution that looks very interesting, once it is released as open source, is Storm:
http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
It looks like it could be modified a bit to run under YARN (MRV2), and then you could store your
module's state in HBase.  That would complement Hadoop's MapReduce processing very nicely and
do a lot of what you are looking at doing in real time.


On 8/19/11 8:06 AM, "Christian Schäfer" <syrious3000@yahoo.de> wrote:

Hi Hadoopians,

I'm a noob in hadoop (what a rhyme) ....and got some questions relating to the
white papers posted on cloudera.com as follows:

  in IQT  QUARTERLY: HADOOP: Scalable, Flexible Data Storage  and Analysis - By
Mike Olson

    I got an antagonism when comparing case studies and the following pros&cons
of hadoop.

    pros: hadoop(M/R) mostly used in batch operation (running mins or hours to
    cons: hadoop(M/R) not usable for interactive applications

    and the case study: OpenPDC where it is used for monitoring and to be able
to react quickly:
        "Close monitoring and rapid response to changes in the state of the grid
allow utilities to minimize or prevent blackouts,"

    another case study from "Ten Common Hadoopable Problems - Real-World Hadoop
Use Cases":
        "Fast detection allows the bank to protect itself from considerable

If there is a better non-commercial place to ask these questions please let me

Background: I'm intending to set up a system for another domain where lots of
sensor data needs to be stored
and queried to implement monitoring and detect problem situations

kind regards
