hadoop-general mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: AW: Hadoop Case Studies with interactive applications...an antagonism?
Date Fri, 19 Aug 2011 19:28:31 GMT

I am really not an expert on Hive, but from what I know Hive translates the SQL into one or
more Map/Reduce jobs to execute the query.  It applies optimizations to reduce the number
of jobs it launches and to speed things up.  I also know that Pig has looked
into supporting some alternative paradigms similar to Spark to speed up processing of small
jobs, but that is still in the discussion stages and I have no idea where that will end up.
I can only assume that Hive is also looking into similar things.
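
To illustrate the idea, here is a conceptual sketch only; it is not Hive's
actual planner or the code Hive generates. A query such as
SELECT sensor_id, COUNT(*) FROM readings GROUP BY sensor_id can be thought of
as a map step, a shuffle, and a reduce step. The table name, column names,
function names, and sample rows below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sensor readings as (sensor_id, value) rows.
rows = [("s1", 10), ("s2", 7), ("s1", 4), ("s3", 1), ("s2", 3)]


def map_phase(rows):
    """Emit one (key, 1) pair per input row, as a mapper for COUNT(*) might."""
    for sensor_id, _value in rows:
        yield sensor_id, 1


def shuffle(pairs):
    """Group values by key, which the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key to get the per-key count."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

On a real cluster each of these steps runs as distributed tasks, and the job
startup and scheduling overhead is a large part of why small queries feel slow.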


On 8/19/11 1:44 PM, "Christian Schäfer" <syrious3000@yahoo.de> wrote:

Hi Bobby,

Thanks for the information provided :)

I'm glad there are some possibilities to use Hadoop + HBase. I was a bit afraid I
would have to discard that mighty tool (in my project).

As I'm still at the beginning of learning Hadoop, I have just one basic question:
Is every query I send via Hive to HBase executed in the background as a
Map/Reduce job, or does it work in another (more efficient) way? (I know RTFM
would be an appropriate answer, but I am still searching and have not found the
"answer" yet.)

The Mesos and Storm stuff looks interesting. I will take it into account for my
evaluation if possible.

Somehow I think Pig + Hive + the Cloudera tools will be implemented later because they are
proven tech, high level, have tooling, and offer the possibility of getting support.

But I will check out Spark and Storm, as they seem to have some interesting
concepts :)


From: Robert Evans <evans@yahoo-inc.com>
To: "general@hadoop.apache.org" <general@hadoop.apache.org>
Sent: Friday, 19 August 2011, 17:35:08
Subject: Re: Hadoop Case Studies with interactive applications...an antagonism?


Hadoop is best for batch processing because it is optimized for that use case.
It is not that it cannot handle small jobs.  Those jobs tend to be somewhat
slower than on other systems and also not as consistent in their processing time as
some use cases really need.  You can get around this somewhat by over-
provisioning your grid.

If you want to do monitoring of sensor data Hadoop should be able to handle it,
so long as your SLAs are not extremely tight.  This is especially true as the
size of your data grows.  You might want to look at HBase.  It can be very fast
and interactive, and because it stores the data in HDFS you can process it with
Map/Reduce if you need to.  There are a number of interactive/fast processing
solutions on top of HDFS too that are either available now or should be soon
once MRV2 stabilizes some more.  Look at Spark, which is part of the Mesos
project at Berkeley (www.mesosproject.org).  Another thing to look at is Hive or
Pig if you want to be able to query the data with a higher level language.

Another solution that looks very interesting once it is released as open source
is Storm.
It looks like it could be modified a bit to run under YARN (MRV2) and then you
can store your module's state in HBase.  That would complement Hadoop's MapReduce
processing very nicely and do a lot of what you are looking at doing in real
time.


On 8/19/11 8:06 AM, "Christian Schäfer" <syrious3000@yahoo.de> wrote:

Hi Hadoopians,

I'm a noob in Hadoop (what a rhyme) and have some questions relating to the
white papers posted on cloudera.com, as follows:

  In IQT Quarterly, "Hadoop: Scalable, Flexible Data Storage and Analysis" by
Mike Olson:

    I see a contradiction when comparing the case studies with the following pros and cons
of Hadoop.

    pros: hadoop(M/R) mostly used in batch operation (running mins or hours to
    cons: hadoop(M/R) not usable for interactive applications

    and the case study: OpenPDC where it is used for monitoring and to be able
to react quickly:
        "Close monitoring and rapid response to changes in the state of the grid
allow utilities to minimize or prevent blackouts,"

    another case study from "Ten Common Hadoopable Problems - Real-World Hadoop
Use Cases":
        "Fast detection allows the bank to protect itself from considerable

If there is a better non-commercial place to ask these questions, please let me
know.

Background: I intend to set up a system for another domain where lots of
sensor data needs to be stored
and queried to implement monitoring and detect problem situations.

kind regards
