hadoop-general mailing list archives

From Chris Smith <csmi...@gmail.com>
Subject RE: Hadoop Case Studies with interactive applications...an antagonism?
Date Mon, 22 Aug 2011 16:08:49 GMT

It might be worth looking at OpenTSDB (http://opentsdb.net/), as you
get HBase and a collection of agents for metrics.

To quote from their home page:

"OpenTSDB is a distributed, scalable Time Series Database (TSDB)
written on top of HBase. OpenTSDB was written to address a common
need: store, index and serve metrics collected from computer systems
(network gear, operating systems, applications) at a large scale, and
make this data easily accessible and graphable."

It might save some effort if it fits your use case.
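
For illustration only, here is a minimal Python sketch of how one sensor
reading might be formatted as a line for OpenTSDB's telnet-style "put"
command (put <metric> <timestamp> <value> <tagk=tagv> ...). The helper
name format_put, the metric name and the tag names are hypothetical, not
taken from this thread or from OpenTSDB itself, and actually sending the
line to a TSD is left out.

```python
import time


def format_put(metric, value, tags, timestamp=None):
    """Build one line in OpenTSDB's telnet-style `put` format.

    `format_put` is a hypothetical helper name used for this sketch.
    """
    if not tags:
        # OpenTSDB expects at least one tag on every data point.
        raise ValueError("at least one tag is required")
    # Default to the current time in whole seconds since the epoch.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Sort tags so the generated line is deterministic.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts} {value} {tag_str}"


# Hypothetical sensor reading; metric and tag names are assumptions.
line = format_put(
    "sensor.temperature",
    21.5,
    {"site": "plant1", "sensor": "t42"},
    timestamp=1313769600,
)
print(line)
```

Tagging each reading with its site and sensor is what would later let
queries filter or group the time series per sensor.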

Regards,  Chris

> -----Original Message-----
> From: Christian Schäfer [mailto:syrious3000@yahoo.de]
> Sent: 19 August 2011 19:44
> To: general@hadoop.apache.org
> Subject: AW: Hadoop Case Studies with interactive
> applications...an antagonism?
> Hi Bobby,
> thanks for the information provided :)
> I'm glad there are some possibilities to use hadoop+hbase. I was a bit
> afraid I had to discard that mighty tool (in my project).
> As I'm still at the beginning of learning hadoop I just have one basic
> question:
> Is every query I send via hive to hbase realized in the background as a
> map/reduce job, or does it work in another (more efficient) way? (I know
> RTFM would be an appropriate answer, but I am still searching and have
> not found the "answer" yet.)
> The mesos and storm stuff looks interesting. I will take it into account
> for my evaluation if possible.
> Somehow I think pig + hive + cloudera tools will be implemented later
> because of proven tech, high level, tooling and the possibility of
> getting support.
> But I will check out spark and storm as they seem to have some
> interesting concepts :)
> regards
> Christian
> ________________________________
> From: Robert Evans <evans@yahoo-inc.com>
> To: "general@hadoop.apache.org" <general@hadoop.apache.org>
> Sent: Friday, 19 August 2011, 17:35:08
> Subject: Re: Hadoop Case Studies with interactive
> applications...an antagonism?
> Christian,
> Hadoop is best for batch processing because it is optimized for that use
> case. It is not that it cannot handle small jobs. Those jobs tend to be
> somewhat slower than on other systems, and also not as consistent in
> their processing time as some use cases really need. You can get around
> this somewhat by over-provisioning your grid.
> If you want to do monitoring of sensor data, Hadoop should be able to
> handle it, so long as your SLAs are not extremely tight. This is
> especially true as the size of your data grows. You might want to look
> at HBase. It can be very fast and interactive, and because it stores the
> data in HDFS you can process it with Map/Reduce if you need to. There
> are a number of interactive/fast processing solutions on top of HDFS too
> that are either available now or should be soon, once MRV2 stabilizes
> some more. Look at Spark, which is part of the mesos project at Berkeley
> (www.mesosproject.org). Another thing to look at is Hive or Pig if you
> want to be able to query the data with a higher-level language.
> Another solution that looks very interesting, once it is released as
> open source, is storm:
> http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
> It looks like it could be modified a bit to run under YARN (MRV2), and
> then you can store your modules' state in HBase. That would complement
> Hadoop's MapReduce processing very nicely and do a lot of what you are
> looking at doing in real time.
> --Bobby
> On 8/19/11 8:06 AM, "Christian Schäfer" <syrious3000@yahoo.de> wrote:
> Hi Hadoopians,
> I'm a noob in hadoop (what a rhyme) and have some questions relating to
> the white papers posted on cloudera.com, as follows:
>   in IQT QUARTERLY: HADOOP: Scalable, Flexible Data Storage and Analysis
>   - By Mike Olson
>     I got an antagonism when comparing the case studies and the following
>     pros & cons of hadoop.
>     pros: hadoop (M/R) mostly used in batch operation (running minutes or
>     hours to complete)
>     cons: hadoop (M/R) not usable for interactive applications
>     and the case study: OpenPDC, where it is used for monitoring and to
>     be able to react quickly:
>         "Close monitoring and rapid response to changes in the state of
>         the grid allow utilities to minimize or prevent blackouts,"
>     another case study from "Ten Common Hadoopable Problems - Real-World
>     Hadoop Use Cases":
>         "Fast detection allows the bank to protect itself from
>         considerable losses."
> If there is a better non-commercial place to ask these questions, please
> let me know.
> Background: I'm intending to set up a system for another domain where
> lots of sensor data needs to be stored and queried to implement
> monitoring and detect problem situations.
> kind regards
> Christian
