oodt-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Hadoop Similarities
Date Sun, 03 Nov 2013 14:27:48 GMT
Hi Tom,

On Fri, Nov 1, 2013 at 8:09 AM, Tom Barber <tom.barber@meteorite.bi> wrote:

>  Morning,
> Chris will remember a couple of years ago me asking on IRC about how OODT
> differs from Hadoop in terms of features and functionality, which he then
> gave a great page long explanation as to what the differences were. I vowed
> to copy that information off and save it somewhere useful, and of course
> never did, then I asked Sean who also couldn't dig it up.

What a shame. It would have been great to at least see this, if not get it
documented as you mention. Oh well. Community lists are as good as it gets,
IMHO, so here we go.

> So, fine folks of the OODT community, for a novice like me who would be
> interested in "selling" OODT to users if the correct usecase came along,
> when someone says "Isn't OODT just a different type of Hadoop?" what do I
> answer?

I am relatively new to OODT, so my opinion here is fairly abstract. However, I
have been using Hadoop much longer and therefore hope that some of what I'm
saying contributes to our shared understanding.

I was attracted to OODT by the modular, component-oriented design of the
project as a whole. It is up to the system designer (the person or team who
initially picks up OODT) to review the overall project and select the
components they need to satisfy and accommodate their data workflow(s).
Because the project is modular, components can be substituted as the nature
and/or characteristics of the data workflow change over time. A beautiful
aspect of OODT is that many tools and instruments have been built to
accommodate exactly these data workflow requirements.

For me, Hadoop (which I consider a blanket term for what is essentially an
OS) is an operating system, as opposed to OODT, which I would describe as a
modularized data workflow platform. Hadoop provides a filesystem (HDFS), a
data processing platform (MapReduce), and an API through which we can submit
and execute jobs. Additionally, we all know about the bolt-ons such as
workflow monitoring, security and so forth. In this respect it is up to the
engineer to build the data workflow around or on top of Hadoop using the
components provided. One other thing that I think characterizes Hadoop is
that, generally speaking, data follows 'write-once, read-many' logic,
whereas this is not necessarily the case with OODT.

> I'd like to document this type of comparison stuff on the Wiki as well as
> I think it's useful for people to know and understand.

I'm sure that the above is obvious to many and that I'm merely describing
what is immediately apparent, but this is my experience so far using OODT
and the comparisons I can draw myself.

When I started responding, it was not my aim to engage in a pros vs. cons
comparison of each piece of software, so I hope the brief reply above can act
as a contribution to the conversation and we can take this onwards.

