hadoop-common-user mailing list archives

From "Ted Dunning" <ted.dunn...@gmail.com>
Subject Re: Hadoop for real time
Date Mon, 20 Oct 2008 13:19:31 GMT
Hadoop may not be quite what you want for this.

You could definitely use Hadoop for storage and streaming.  You can also do
various kinds of processing on Hadoop.

But because Hadoop is primarily intended for batch style operations, there
is a bit of an assumption that some administrative tasks will take down the
cluster.  That may be a problem (video serving tends to have a web audience
that isn't very tolerant of downtime).

At Veoh, we used a simpler system for serving videos that was
originally based on Mogile.  The basic idea is that there is a database that
contains name to URL mappings.  The URLs point to storage boxes that have a
bunch of disks that are served out to the net via lighttpd.  A management
machine runs occasionally to make sure that files are replicated according
to policy.  The database is made redundant via conventional mechanisms.
Requests for files can be proxied through a farm of front end machines that query
the database for locations or you can use redirects directly to the
content.  How you do it depends on network topology and your sensitivity
about divulging internal details.  Redirects can give higher peak read speed
since you are going direct.  Proxying avoids a network round trip for the

At Veoh, this system fed the content delivery networks as a caching layer
which meant that the traffic was essentially uniform random access.  This
system handled a huge number of files (10^9 or so) very easily and has
essentially never had customer visible downtime.  Extension with new file
systems is trivial (just tell the manager box and it starts using them).
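
To make the moving parts concrete, here is a rough Python sketch of the
arrangement described above.  It is not Veoh's or Mogile's actual code.  It
assumes an in-memory dictionary stands in for the database, and the names
Catalog, redirect_target and replicate are hypothetical, as are the host
names and the two-copy policy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for the name-to-URL database."""
    hosts: set = field(default_factory=set)        # storage boxes the manager knows about
    locations: dict = field(default_factory=dict)  # file name -> set of hosts holding a copy

    def add_host(self, host):
        # Extending the system: just tell the manager about a new storage box.
        self.hosts.add(host)

    def store(self, name, host):
        # Record that `host` holds a copy of `name`.
        self.add_host(host)
        self.locations.setdefault(name, set()).add(host)

    def urls(self, name):
        # One URL per replica, served straight off the storage box.
        return [f"http://{h}/{name}" for h in sorted(self.locations.get(name, ()))]


def redirect_target(catalog, name, rng=random):
    """Front end: pick one replica URL to redirect the client to (None if unknown)."""
    urls = catalog.urls(name)
    return rng.choice(urls) if urls else None


def replicate(catalog, copies=2):
    """Occasional manager pass: bring every file up to `copies` replicas.

    Returns the planned (name, source_host, target_host) copies and records
    them in the catalog.  A real manager would copy the bytes before
    updating the database; that step is omitted here.
    """
    plan = []
    for name, held in sorted(catalog.locations.items()):
        if not held:
            continue  # nothing to copy from
        spare = sorted(catalog.hosts - held)
        while len(held) < copies and spare:
            target = spare.pop(0)
            plan.append((name, sorted(held)[0], target))
            held.add(target)
    return plan
```

A front end using redirects would answer a request with an HTTP redirect to
redirect_target(catalog, name).  A proxying front end would fetch that URL
itself and stream the bytes back, which keeps the storage host names
private.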

This arrangement lacks most of the things that make Hadoop really good for
what it does.  But, in return, it is incredibly simple.  It isn't very
suitable for map-reduce or other high bandwidth processing tasks.  It
doesn't allow computation to go to the data.  It doesn't allow large files
to be read in parallel from many machines.  On the other hand, it handles
way more files than Hadoop does and it handles gobs of tiny files pretty

Video is also kind of a write-once medium in many cases and video files
aren't really splittable for map-reduce purposes.  That might mean that you
could get away with a mogile-ish system.

On Tue, Oct 14, 2008 at 1:29 PM, Stas Oskin <stas.oskin@gmail.com> wrote:

> Hi.
> Video storage, processing and streaming.
> Regards.
> 2008/9/25 Edward J. Yoon <edwardyoon@apache.org>
> > What kind of the real-time app?
> >
> > On Wed, Sep 24, 2008 at 4:50 AM, Stas Oskin <stas.oskin@gmail.com> wrote:
> > > Hi.
> > >
> > > Is it possible to use Hadoop for real-time app, in video processing field?
> > >
> > > Regards.
> > >
> >
> > --
> > Best regards, Edward J. Yoon
> > edwardyoon@apache.org
> > http://blog.udanax.org
> >

