hadoop-general mailing list archives

From "Taylor, Ronald C" <ronald.tay...@pnl.gov>
Subject Is there Hadoop support for parallel reading of a compressed video file - say as *.tar or *.zip file?
Date Wed, 24 Feb 2010 21:54:25 GMT

Hello Hadoop list,

We need to process very large video files quickly in Hadoop. Think, for example, of a 100 GB
YouTube video file being uploaded to our cluster in *.tar or *.zip format. We need a reader,
I presume some variant of LineRecordReader, that can automatically split the file appropriately
among a set of Mappers, one Mapper per node.

I did a quick Google search and found a mention of work being done on a reader for BZip2-compressed
text files - the email I found refers to this work as issue HADOOP-4012. Could anybody tell me
the state of that work, or of similar work for *.zip and *.tar files? Is there any working
code available?

Also: more generally, we need Hadoop-based code that can automatically split (compressed)
video files at appropriate boundaries for processing in parallel. Our group would deeply appreciate
any guidance pointing toward current Hadoop work in this area.

    Ron Taylor

Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, Mail Stop J4-33
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov
