hadoop-common-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: How to manage large record in MapReduce
Date Fri, 07 Jan 2011 08:43:30 GMT

You can take a look at FileStreamInputFormat at

This provides an input stream per file. In our case, we are using the input
stream to load data into the database directly. Maybe you can use this or a
similar approach for working with your videos.
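A rough sketch of the stream-per-file idea in plain Java (no Hadoop dependency; the class and method names here are illustrative, not hiho's actual API): each record handed to the consumer is just an open stream over one whole file, so arbitrarily large content can be piped onward, e.g. into a database loader, without ever being held in memory.

```java
import java.io.*;
import java.nio.file.*;
import java.util.stream.Stream;

// Illustrative only: mimics an InputFormat that yields one InputStream
// per input file, rather than materializing each file's contents as a value.
public class StreamPerFileDemo {
    // Stand-in for the mapper body: consume the record as a stream,
    // in fixed-size chunks, never buffering the whole file.
    static long consume(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;  // a real job would feed a DB loader or indexer here
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        try (Stream<Path> files = Files.list(dir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                try (InputStream in = new BufferedInputStream(Files.newInputStream(p))) {
                    System.out.println(p.getFileName() + ": " + consume(in) + " bytes");
                }
            }
        }
    }
}
```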


Thanks and Regards,
Connect Hadoop with databases, Salesforce, FTP servers and others
<https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>


On Thu, Jan 6, 2011 at 4:23 PM, Jérôme Thièvre <jthievre@gmail.com> wrote:

> Hi,
> we are currently using Hadoop (version 0.20.2) to manage some web archiving
> processes like fulltext indexing, and it works very well with small records
> that contain HTML.
> Now we would like to work with other types of web data, like videos. This
> kind of data can be very large, and of course these records don't fit
> in memory.
> Is it possible to manage records whose content doesn't reside in memory but
> on disk?
> A possibility would be to implement a Writable that reads its content from a
> DataInput but doesn't load it into memory; instead, it would copy that
> content to a temporary file on the local file system and allow the content
> to be streamed back through an InputStream (an InputStreamWritable).
> Has somebody tested a similar approach? If not, do you think this method
> could run into serious problems (e.g. performance impacts)?
> Thanks,
> Jérôme Thièvre
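The proposed InputStreamWritable could be sketched roughly as below (plain Java so it stands alone; in real code the class would implement org.apache.hadoop.io.Writable, whose readFields(DataInput)/write(DataOutput) signatures are matched here; the length-prefixed wire format is an assumption for illustration):

```java
import java.io.*;
import java.nio.file.*;

// Sketch: instead of holding the record bytes in memory, readFields()
// spills them to a local temp file and exposes them later as an
// InputStream. Assumes a length-prefixed payload on the wire.
public class InputStreamWritable {
    private File spillFile;  // local copy of the record content
    private long length;

    // Hadoop-style deserialization: copy the payload from the DataInput
    // to a temp file in fixed-size chunks, never buffering it whole.
    public void readFields(DataInput in) throws IOException {
        length = in.readLong();  // assumed length prefix
        spillFile = Files.createTempFile("record", ".bin").toFile();
        spillFile.deleteOnExit();
        byte[] buf = new byte[64 * 1024];
        try (OutputStream out =
                new BufferedOutputStream(new FileOutputStream(spillFile))) {
            long remaining = length;
            while (remaining > 0) {
                int n = (int) Math.min(buf.length, remaining);
                in.readFully(buf, 0, n);
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }

    // Hadoop-style serialization: stream the temp file back out.
    public void write(DataOutput out) throws IOException {
        out.writeLong(length);
        byte[] buf = new byte[64 * 1024];
        try (InputStream in =
                new BufferedInputStream(new FileInputStream(spillFile))) {
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        }
    }

    // The map function reads the content as a stream, not as a byte[].
    public InputStream getInputStream() throws IOException {
        return new BufferedInputStream(new FileInputStream(spillFile));
    }

    public long getLength() { return length; }
}
```

The main performance cost of this approach is the extra local-disk write and read per record; whether that matters depends on record sizes and local disk throughput, which is worth measuring before committing to the design.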
