hadoop-common-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: How to manage large record in MapReduce
Date Fri, 07 Jan 2011 08:43:30 GMT
Jerome,

You can take a look at FileStreamInputFormat at
https://github.com/sonalgoyal/hiho/tree/hihoApache0.20/src/co/nubetech/hiho/mapreduce/lib/input

This provides an input stream per file. In our case, we are using the input
stream to load data into the database directly. Maybe you can use this or a
similar approach for working with your videos.
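In outline, the stream-per-file idea looks something like the sketch below.
This is only a simplified illustration of the approach (one unsplittable file
per split, and a RecordReader that hands the mapper an open stream rather than
buffered bytes), not the actual hiho code; the class and method names here are
my own:

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class StreamPerFileInputFormat
    extends FileInputFormat<Text, InputStream> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // one split per file, so one stream per file
  }

  @Override
  public RecordReader<Text, InputStream> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new StreamRecordReader();
  }

  static class StreamRecordReader extends RecordReader<Text, InputStream> {
    private Path path;
    private FSDataInputStream stream;
    private boolean consumed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
        throws IOException {
      path = ((FileSplit) split).getPath();
      FileSystem fs = path.getFileSystem(context.getConfiguration());
      stream = fs.open(path); // open the file; never buffer its contents
    }

    @Override
    public boolean nextKeyValue() {
      if (consumed) {
        return false;
      }
      consumed = true; // exactly one record: (file path, open stream)
      return true;
    }

    @Override
    public Text getCurrentKey() {
      return new Text(path.toString());
    }

    @Override
    public InputStream getCurrentValue() {
      return stream;
    }

    @Override
    public float getProgress() {
      return consumed ? 1.0f : 0.0f;
    }

    @Override
    public void close() throws IOException {
      if (stream != null) {
        stream.close();
      }
    }
  }
}

The mapper then receives (file path, open InputStream) pairs and can copy
each stream to its destination with a small fixed-size buffer, so memory use
stays constant no matter how large the file is.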

HTH

Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others:
<https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Thu, Jan 6, 2011 at 4:23 PM, Jérôme Thièvre <jthievre@gmail.com> wrote:

> Hi,
>
> we are currently using Hadoop (version 0.20.2) to manage some web archiving
> processes such as full-text indexing, and it works very well with small
> records that contain HTML.
> Now we would like to work with other types of web data, such as videos.
> This kind of data can be really large, and of course such records don't fit
> in memory.
>
> Is it possible to manage records whose content resides on disk rather than
> in memory?
> One possibility would be to implement a Writable that reads its content
> from a DataInput but doesn't load it into memory; instead, it would copy
> that content to a temporary file on the local file system and allow
> streaming it back through an InputStream (an InputStreamWritable).
>
> Has anybody tested a similar approach? If not, do you think this method
> could run into any big problems (performance impacts in particular)?
>
> Thanks,
>
> Jérôme Thièvre
>
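
For reference, the InputStreamWritable Jérôme proposes could be sketched
roughly as below. This is only an illustration of the spill-to-local-disk
idea, not an existing Hadoop class; the chunk size and temp-file handling are
arbitrary choices:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.io.Writable;

public class InputStreamWritable implements Writable {
  private static final int CHUNK = 64 * 1024; // fixed in-memory buffer size

  private File spillFile; // record body lives here, not on the heap
  private long length;

  @Override
  public void readFields(DataInput in) throws IOException {
    length = in.readLong(); // length prefix written by write()
    spillFile = File.createTempFile("record-", ".spill");
    spillFile.deleteOnExit();
    OutputStream out =
        new BufferedOutputStream(new FileOutputStream(spillFile));
    try {
      byte[] buf = new byte[CHUNK];
      long remaining = length;
      while (remaining > 0) {
        int n = (int) Math.min(buf.length, remaining);
        in.readFully(buf, 0, n); // only CHUNK bytes in memory at a time
        out.write(buf, 0, n);
        remaining -= n;
      }
    } finally {
      out.close();
    }
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(length);
    InputStream in =
        new BufferedInputStream(new FileInputStream(spillFile));
    try {
      byte[] buf = new byte[CHUNK];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      in.close();
    }
  }

  /** Stream the spilled content back without loading it into memory. */
  public InputStream getInputStream() throws IOException {
    return new BufferedInputStream(new FileInputStream(spillFile));
  }
}

The main costs to watch would be the extra local disk I/O per record and
cleaning up the temp files when tasks fail; deleteOnExit() alone may not be
enough if task JVMs are reused.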
