hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Generating an Index for sequence files
Date Mon, 04 Oct 2010 09:19:53 GMT
No, it does not mean you can't use them in map reduce operations; they are
built with exactly that in mind.

An InputFormat generally wraps a simple Reader class for the file format it
handles. It's not difficult to write one. Given your specific requirements
for reading files, you may also find it better to write your own input
format classes for TFiles.

MapFiles are essentially SequenceFiles, as already explained, and can be used
with the SequenceFile InputFormat class in map reduce operations. For
fine-tuned reading of a map file you will need your own implementation, which
isn't hard to do either. Hadoop is very modular at the IO level.
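
To illustrate why one sorted file can serve both a full scan and a keyed
lookup, here is a minimal stdlib-only sketch of the idea behind a MapFile:
sorted records plus a sparse index that remembers every Nth key. It does not
use the Hadoop API; the class name SparseIndexSketch, its methods, and the
in-memory lists standing in for the data file are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Hadoop code: sorted records plus a sparse index.
class SparseIndexSketch {
    private final List<String> keys;   // sorted keys, standing in for the data file
    private final List<String> values; // values[i] belongs to keys[i]
    // Sparse index: every interval-th key mapped to its position in the data.
    private final TreeMap<String, Integer> index = new TreeMap<>();

    SparseIndexSketch(List<String> sortedKeys, List<String> values, int interval) {
        if (interval < 1 || sortedKeys.size() != values.size()) {
            throw new IllegalArgumentException("bad interval or mismatched lists");
        }
        this.keys = sortedKeys;
        this.values = values;
        for (int i = 0; i < sortedKeys.size(); i += interval) {
            index.put(sortedKeys.get(i), i);
        }
    }

    // Sequential access over every record, as a batch job would read it.
    List<String> scanAll() {
        return new ArrayList<>(values);
    }

    // Random access: jump to the nearest indexed key at or before the
    // target, then scan forward until the key is found or passed.
    String get(String key) {
        Map.Entry<String, Integer> floor = index.floorEntry(key);
        if (floor == null) {
            return null; // key sorts before the first record
        }
        for (int i = floor.getValue(); i < keys.size(); i++) {
            int cmp = keys.get(i).compareTo(key);
            if (cmp == 0) {
                return values.get(i);
            }
            if (cmp > 0) {
                break; // passed the slot where the key would be
            }
        }
        return null;
    }
}
```

The same records back both scanAll() and get(), so no second copy of the
data is needed; only the small index is extra.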

Please look into the Reader and Writer class implementations or APIs of each
file format you are interested in; writing an input format class should then
be doable enough.

On Oct 4, 2010 2:34 PM, "Sina Samangooei" <ss@ecs.soton.ac.uk> wrote:


Thanks for the quick response.

It's good that there are provisions being made for the kind of problem I'm
trying to solve. However, I can't seem to find any sort of TFileInputFormat
or MapFileInputFormat. Does this mean TFiles and MapFiles can't be
simultaneously used for random access as well as map reduce tasks?

If this is the case TFiles and MapFiles are not suitable for my purposes. I
require the ability to perform large scale map-reduce operations on ALL of
the files, while at the same time having the ability to quickly access an
individual file. Two separate use cases, but both quite important. An option
might be to duplicate the data? Literally hold two copies, but that just
doesn't sit right.

Therefore, for now at least, I will continue with my index generation
scheme. I think I've found a workaround that involves generating the index
outside of Hadoop (i.e. not through a map-reduce task). This is slightly
slower than generating the index as part of a map-reduce task, but once
generated the index should make access to files and various other operations
much faster.
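
An index built outside a map-reduce job can be as simple as one sequential
pass that records the byte offset where each record starts. The sketch below
is a stdlib-only analogy under stated assumptions, not the scheme described
in this thread: the record layout (UTF key, int length, raw bytes), the class
OffsetIndexSketch, and all method names are hypothetical. With real
SequenceFiles the offsets would presumably come from
SequenceFile.Reader.getPosition() and be replayed with seek().

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a key -> byte offset index over a flat record file.
class OffsetIndexSketch {

    // Write each record as: UTF key, int value length, raw value bytes.
    static void writeRecords(Path file, Map<String, byte[]> records) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (Map.Entry<String, byte[]> e : records.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().length);
                out.write(e.getValue());
            }
        }
    }

    // One sequential pass: remember where each record starts, skip its value.
    static Map<String, Long> buildIndex(Path file) throws IOException {
        Map<String, Long> index = new LinkedHashMap<>();
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            long length = in.length();
            while (in.getFilePointer() < length) {
                long offset = in.getFilePointer();
                String key = in.readUTF();
                int valueLength = in.readInt();
                in.seek(in.getFilePointer() + valueLength);
                index.put(key, offset);
            }
        }
        return index;
    }

    // Random access: seek straight to the record and read only its value.
    static byte[] lookup(Path file, Map<String, Long> index, String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) {
            return null; // key was never indexed
        }
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset);
            in.readUTF(); // skip the key
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            return value;
        }
    }
}
```

Building the index costs one full read of the file; after that each lookup
is a single seek plus a read of one record.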

Thanks again,

- Sina

On 2 Oct 2010, at 17:36, Owen O'Malley wrote:

> On Sat, Oct 2, 2010 at 5:25 AM, Harsh J <qwertyman...
