hadoop-mapreduce-user mailing list archives

From Hans-Peter Zorn <z...@algo.informatik.tu-darmstadt.de>
Subject Best practice for accessing separate metadata for input files?
Date Wed, 18 Sep 2013 12:16:49 GMT

I have implemented a custom Writable that needs special metadata (an Apache
UIMA type system) to decode the input. This metadata is much more complex
than a simple schema, so I suppose I can't use HCat or similar tools. I
would like to store this metadata only once per input file, e.g.

part-00000                              (sequence file)

.part-00000.typesystem.xml (metadata)

What would be the best practice for writing and reading such metadata from my
Writable? Do I need to implement custom file formats (InputFormat/OutputFormat),
RecordReaders, etc., or is there an API somewhere for finding the fully
qualified HDFS path of the file containing the current input split, so that I
can locate the metadata file that belongs to it? I also need to create this
metadata when output of this kind is written.
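
To make the naming convention concrete, here is a minimal sketch of the path mapping I have in mind. It uses plain Java string handling only. The class name SidecarPaths, the method metadataPathFor, and the host name "namenode" are hypothetical and exist only for illustration. In a real job the data file path would presumably come from the input split, for example via ((FileSplit) context.getInputSplit()).getPath() when the split is a FileSplit; that part is an assumption and is not shown here.

```java
// Sketch only: derives the metadata ("sidecar") path for a data file,
// following the convention  part-00000 -> .part-00000.typesystem.xml
// SidecarPaths and metadataPathFor are hypothetical names, not a Hadoop API.
public final class SidecarPaths {

    // Suffix taken from the example layout in this message.
    private static final String SUFFIX = ".typesystem.xml";

    private SidecarPaths() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns the path of the metadata file that sits next to the given
     * data file: same directory, file name prefixed with "." and
     * suffixed with ".typesystem.xml".
     */
    public static String metadataPathFor(String dataFilePath) {
        int slash = dataFilePath.lastIndexOf('/');
        // Everything up to and including the last '/', or "" if there is none.
        String parent = dataFilePath.substring(0, slash + 1);
        String name = dataFilePath.substring(slash + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException(
                "Path has no file name: " + dataFilePath);
        }
        return parent + "." + name + SUFFIX;
    }

    public static void main(String[] args) {
        // "namenode" is a placeholder host, assumed for this example.
        System.out.println(metadataPathFor("hdfs://namenode/data/part-00000"));
    }
}
```

One reason for the leading dot: as far as I know, the default FileInputFormat path filter skips files whose names start with "." or "_", so the metadata file should not be picked up as job input.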


