hadoop-common-user mailing list archives

From Phil Hagelberg <p...@hagelb.org>
Subject Re-using output directories
Date Tue, 18 Aug 2009 00:25:28 GMT

I'm trying to write a Hadoop job that will add documents to an existing
Lucene index. My initial idea was to set the index as the output
directory and create an IndexWriter based on
FileOutputFormat.getOutputPath(context), but this requires that the
output path not exist when the job begins. I also had the idea to use
the job's working directory instead, but it appears the job _must_ be
configured with an output path; it can't be left unset.

I'm thinking the answer would be to set it to a bogus temp directory and
delete that afterwards, but that seems awfully hacky. There's got to be a
better way to handle this, right?

