Date: Wed, 26 Jun 2013 09:58:20 +0000 (UTC)
From: "Florian Laws (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-227) Write to sequence file ignores destination path.

    [ https://issues.apache.org/jira/browse/CRUNCH-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693879#comment-13693879 ]

Florian Laws commented on CRUNCH-227:
-------------------------------------

Hi Josh,

the test in the patch passes on my machine.

> Write to sequence file ignores destination path.
> ------------------------------------------------
>
>                 Key: CRUNCH-227
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-227
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.6.0, 0.7.0
>         Environment: Hadoop 1.0.3
>            Reporter: Florian Laws
>         Attachments: CRUNCH-227.patch
>
>
> I'm trying to write a simple Crunch job that outputs a sequence file consisting of a custom Writable.
> The job runs successfully, but the output is not written to the path that I specify in To.sequenceFile(); it ends up in a Crunch working directory instead.
> This happens when running the job both locally and on my 1-node Hadoop test cluster, and it happens with both Crunch 0.6.0 and 0.7.0-SNAPSHOT as of today (38a97e5).
> When using pipeline.done() instead of pipeline.run(), the Crunch working directory gets removed after execution; in that case, the output is not retained at all.
> Code snippet:
> ---
> public int run(String[] args) throws IOException {
>   CommandLine cl = parseCommandLine(args);
>   Path output = new Path((String) cl.getValue(OUTPUT_OPTION));
>   int docIdIndex = getColumnIndex(cl, "DocID");
>   int ldaIndex = getColumnIndex(cl, "LDA");
>   Pipeline pipeline = new MRPipeline(DbDumpToSeqFile.class);
>   pipeline.setConfiguration(getConf());
>   PCollection<String> lines = pipeline.readTextFile((String) cl.getValue(INPUT_OPTION));
>   PTable<String, NamedQuantizedVecWritable> vectors = lines.parallelDo(
>       new ConvertToSeqFileDoFn(docIdIndex, ldaIndex),
>       tableOf(strings(), writables(NamedQuantizedVecWritable.class)));
>   vectors.write(To.sequenceFile(output));
>   PipelineResult res = pipeline.run();
>   return res.succeeded() ? 0 : 1;
> }
> ---
> Log output from local run.
> Note how the intended output path "/tmp/foo.seq" is reported in the execution plan, but is not actually used.
> ---
> 2013-06-25 16:19:44.250 java[10755:1203] Unable to load realm info from SCDynamicStore
> 2013-06-25 16:19:44 HadoopUtil:185 [INFO] Deleting /tmp/foo.seq
> 2013-06-25 16:19:44 FileTargetImpl:224 [INFO] Will write output files to new path: /tmp/foo.seq
> 2013-06-25 16:19:45 JobClient:741 [WARN] No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 2013-06-25 16:19:45 FileInputFormat:237 [INFO] Total input paths to process : 1
> 2013-06-25 16:19:45 TrackerDistributedCacheManager:407 [INFO] Creating MAP in /tmp/hadoop-florian/mapred/local/archive/4100035173370108016_-456151549_2075417214/file/tmp/crunch-1128974463/p1-work--1596891011522800122 with rwxr-xr-x
> 2013-06-25 16:19:45 TrackerDistributedCacheManager:447 [INFO] Cached /tmp/crunch-1128974463/p1/MAP as /tmp/hadoop-florian/mapred/local/archive/4100035173370108016_-456151549_2075417214/file/tmp/crunch-1128974463/p1/MAP
> 2013-06-25 16:19:45 TrackerDistributedCacheManager:470 [INFO] Cached /tmp/crunch-1128974463/p1/MAP as /tmp/hadoop-florian/mapred/local/archive/4100035173370108016_-456151549_2075417214/file/tmp/crunch-1128974463/p1/MAP
> 2013-06-25 16:19:45 CrunchControlledJob:303 [INFO] Running job "com.issuu.mahout.utils.DbDumpToSeqFile: Text(/Users/florian/data/docdb.first20.txt)+S0+SeqFile(/tmp/foo.seq)"
> 2013-06-25 16:19:45 CrunchControlledJob:304 [INFO] Job status available at: http://localhost:8080/
> 2013-06-25 16:19:45 Task:792 [INFO] Task:attempt_local_0001_m_000000_0 is done.
> And is in the process of commiting
> 2013-06-25 16:19:45 LocalJobRunner:321 [INFO]
> 2013-06-25 16:19:45 Task:945 [INFO] Task attempt_local_0001_m_000000_0 is allowed to commit now
> 2013-06-25 16:19:45 FileOutputCommitter:173 [INFO] Saved output of task 'attempt_local_0001_m_000000_0' to /tmp/crunch-1128974463/p1/output
> 2013-06-25 16:19:48 LocalJobRunner:321 [INFO]
> 2013-06-25 16:19:48 Task:904 [INFO] Task 'attempt_local_0001_m_000000_0' done.
> ---
> This crude patch makes the output end up at the right place,
> but breaks a lot of other tests.
> ---
> --- a/crunch-core/src/main/java/org/apache/crunch/io/impl/FileTargetImpl.java
> +++ b/crunch-core/src/main/java/org/apache/crunch/io/impl/FileTargetImpl.java
> @@ -66,7 +66,7 @@ public class FileTargetImpl implements PathTarget {
>    protected void configureForMapReduce(Job job, Class keyClass, Class valueClass,
>        Class outputFormatClass, Path outputPath, String name) {
>      try {
> -      FileOutputFormat.setOutputPath(job, outputPath);
> +      FileOutputFormat.setOutputPath(job, path);
>      } catch (Exception e) {
>        throw new RuntimeException(e);
>      }
> ---

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
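
The symptom reported in the issue (output saved under a Crunch working directory such as /tmp/crunch-1128974463/p1/output while /tmp/foo.seq stays empty) can be checked after a local run with a few lines of plain Java. This is only a sketch under stated assumptions: the class name OutputPathCheck and the method hasOutput are hypothetical and not part of Crunch or Hadoop, the paths are assumed to be on the local filesystem as in the log, and any entry whose name does not start with "_" or "." is counted as output, following Hadoop's convention for hidden files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic helper; not part of Crunch or Hadoop.
final class OutputPathCheck {

    // Returns true if dir is an existing directory holding at least one entry
    // that is not hidden by Hadoop's naming convention ("_" or "." prefix).
    static boolean hasOutput(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .map(p -> p.getFileName().toString())
                .anyMatch(name -> !name.startsWith("_") && !name.startsWith("."));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: pass the path that was given to To.sequenceFile();
        // the default below is the path from the log in the issue.
        Path target = Paths.get(args.length > 0 ? args[0] : "/tmp/foo.seq");
        System.out.println(target + (hasOutput(target) ? " contains output" : " has no output"));
    }
}
```

Running it once against the intended target and once against the working directory named in the FileOutputCommitter log line shows which of the two actually received the data. Note that with pipeline.done() the working directory is removed after execution, so the second check is only meaningful after pipeline.run().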