Subject: svn commit: r1241940 - /incubator/accumulo/branches/1.4/src/examples/wikisearch/ingest/src/main/java/org/apache/accumulo/examples/wikisearch/ingest/WikipediaPartitionedIngester.java
Date: Wed, 08 Feb 2012 15:37:04 -0000
To: accumulo-commits@incubator.apache.org
From: afuchs@apache.org

Author: afuchs
Date: Wed Feb 8 15:37:04 2012
New Revision: 1241940

URL:
http://svn.apache.org/viewvc?rev=1241940&view=rev
Log:
ACCUMULO-375 added compression and increased the minimum split size

Modified:
    incubator/accumulo/branches/1.4/src/examples/wikisearch/ingest/src/main/java/org/apache/accumulo/examples/wikisearch/ingest/WikipediaPartitionedIngester.java

Modified: incubator/accumulo/branches/1.4/src/examples/wikisearch/ingest/src/main/java/org/apache/accumulo/examples/wikisearch/ingest/WikipediaPartitionedIngester.java
URL: http://svn.apache.org/viewvc/incubator/accumulo/branches/1.4/src/examples/wikisearch/ingest/src/main/java/org/apache/accumulo/examples/wikisearch/ingest/WikipediaPartitionedIngester.java?rev=1241940&r1=1241939&r2=1241940&view=diff
==============================================================================
--- incubator/accumulo/branches/1.4/src/examples/wikisearch/ingest/src/main/java/org/apache/accumulo/examples/wikisearch/ingest/WikipediaPartitionedIngester.java (original)
+++ incubator/accumulo/branches/1.4/src/examples/wikisearch/ingest/src/main/java/org/apache/accumulo/examples/wikisearch/ingest/WikipediaPartitionedIngester.java Wed Feb 8 15:37:04 2012
@@ -50,6 +50,7 @@ import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.PathFilter;
+import org.apache.hadoop.io.SequenceFile.CompressionType;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
@@ -178,6 +179,8 @@ public class WikipediaPartitionedIngeste
     partitionerJob.setOutputFormatClass(SequenceFileOutputFormat.class);
     Path outputDir = WikipediaConfiguration.getPartitionedArticlesPath(partitionerConf);
     SequenceFileOutputFormat.setOutputPath(partitionerJob, outputDir);
+    SequenceFileOutputFormat.setCompressOutput(partitionerJob, true);
+    SequenceFileOutputFormat.setOutputCompressionType(partitionerJob, CompressionType.RECORD);
 
     return partitionerJob.waitForCompletion(true) ? 0 : 1;
   }
@@ -209,6 +212,7 @@ public class WikipediaPartitionedIngeste
     // setup input format
     ingestJob.setInputFormatClass(SequenceFileInputFormat.class);
     SequenceFileInputFormat.setInputPaths(ingestJob, WikipediaConfiguration.getPartitionedArticlesPath(ingestConf));
+    SequenceFileInputFormat.setMinInputSplitSize(ingestJob, 1l << 28);
 
     // setup output format
     ingestJob.setMapOutputKeyClass(Text.class);
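The setMinInputSplitSize(ingestJob, 1l << 28) call in this commit puts a floor of 2^28 bytes (256 MiB) under each input split of the ingest job. The dependency-free Java sketch below reproduces the formula that Hadoop's new-API FileInputFormat uses to size a split, max(minSize, min(maxSize, blockSize)). The 64 MiB block size and the unbounded maximum split size are assumptions for illustration and are not values taken from this commit; the class name SplitSizeSketch is hypothetical.

```java
// Hypothetical, dependency-free sketch of how the new-API Hadoop FileInputFormat
// chooses a split size: max(minSize, min(maxSize, blockSize)). Not Hadoop code.
public class SplitSizeSketch {

  // Same formula as FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long mib = 1L << 20;
    long minSplit = 1L << 28;        // the value passed to setMinInputSplitSize in r1241940
    long blockSize = 64 * mib;       // assumption: a 64 MiB HDFS block size
    long maxSplit = Long.MAX_VALUE;  // assumption: no maximum split size configured

    System.out.println("min split size = " + minSplit + " bytes (" + (minSplit / mib) + " MiB)");
    // Hadoop's default minimum is 1 byte, so the block size decides the split size.
    System.out.println("split without min = " + computeSplitSize(blockSize, 1L, maxSplit) / mib + " MiB");
    // With the 256 MiB floor, the minimum overrides the block size.
    System.out.println("split with min    = " + computeSplitSize(blockSize, minSplit, maxSplit) / mib + " MiB");
  }
}
```

Under these assumptions the sketch reports a 64 MiB split without the minimum and a 256 MiB split with it, so the ingest job runs fewer, larger map tasks. A split never spans files, so a partitioned-articles file smaller than 256 MiB still becomes a single split.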
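The partitioner job now writes compressed SequenceFiles with CompressionType.RECORD. In a SequenceFile, RECORD compression compresses each value on its own and leaves keys uncompressed, while BLOCK compresses batches of keys and values together. With no codec set, Hadoop falls back to DefaultCodec (zlib). The Java sketch below only illustrates that trade-off: it approximates the two modes with java.util.zip.Deflater on hypothetical sample values. It is not Hadoop's implementation, and the class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Illustrative only: approximates SequenceFile RECORD vs BLOCK value compression
// with zlib (java.util.zip.Deflater). Not Hadoop code.
public class RecordVsBlockSketch {

  // Compresses a byte array with zlib and returns the compressed size in bytes.
  static int deflatedSize(byte[] input) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    byte[] buf = new byte[4096];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buf);
    }
    deflater.end();
    return total;
  }

  // RECORD style: every value is compressed independently.
  static int recordCompressedValueBytes(List<byte[]> values) {
    int total = 0;
    for (byte[] v : values) {
      total += deflatedSize(v);
    }
    return total;
  }

  // BLOCK style: values are buffered and compressed together as one block.
  static int blockCompressedValueBytes(List<byte[]> values) {
    ByteArrayOutputStream block = new ByteArrayOutputStream();
    for (byte[] v : values) {
      block.write(v, 0, v.length);
    }
    return deflatedSize(block.toByteArray());
  }

  // Hypothetical stand-in for partitioned article values.
  static List<byte[]> sampleValues(int n) {
    List<byte[]> values = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      StringBuilder text = new StringBuilder("<page><title>Article " + i + "</title><text>");
      for (int r = 0; r < 20; r++) {
        text.append("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ");
      }
      text.append("</text></page>");
      values.add(text.toString().getBytes(StandardCharsets.UTF_8));
    }
    return values;
  }

  public static void main(String[] args) {
    List<byte[]> values = sampleValues(200);
    int raw = 0;
    for (byte[] v : values) {
      raw += v.length;
    }
    System.out.println("raw value bytes:   " + raw);
    System.out.println("RECORD compressed: " + recordCompressedValueBytes(values));
    System.out.println("BLOCK compressed:  " + blockCompressedValueBytes(values));
  }
}
```

On many small, similar values, BLOCK usually compresses better because the compressor can reuse redundancy across records. RECORD keeps every value independently decompressible. The commit itself selects RECORD, and the sketch makes no claim about why that mode was chosen.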