Date: Tue, 10 Jul 2012 01:24:33 +0000 (UTC)
From: "Kiyan Ahmadizadeh (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: crunch-dev@incubator.apache.org
Message-ID: <269865865.26200.1341883474611.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1819598569.25686.1341873215005.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Updated] (CRUNCH-9) Add support for launching Scrunch pipelines from a REPL

     [ https://issues.apache.org/jira/browse/CRUNCH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kiyan Ahmadizadeh updated CRUNCH-9:
-----------------------------------

    Attachment: CRUNCH-9.patch

This commit modifies the Scrunch project so that Scrunch jobs can be run from a Scala REPL.
Users can run a Scala REPL capable of launching Scrunch jobs by building Scrunch with `mvn package` and running bin/scrunch from the resulting distribution directory. Several changes have been made to the project to accomplish this:

1. The project now produces a release distribution. Maven creates the distribution when `mvn package` is run, as both a distribution folder and a tarball. The distribution folder contains a bin dir that contains scripts, a lib dir that contains all library jars, and a log dir that contains a log4j configuration file.

2. A modified Scala REPL was added to the project. A new object, InterpreterRunner, launches the REPL; it is a modification of Scala's MainGenericRunner. The Scrunch version lets client code determine whether a REPL is actually running, and includes methods for creating a jar from the code compiled from REPL input. A script named "scrunch" was added to the project that, when run, launches this modified Scala REPL. The script is a modification of the script distributed with Scala that launches the Scala REPL.

3. Scrunch's Pipeline class was modified so that any MapReduce pipeline it constructs automatically adds the Scrunch lib jars to the Distributed Cache of the job and to the classpaths of run tasks.

4. Methods on PCollection/PTable/etc. that result in a job being launched were modified to check whether the REPL is running and, if so, to create a jar of the code compiled from REPL input and ship that jar with the job so that it is on the classpath of run tasks.

5. To facilitate extensions, the From/To/At objects were changed to traits, and identically named singleton objects that extend those traits were created.

6. The examples in the examples directory, and the script scrunch.py for running those examples, are included in the project distribution.
   The scrunch.py script was renamed to scrunch-job.py and modified to work with the new project distribution structure and to take advantage of the fact that Scrunch lib jars are now automatically added to the classpath of run jobs.

I started an integration test that actually launches jobs, but the MiniMRCluster testing framework does not behave properly when jars are added to the distributed cache. The problem is related to MAPREDUCE-2884. I have verified on an actual cluster that jobs can be launched from the REPL.

> Add support for launching Scrunch pipelines from a REPL
> -------------------------------------------------------
>
>                 Key: CRUNCH-9
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-9
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Scrunch
>            Reporter: Josh Wills
>         Attachments: CRUNCH-9.patch
>
>
> It would be really, really cool and useful to be able to launch a Scrunch pipeline from a Scala-based REPL, which was one of the killer apps for Cascade, Google's Scala-based wrapper around FlumeJava.
> See the video from Scala Days 2011 for a reference: http://days2011.scala-lang.org/node/138/282

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira