bigtop-commits mailing list archives

From j..@apache.org
Subject [2/2] git commit: BIGTOP-1269: Migrating from maven to gradle. Some code refactoring. Adding scala support. Removed hive and mahout code/dependencies.
Date Tue, 27 May 2014 12:40:08 GMT
BIGTOP-1269: Migrating from maven to gradle. Some code refactoring. Adding scala support. Removed hive and mahout code/dependencies.

Removed the dependency on the parent project. Formatting of the sources

Replaced deprecated method calls with the corresponding new, non-deprecated ones. General cleanup in tests.

Removed redundant properties. Added missing versions for plugins.

Removal of redundant dependencies from the pig profile. Removal of the provided scope for logging dependencies. Unless we are running in an environment where these dependencies are automatically provided, we can't really use the provided scope here.

More cleanup. Removal of most dependencies from the hive profile since they were specified in the main dependency section of the pom anyway.

More cleanup. Moved the dependencies from the mahout profile to the main dependencies section. WIP.

Basic gradle build file. WIP. Need to include the integration-test src dir and the conditional exclusion of tests for each profile

Moved the dependency to the main section.

Added maven-local repo. Formatting.

WIP. IntegrationTests added. Need to set the dependency between the integration-test and test tasks

Updated to configure some common settings for tests; the exclusion based on the passed-in parameter is now configured correctly

Created separate tasks for each integration-test type. All share the same config.

Moved common configuration for integration-tests out of the individual task configs. Organized the build file

Removed all the lint warnings.

Added eclipse configuration

Modified to document the commands for gradle driven build system

Added scala to the build.gradle. Moved the integration-tests to the src/integrationTest folder from src/integration folder so that gradle can automatically pick up the source-code classes. Otherwise, we'd need to specify those classes ourselves.

Added a sample scala test. Trying to add the scala-test library now

Added scalatest library for writing tests that look like natural language. We can test both java and scala code using this.

Changed the package name to be consistent with other classes

Setting the groupId and version using the pom.xml from the parent bigtop project.

Added some documentation to build.gradle and the README

Added support for overriding dependencies for individual tasks. Updated documentation based on it.

Removed trailing whitespace. Replaced spaces with tabs

Removed a debug message

Deleted mahout and hive related code and the dependencies.
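
The dependency-override support mentioned above is built on Gradle's standard ResolutionStrategy API. As a condensed, self-contained sketch of the mechanism (the full implementation appears in build.gradle in the diff below, keyed off the ITProfile property), the following build script forces joda-time back to 2.2 whenever a build requests it; the version numbers mirror the pig-profile override in this commit:

    apply plugin: "java"

    repositories { mavenCentral() }

    dependencies {
        // Version used by normal builds.
        compile "joda-time:joda-time:2.3"
    }

    configurations.all {
        resolutionStrategy.eachDependency { details ->
            // Swap in the version the pig integration tests need.
            if (details.requested.group == "joda-time"
                    && details.requested.name == "joda-time") {
                details.useVersion "2.2"
            }
        }
    }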

Signed-off-by: Jay Vyas <bigpetstore@Jays-MacBook-Air.local>


Project: http://git-wip-us.apache.org/repos/asf/bigtop/repo
Commit: http://git-wip-us.apache.org/repos/asf/bigtop/commit/1a851e4f
Tree: http://git-wip-us.apache.org/repos/asf/bigtop/tree/1a851e4f
Diff: http://git-wip-us.apache.org/repos/asf/bigtop/diff/1a851e4f

Branch: refs/heads/master
Commit: 1a851e4ff105e91f3b2b1c47294c9b103b046e81
Parents: 984832e
Author: bhashit parikh <bhashit.parikh@gmail.com>
Authored: Wed May 21 17:23:27 2014 +0530
Committer: Jay Vyas <bigpetstore@Jays-MacBook-Air.local>
Committed: Tue May 27 08:39:33 2014 -0400

----------------------------------------------------------------------
 bigtop-bigpetstore/README.md                    |  54 ++-
 bigtop-bigpetstore/build.gradle                 | 210 +++++++++
 bigtop-bigpetstore/pom.xml                      | 449 +++++--------------
 bigtop-bigpetstore/settings.gradle              |   1 +
 bigtop-bigpetstore/setuphive.sh                 |  22 -
 .../bigtop/bigpetstore/BigPetStoreHiveIT.java   | 108 -----
 .../bigtop/bigpetstore/BigPetStoreMahoutIT.java |  88 ----
 .../bigtop/bigpetstore/BigPetStorePigIT.java    | 165 -------
 .../org/apache/bigtop/bigpetstore/ITUtils.java  | 145 ------
 .../bigtop/bigpetstore/BigPetStorePigIT.java    | 148 ++++++
 .../org/apache/bigtop/bigpetstore/ITUtils.java  | 134 ++++++
 .../bigpetstore/clustering/BPSRecommnder.java   |  83 ----
 .../bigtop/bigpetstore/etl/HiveViewCreator.java | 157 -------
 .../src/main/resources/hive-log4j.properties    |  84 ----
 .../src/main/resources/hive-site.xml            |  36 --
 .../bigtop/bigpetstore/docs/TestDocs.java       |  29 +-
 .../generator/TestNumericalIdUtils.java         |   5 +-
 .../TestPetStoreTransactionGeneratorJob.java    |  14 +-
 .../bigtop/bigpetstore/ScalaTestSample.scala    |  18 +
 19 files changed, 664 insertions(+), 1286 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/README.md
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/README.md b/bigtop-bigpetstore/README.md
index 95245a8..40a9088 100644
--- a/bigtop-bigpetstore/README.md
+++ b/bigtop-bigpetstore/README.md
@@ -3,8 +3,6 @@
 BigPetStore
 ============
 
-test mvn deploy1
-
 Apache Bigtop/Hadoop Ecosystem Demo
 -----------------------------------
 This software is created to demonstrate Apache Bigtop for processing
@@ -15,43 +13,42 @@ Architecture
 The application consists of the following modules
 
 * generator: generates raw data on the dfs
-* clustering: Apache Mahout demo code for processing the data using Itembased Collaborative Filtering
+* clustering: Apache Mahout demo code for processing the data using Item based Collaborative Filtering. This feature is not supported yet. You can track its progress using this [`JIRA` issue](https://issues.apache.org/jira/browse/BIGTOP-1272)
 * Pig: demo code for processing the data using Apache Pig
-* Hive: demo code for processing the data using Apache Hive demo code
+* Hive: demo code for processing the data using Apache Hive. This part is not complete yet. We are working on it. You can track it using this [`JIRA` issue](https://issues.apache.org/jira/browse/BIGTOP-1270)
 * Crunch: demo code for processing the data using Apache Crunch
 
 Build Instructions
 ------------------
 
-* BUILD THE JAR
+You'll need to have [`gradle`](http://www.gradle.org/downloads) installed and set-up correctly in order to follow along these instructions.
+We could have used the [`gradle-wrapper`](http://www.gradle.org/docs/current/userguide/gradle_wrapper.html) to avoid having to install `gradle`, but the `bigtop` project includes all `gradle*` directories in `.gitignore`. So, that's not going to work.
+
+### Build the JAR
 
-  "mvn clean package" will build the bigpetstore jar
+  `gradle clean build` will build the bigpetstore `jar`. The `jar` will be located in the `build/libs` directory.
 
-* Run Intergration tests with
+### Run Integration Tests With
+  * Pig profile: `gradle clean integrationTest -P ITProfile=pig`
+  * Crunch profile: `gradle clean integrationTest -P ITProfile=crunch`
+  * Hive profile: Not implemented yet.
+  * Mahout profile: Not implemented yet.
 
-  * Pig profile: mvn clean verify -P pig
-  * Crunch profile: mvn clean verify -P crunch
-  * Hive provile:
-     * First, see and run the setuphive.sh script.  Read it and try to under
-     stand what it does.
+If you don't specify any profile-name, or if you specify an invalid-name for the `integrationTest` task, no integration tests will be run.
 
-     * mvn clean verify -P pig
+*Note:* At this stage, only the `Pig` profile is working. We will continue to update this section as further work is completed.
 
 For Eclipse Users
 -----------------
 
-1) run "mvn eclipse:eclipse" to create an IDE loadable project.
-
-2) open .classpath and add
-    `<classpathentry kind="src" path="src/integration/java" including="**/*.java"/>`
-
-3) import the project into eclipse
+1. Run `gradle eclipse` to create an eclipse project.
+2. Import the project into eclipse.
 
+*Note:* Whenever you modify the dependencies, you will need to run `gradle eclipse` again and refresh the project afterwards. You'll also need to have the `scala` plugin installed. Having a `gradle` plugin is useful as well, e.g. when you want to update dependencies.
 
 High level summary
 ------------------
 
-
 The bigpetstore project exemplifies the hadoop ecosystem for newcomers, and also for benchmarking and
 comparing functional space of tools.
 
@@ -63,7 +60,7 @@ using a common framework and easily understood use case
 How it works (To Do)
 --------------------
 
-* Phase 1: Generating pet store data:
+### Phase 1: Generating pet store data:
 
 The first step is to generate a raw data set.  This is done by the "GeneratePetStoreTransactionsInputFormat":
 
@@ -72,22 +69,21 @@ its output.  The result is a list of "transactions".  Each transaction is a tupl
 
   *{state,name,date,price,product}.*
 
-* Phase 2: Processing the data
+### Phase 2: Processing the data
 
-The next phase of the application processes the data to create basic aggregations.
-For example with both pig and hive these could easily include
+The next phase of the application processes the data to create basic aggregations. For example with both pig and hive these could easily include
 
-  *Number of transactions by state* or
-  *Most valuable customer by state* or
-  *Most popular items by state*
+- *Number of transactions by state* or
+- *Most valuable customer by state* or
+- *Most popular items by state*
 
 
-* Phase 3: Clustering the states by all fields
+### Phase 3: Clustering the states by all fields
 
   Now, say we want to cluster the states, so as to put different states into different buying categories
   for our marketing team to deal with differently.
 
-* Phase 4: Visualizing the Data in D3.
+### Phase 4: Visualizing the Data in D3.
 
  - try it [on the gh-pages branch](http://jayunit100.github.io/bigpetstore/)
 

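As the commit notes, moving the integration tests under src/integrationTest lets Gradle pick the sources up purely by convention: a source set named integrationTest maps to src/integrationTest/java and src/integrationTest/scala automatically. A hypothetical, explicit spelling of the same layout (not needed in the actual build.gradle below, which relies on the convention) would be:

    sourceSets {
        integrationTest {
            // These srcDir calls merely restate what the naming convention already infers.
            java.srcDir "src/integrationTest/java"
            scala.srcDir "src/integrationTest/scala"
        }
    }
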
http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/build.gradle
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/build.gradle b/bigtop-bigpetstore/build.gradle
new file mode 100644
index 0000000..efb69b3
--- /dev/null
+++ b/bigtop-bigpetstore/build.gradle
@@ -0,0 +1,210 @@
+apply plugin: "java"
+apply plugin: "eclipse"
+// TODO add idea module config.
+apply plugin: "idea"
+apply plugin: "scala"
+
+// Read the groupId and version properties from the "parent" bigtop project.
+// It would be better if there were a cleaner way of doing this. However,
+// at this point, we have to do this (or some variation thereof) since gradle
+// projects can't have maven projects as parents (AFAIK. If there is a way to do it,
+// it doesn't seem to be well-documented).
+def setProjectProperties() {
+	Node xml = new XmlParser().parse("../pom.xml")
+	group = xml.groupId.first().value().first()
+	version = xml.version.first().value().first()
+}
+
+setProjectProperties()
+description = """"""
+
+// We are using 1.7 as gradle can't play well when java 8 and scala are combined.
+// There is an open issue here: http://issues.gradle.org/browse/GRADLE-3023
+// There is talk of this being resolved in the next version of gradle. Till then,
+// we are stuck with java 7. But we do have scala if we want more syntactic sugar.
+sourceCompatibility = 1.7
+targetCompatibility = 1.7
+
+// Specify any additional project properties.
+ext {
+	slf4jVersion = "1.7.5"
+	guavaVersion = "15.0"
+	hadoopVersion = "2.2.0"
+	datanucleusVersion = "3.2.2"
+	datanucleusJpaVersion = "3.2.1"
+	bonecpVersion = "0.8.0.RELEASE"
+	derbyVersion = "10.10.1.1"
+}
+
+repositories {
+	mavenCentral()
+}
+
+tasks.withType(Compile) {
+	options.encoding = 'UTF-8'
+	options.compilerArgs << "-Xlint:all"
+}
+
+tasks.withType(ScalaCompile) {
+	// Enables incremental compilation.
+	// http://www.gradle.org/docs/current/userguide/userguide_single.html#N12F78
+	scalaCompileOptions.useAnt = false
+}
+
+tasks.withType(Test) {
+	testLogging {
+		// Uncomment this if you want to see the console output from the tests.
+		// showStandardStreams = true
+		events "passed", "skipped", "failed"
+	}
+}
+
+test {
+	exclude "**/*TestPig.java", "**/*TestHiveEmbedded.java", "**/*TestCrunch.java", "**/*TestPetStoreTransactionGeneratorJob.java"
+}
+
+// Create a separate source-set for the src/integrationTest set of classes. The convention here
+// is that gradle will look for a directory with the same name as that of the specified source-set
+// under the 'src' directory. So, in this case, it will look for a directory named 'src/integrationTest'
+// since the name of the source-set is 'integrationTest'
+sourceSets {
+	// The main and test source-sets are configured by both java and scala plugins. They contain
+	// all the src/main and src/test classes. The following statements make all of those classes
+	// available on the classpath for the integration-tests, for both java and scala.
+	integrationTest {
+		java {
+			compileClasspath += main.output + test.output
+			runtimeClasspath += main.output + test.output
+		}
+		scala {
+			compileClasspath += main.output + test.output
+			runtimeClasspath += main.output + test.output
+		}
+	}
+}
+
+// Creating a source-set automatically add a couple of corresponding configurations (when java/scala
+// plugins are applied). The convention for these configurations is <sourceSetName>Compile and
+// <sourceSetName>Runtime. The following statements declare that all the dependencies from the
+// testCompile configuration will now be available for integrationTestCompile, and all the
+// dependencies (and other configuration that we might have provided) for testRuntime will be
+// available for integrationTestRuntime. For ex. the testCompile configuration has a dependency on
+// jUnit and scalatest. This makes them available for the integration tests as well.
+configurations {
+	integrationTestCompile {
+		extendsFrom testCompile
+	}
+
+	integrationTestRuntime {
+		extendsFrom integrationTestCompile, testRuntime
+	}
+}
+
+// To see the API that is being used here, consult the following docs
+// http://www.gradle.org/docs/current/dsl/org.gradle.api.artifacts.ResolutionStrategy.html
+def updateDependencyVersion(dependencyDetails, dependencyString) {
+	def parts = dependencyString.split(':')
+	def group = parts[0]
+	def name = parts[1]
+	def version = parts[2]
+	if (dependencyDetails.requested.group == group
+			&& dependencyDetails.requested.name == name) {
+		dependencyDetails.useVersion version
+	}
+}
+
+def setupPigIntegrationTestDependencyVersions(dependencyResolveDetails) {
+	// This is the way we override the dependencies.
+	updateDependencyVersion dependencyResolveDetails, "joda-time:joda-time:2.2"
+}
+
+def setupCrunchIntegrationTestDependencyVersions(dependencyResolveDetails) {
+	// Specify any dependencies that you want to override for crunch integration tests.
+}
+
+task integrationTest(type: Test, dependsOn: test) {
+
+	testClassesDir = sourceSets.integrationTest.output.classesDir
+	classpath = sourceSets.integrationTest.runtimeClasspath
+
+	if(!project.hasProperty('ITProfile')) {
+		// skip integration-tests if no profile has been specified.
+		integrationTest.onlyIf { false }
+		return;
+	}
+
+	def patternsToInclude
+	def dependencyConfigClosure
+	def skipDependencyUpdates = false
+	// Select the pattern for test classes that should be executed, and the dependency
+	// configuration function to be called based on the profile name specified at the command line.
+	switch (project.ITProfile) {
+		case "pig":
+			patternsToInclude = "*PigIT*"
+			dependencyConfigClosure = { setupPigIntegrationTestDependencyVersions(it) }
+			break
+		case "crunch":
+			patternsToInclude = "*CrunchIT*"
+			dependencyConfigClosure = { setupCrunchIntegrationTestDependencyVersions(it) }
+			break
+		// skip integration-tests if the passed in profile-name is not valid
+		default: integrationTest.onlyIf { false }; return
+	}
+
+
+	filter { includeTestsMatching patternsToInclude }
+
+	// This is the standard way gradle allows overriding each specific dependency.
+	// see: http://www.gradle.org/docs/current/dsl/org.gradle.api.artifacts.ResolutionStrategy.html
+	project.configurations.all {
+		resolutionStrategy {
+			eachDependency {
+				dependencyConfigClosure(it)
+			}
+		}
+	}
+}
+
+dependencies {
+	compile "org.kohsuke:graphviz-api:1.0"
+	compile "org.apache.crunch:crunch-core:0.9.0-hadoop2"
+	compile "com.jolbox:bonecp:${project.bonecpVersion}"
+	compile "org.apache.derby:derby:${project.derbyVersion}"
+	compile "com.google.guava:guava:${project.guavaVersion}"
+	compile "commons-lang:commons-lang:2.6"
+	compile "joda-time:joda-time:2.3"
+	compile "org.apache.commons:commons-lang3:3.1"
+	compile "com.google.protobuf:protobuf-java:2.5.0"
+	compile "commons-logging:commons-logging:1.1.3"
+	compile "com.thoughtworks.xstream:xstream:+"
+	compile "org.apache.lucene:lucene-core:+"
+	compile "org.apache.lucene:lucene-analyzers-common:+"
+	compile "org.apache.solr:solr-commons-csv:3.5.0"
+	compile "org.apache.hadoop:hadoop-client:${project.hadoopVersion}"
+	compile group: "org.apache.pig", name: "pig", version: "0.12.0", classifier:"h2"
+	compile "org.slf4j:slf4j-api:${project.slf4jVersion}"
+	compile "log4j:log4j:1.2.12"
+	compile "org.slf4j:slf4j-log4j12:${project.slf4jVersion}"
+	compile "org.datanucleus:datanucleus-core:${project.datanucleusVersion}"
+	compile "org.datanucleus:datanucleus-rdbms:${project.datanucleusJpaVersion}"
+	compile "org.datanucleus:datanucleus-api-jdo:${project.datanucleusJpaVersion}"
+	compile "org.datanucleus:datanucleus-accessplatform-jdo-rdbms:${project.datanucleusJpaVersion}"
+	compile group: "org.apache.mrunit", name: "mrunit", version: "1.0.0", classifier:"hadoop2"
+
+	compile 'org.scala-lang:scala-library:2.10.0'
+
+	testCompile "junit:junit:4.11"
+	testCompile "org.hamcrest:hamcrest-all:1.3"
+	testCompile "org.scalatest:scalatest_2.10:2.1.7"
+}
+
+eclipse {
+	classpath {
+	// Add the dependencies and the src dirs for the integrationTest source-set to the
+		// .classpath file that will be generated by the eclipse plugin.
+		plusConfigurations += configurations.integrationTestCompile
+		// Uncomment the following two lines if you want to generate an eclipse project quickly.
+		downloadSources = false
+		downloadJavadoc = false
+	}
+}

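Note that the new build file leaves integrationTest off the standard lifecycle; it only depends on test and must be invoked by name. If attaching it is wanted later, a one-line, hypothetical addition would do it:

    // Hypothetical follow-up (not part of this commit): make `gradle check`
    // also run the integration tests when a profile is supplied, e.g.
    //   gradle check -P ITProfile=pig
    check.dependsOn integrationTest
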
http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/pom.xml
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/pom.xml b/bigtop-bigpetstore/pom.xml
index 0bc226e..44584eb 100644
--- a/bigtop-bigpetstore/pom.xml
+++ b/bigtop-bigpetstore/pom.xml
@@ -2,16 +2,9 @@
 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
 	<modelVersion>4.0.0</modelVersion>
-
-        <parent>
-           <groupId>org.apache.bigtop</groupId>
-           <artifactId>bigtop</artifactId>
-           <version>0.8.0-SNAPSHOT</version>
-           <relativePath>../pom.xml</relativePath>
-        </parent>
-
+	<groupId>org.apache.bigtop</groupId>
 	<artifactId>BigPetStore</artifactId>
-
+	<version>0.8.0-SNAPSHOT</version>
 	<properties>
 		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
 		<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
@@ -20,24 +13,21 @@
 		<slf4j.version>1.7.5</slf4j.version>
 		<guava.version>15.0</guava.version>
 		<hadoop.version>2.2.0</hadoop.version>
-		<derby.version>10.8.1.2</derby.version>
 		<hive.version>0.12.0</hive.version>
 		<datanucleus.version>3.2.2</datanucleus.version>
 		<datanucleus.jpa.version>3.2.1</datanucleus.jpa.version>
 		<bonecp.version>0.8.0.RELEASE</bonecp.version>
 		<derby.version>10.10.1.1</derby.version>
+		<plugin.surefire.version>2.17</plugin.surefire.version>
 	</properties>
 
 	<dependencies>
-
 		<dependency>
 			<groupId>org.kohsuke</groupId>
 			<artifactId>graphviz-api</artifactId>
 			<version>1.0</version>
 		</dependency>
 
-		<!-- CRUNCH : These are repeated in the profile and necessary for compilation
-			even without the profile -->
 		<dependency>
 			<groupId>org.apache.crunch</groupId>
 			<artifactId>crunch-core</artifactId>
@@ -56,37 +46,109 @@
 			<artifactId>derby</artifactId>
 			<version>${derby.version}</version>
 		</dependency>
-		<!-- <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId>
-			<version>3.1</version> </dependency> -->
 
 		<dependency>
 			<groupId>com.google.guava</groupId>
 			<artifactId>guava</artifactId>
-			<version>15.0</version>
+			<version>${guava.version}</version>
+		</dependency>
+
+		<!-- From pig profile -->
+		<dependency>
+			<groupId>commons-lang</groupId>
+			<artifactId>commons-lang</artifactId>
+			<version>2.6</version>
 		</dependency>
 
-		<!--
-		  We keep this at top level so that mvn eclipse:eclipse creates  a nice 
-		  tidy project, but its  a little messy.  later we'll create a profile for 
-		  eclipse and move this (and other deps) into profiles as needed.
-		  Important: Remove this dependency when running hive integration tests...
-		-->		
-		  <dependency>
+		<dependency>
+			<groupId>joda-time</groupId>
+			<artifactId>joda-time</artifactId>
+			<version>2.3</version>
+		</dependency>
+		<!-- end pig profile -->
+		<!-- From hive profile -->
+		<dependency>
+			<groupId>org.apache.commons</groupId>
+			<artifactId>commons-lang3</artifactId>
+			<version>3.1</version>
+		</dependency>
+		<!-- end hive profile -->
+		<!-- From Crunch profile -->
+		<dependency>
+			<groupId>com.google.protobuf</groupId>
+			<artifactId>protobuf-java</artifactId>
+			<version>2.5.0</version>
+		</dependency>
+		<!-- end crunch profile -->
+		<!-- From Mahout profile -->
+		<dependency>
+			<groupId>commons-logging</groupId>
+			<artifactId>commons-logging</artifactId>
+			<version>1.1.3</version>
+		</dependency>
+		<dependency>
+			<groupId>org.apache.mahout</groupId>
+			<artifactId>mahout-math</artifactId>
+			<version>0.9</version>
+		</dependency>
+		<dependency>
+			<groupId>com.thoughtworks.xstream</groupId>
+			<artifactId>xstream</artifactId>
+			<version>LATEST</version>
+		</dependency>
+		<dependency>
+			<groupId>org.apache.lucene</groupId>
+			<artifactId>lucene-core</artifactId>
+			<version>LATEST</version>
+		</dependency>
+		<dependency>
+			<groupId>org.apache.lucene</groupId>
+			<artifactId>lucene-analyzers-common</artifactId>
+			<version>LATEST</version>
+		</dependency>
+		<dependency>
+			<groupId>org.apache.mahout.commons</groupId>
+			<artifactId>commons-cli</artifactId>
+			<version>LATEST</version>
+		</dependency>
+		<dependency>
+			<groupId>org.apache.commons</groupId>
+			<artifactId>commons-math3</artifactId>
+			<version>LATEST</version>
+		</dependency>
+		<dependency>
+			<groupId>org.apache.solr</groupId>
+			<artifactId>solr-commons-csv</artifactId>
+			<version>3.5.0</version>
+		</dependency>
+		<!-- end Mahout profile -->
+
+		<!-- TODO ask question about this comment -->
+		<!-- We keep this at top level so that mvn eclipse:eclipse creates a nice 
+			tidy project, but its a little messy. later we'll create a profile for eclipse 
+			and move this (and other deps) into profiles as needed. Important: Remove 
+			this dependency when running hive integration tests... -->
+		<dependency>
 			<groupId>org.apache.hadoop</groupId>
 			<artifactId>hadoop-client</artifactId>
 			<version>${hadoop.version}</version>
-		  </dependency>
-               <!-- mahout deps : may need to turn these on/off when testing mahout locally-->
-		
-		<dependency> <groupId>org.apache.mahout</groupId> <artifactId>mahout-core</artifactId>
-			<version>0.9</version> <exclusions> </exclusions> </dependency>
+		</dependency>
+		<!-- TODO ask question about this comment -->
+		<!-- mahout deps : may need to turn these on/off when testing mahout locally -->
+		<!-- For testing on my machine, I created a bigpetstore mahout jar which 
+			is compiled for 2.2.0 . Or substitute this with the standard apache mahout-core 
+			but not sure if it will work. -->
+		<dependency>
+			<groupId>org.apache.mahout</groupId>
+			<artifactId>mahout-core</artifactId>
+			<version>0.8</version>
+		</dependency>
 		<!-- pig deps -->
 		<dependency>
 			<groupId>org.apache.pig</groupId>
 			<artifactId>pig</artifactId>
 			<classifier>h2</classifier>
 			<version>0.12.0</version>
-			<scope>provided</scope>
 		</dependency>
 
 		<!--logging -->
@@ -96,33 +158,26 @@
 			<artifactId>slf4j-api</artifactId>
 			<version>${slf4j.version}</version>
 		</dependency>
-
-		<!-- SL4J Binding provided at runtime -->
 		<dependency>
 			<groupId>log4j</groupId>
 			<artifactId>log4j</artifactId>
 			<version>1.2.12</version>
-			<scope>provided</scope>
 		</dependency>
 		<dependency>
 			<groupId>org.slf4j</groupId>
 			<artifactId>slf4j-log4j12</artifactId>
 			<version>${slf4j.version}</version>
-			<scope>provided</scope>
 		</dependency>
-
 		<!-- hive -->
 		<dependency>
 			<groupId>org.apache.hive</groupId>
 			<artifactId>hive-common</artifactId>
 			<version>${hive.version}</version>
-			<scope>provided</scope>
 		</dependency>
 		<dependency>
 			<groupId>org.apache.hive</groupId>
 			<artifactId>hive-serde</artifactId>
 			<version>${hive.version}</version>
-			<scope>provided</scope>
 		</dependency>
 		<dependency>
 			<groupId>org.apache.hive</groupId>
@@ -154,6 +209,7 @@
 			<version>${datanucleus.jpa.version}</version>
 		</dependency>
 
+		<!-- TODO eliminate this pom dependency -->
 		<dependency>
 			<groupId>org.datanucleus</groupId>
 			<artifactId>datanucleus-accessplatform-jdo-rdbms</artifactId>
@@ -180,7 +236,6 @@
 			<version>1.0.0</version>
 			<classifier>hadoop2</classifier>
 		</dependency>
-
 	</dependencies>
 
 	<build>
@@ -191,7 +246,7 @@
 				<version>3.0.0.RELEASE</version>
 			</extension>
 		</extensions>
-		<finalName>bigpetstore-${version}</finalName>
+		<finalName>bigpetstore-${project.version}</finalName>
 		<plugins>
 			<plugin>
 				<groupId>org.apache.maven.plugins</groupId>
@@ -201,6 +256,7 @@
 			<plugin>
 				<groupId>org.apache.maven.plugins</groupId>
 				<artifactId>maven-eclipse-plugin</artifactId>
+				<version>2.9</version>
 				<configuration>
 					<downloadSources>true</downloadSources>
 					<downloadJavadocs>true</downloadJavadocs>
@@ -212,8 +268,8 @@
 				<artifactId>maven-compiler-plugin</artifactId>
 				<version>2.3.2</version>
 				<configuration>
-					<source>1.6</source>
-					<target>1.6</target>
+					<source>1.8</source>
+					<target>1.8</target>
 				</configuration>
 			</plugin>
 			<plugin>
@@ -227,6 +283,7 @@
 			<plugin>
 				<groupId>org.apache.maven.plugins</groupId>
 				<artifactId>maven-surefire-plugin</artifactId>
+				<version>${plugin.surefire.version}</version>
 				<configuration>
 					<excludes>
 						<exclude>**/*TestPig.java</exclude>
@@ -241,56 +298,13 @@
 	<profiles>
 		<profile>
 			<id>pig</id>
-			<activation>
-				<activeByDefault>false</activeByDefault>
-			</activation>
-			<properties>
-				<skip.unit.tests>false</skip.unit.tests>
-			</properties>
-			<dependencies>
-				<!-- misc -->
-				<dependency>
-					<groupId>org.apache.commons</groupId>
-					<artifactId>commons-lang3</artifactId>
-					<version>3.1</version>
-				</dependency>
-				<dependency>
-					<groupId>joda-time</groupId>
-					<artifactId>joda-time</artifactId>
-					<version>2.3</version>
-				</dependency>
-				<dependency>
-					<groupId>com.google.guava</groupId>
-					<artifactId>guava</artifactId>
-					<version>${guava.version}</version>
-				</dependency>
-
-				<!-- pig -->
-				<dependency>
-					<groupId>org.apache.pig</groupId>
-					<artifactId>pig</artifactId>
-					<classifier>h2</classifier>
-					<version>0.12.0</version>
-					<scope>provided</scope>
-				</dependency>
-
-				<!-- hadoop -->
-				<dependency>
-					<groupId>org.apache.hadoop</groupId>
-					<artifactId>hadoop-client</artifactId>
-					<version>${hadoop.version}</version>
-				</dependency>
-				<!-- <dependency> <groupId>org.apache.mrunit</groupId> <artifactId>mrunit</artifactId>
-					<version>1.0.0</version> <classifier>hadoop2</classifier> </dependency> -->
-			</dependencies>
-
 			<build>
 				<plugins>
 					<plugin>
 						<groupId>org.apache.maven.plugins</groupId>
 						<artifactId>maven-surefire-plugin</artifactId>
+						<version>${plugin.surefire.version}</version>
 						<configuration>
-
 							<excludes>
 								<exclude>**/*TestPig.java</exclude>
 								<exclude>**/*TestHiveEmbedded.java</exclude>
@@ -333,7 +347,7 @@
 							</excludes>
 						</configuration>
 						<executions>
-							<!-- States that both integration-test and verify goals of the Failsafe
+							<!-- States that both integration-test and verify goals of the Failsafe 
 								Maven plugin are executed. -->
 							<execution>
 								<id>integration-tests</id>
@@ -350,33 +364,19 @@
 
 		<profile>
 			<id>hive</id>
-			<activation>
-				<activeByDefault>false</activeByDefault>
-			</activation>
-			<properties>
-				<derby.version>10.8.1.2</derby.version>
-				<hive.version>0.12.0</hive.version>
-				<datanucleus.version>3.2.2</datanucleus.version>
-				<datanucleus.jpa.version>3.2.1</datanucleus.jpa.version>
-				<bonecp.version>0.8.0.RELEASE</bonecp.version>
-				<derby.version>10.10.1.1</derby.version>
-				<skip.unit.tests>false</skip.unit.tests>
-			</properties>
-
 			<build>
 				<plugins>
 					<plugin>
 						<groupId>org.apache.maven.plugins</groupId>
 						<artifactId>maven-surefire-plugin</artifactId>
+						<version>${plugin.surefire.version}</version>
 						<configuration>
-
 							<excludes>
 								<exclude>**/*TestPig.java</exclude>
 								<exclude>**/*TestHiveEmbedded.java</exclude>
 								<exclude>**/*TestCrunch.java</exclude>
 								<exclude>**/*TestPetStoreTransactionGeneratorJob.java</exclude>
 							</excludes>
-
 						</configuration>
 					</plugin>
 					<plugin>
@@ -410,7 +410,7 @@
 							</excludes>
 						</configuration>
 						<executions>
-							<!-- States that both integration-test and verify goals of the Failsafe
+							<!-- States that both integration-test and verify goals of the Failsafe 
 								Maven plugin are executed. -->
 							<execution>
 								<id>integration-tests</id>
@@ -423,150 +423,26 @@
 					</plugin>
 				</plugins>
 			</build>
-
-
 			<dependencies>
-				<!-- misc -->
-				<dependency>
-					<groupId>org.apache.commons</groupId>
-					<artifactId>commons-lang3</artifactId>
-					<version>3.1</version>
-				</dependency>
-
-				<dependency>
-					<groupId>com.google.guava</groupId>
-					<artifactId>guava</artifactId>
-					<version>${guava.version}</version>
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.derby</groupId>
-					<artifactId>derby</artifactId>
-					<version>${derby.version}</version>
-				</dependency>
-
-
-				<dependency>
-					<groupId>org.datanucleus</groupId>
-					<artifactId>datanucleus-core</artifactId>
-					<version>${datanucleus.version}</version>
-				</dependency>
-
-				<dependency>
-					<groupId>org.datanucleus</groupId>
-					<artifactId>datanucleus-rdbms</artifactId>
-					<version>${datanucleus.jpa.version}</version>
-				</dependency>
-
-				<dependency>
-					<groupId>org.datanucleus</groupId>
-					<artifactId>datanucleus-api-jdo</artifactId>
-					<version>${datanucleus.jpa.version}</version>
-				</dependency>
-
-				<dependency>
-					<groupId>org.datanucleus</groupId>
-					<artifactId>datanucleus-accessplatform-jdo-rdbms</artifactId>
-					<version>${datanucleus.jpa.version}</version>
-					<type>pom</type>
-				</dependency>
-
 				<!-- hadoop -->
-				<dependency>
-					<groupId>org.apache.hadoop</groupId>
-					<artifactId>hadoop-common</artifactId>
-					<version>${hadoop.version}</version>
-				</dependency>
+				<!-- TODO is this version change required? Version 2.2.0 is provided 
+					by hadoop-client dependency. Shouldn't we have the same versions for the 
+					related dependencies? -->
 				<dependency>
 					<groupId>org.apache.hadoop</groupId>
 					<artifactId>hadoop-mapreduce-client-app</artifactId>
 					<version>2.3.0</version>
 				</dependency>
-				<!-- hive -->
-				<dependency>
-					<groupId>org.apache.hive</groupId>
-					<artifactId>hive-common</artifactId>
-					<version>${hive.version}</version>
-				</dependency>
-				<dependency>
-					<groupId>org.apache.hive</groupId>
-					<artifactId>hive-serde</artifactId>
-					<version>${hive.version}</version>
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.hive</groupId>
-					<artifactId>hive-jdbc</artifactId>
-					<version>${hive.version}</version>
-				</dependency>
-				<dependency>
-					<groupId>org.apache.hive</groupId>
-					<artifactId>hive-contrib</artifactId>
-					<version>${hive.version}</version>
-				</dependency>
-
-				<dependency>
-					<groupId>com.jolbox</groupId>
-					<artifactId>bonecp</artifactId>
-					<version>${bonecp.version}</version>
-				</dependency>
-
-				<!-- logging -->
-				<dependency>
-					<groupId>org.slf4j</groupId>
-					<artifactId>slf4j-api</artifactId>
-					<version>${slf4j.version}</version>
-				</dependency>
-
-				<!-- SL4J Binding provided at runtime -->
-				<dependency>
-					<groupId>log4j</groupId>
-					<artifactId>log4j</artifactId>
-					<version>1.2.12</version>
-					<scope>provided</scope>
-				</dependency>
-				<dependency>
-					<groupId>org.slf4j</groupId>
-					<artifactId>slf4j-log4j12</artifactId>
-					<version>${slf4j.version}</version>
-					<scope>provided</scope>
-				</dependency>
-
-				<!-- Unit test artifacts -->
-				<dependency>
-					<groupId>junit</groupId>
-					<artifactId>junit</artifactId>
-					<version>4.11</version>
-					<scope>test</scope>
-				</dependency>
-				<dependency>
-					<groupId>org.hamcrest</groupId>
-					<artifactId>hamcrest-all</artifactId>
-					<version>1.3</version>
-					<scope>test</scope>
-				</dependency>
-				<dependency>
-					<groupId>org.apache.mrunit</groupId>
-					<artifactId>mrunit</artifactId>
-					<version>1.0.0</version>
-					<classifier>hadoop2</classifier>
-				</dependency>
-
 			</dependencies>
 		</profile>
 		<profile>
 			<id>crunch</id>
-			<activation>
-				<activeByDefault>false</activeByDefault>
-			</activation>
-			<properties>
-				<skip.unit.tests>true</skip.unit.tests>
-			</properties>
 			<build>
 				<plugins>
 					<plugin>
 						<groupId>org.apache.maven.plugins</groupId>
 						<artifactId>maven-surefire-plugin</artifactId>
+						<version>${plugin.surefire.version}</version>
 						<configuration>
 							<excludes>
 								<exclude>**/*TestPig.java</exclude>
@@ -607,7 +483,7 @@
 							</excludes>
 						</configuration>
 						<executions>
-							<!-- States that both integration-test and verify goals of the Failsafe
+							<!-- States that both integration-test and verify goals of the Failsafe 
 								Maven plugin are executed. -->
 							<execution>
 								<id>integration-tests</id>
@@ -620,26 +496,11 @@
 					</plugin>
 				</plugins>
 			</build>
-
-			<dependencies>
-				<dependency>
-					<groupId>org.apache.crunch</groupId>
-					<artifactId>crunch-core</artifactId>
-					<version>0.9.0-hadoop2</version>
-				</dependency>
-				<dependency>
-					<groupId>com.google.protobuf</groupId>
-					<artifactId>protobuf-java</artifactId>
-					<version>2.5.0</version>
-				</dependency>
-			</dependencies>
 		</profile>
-
 		<profile>
 			<id>mahout</id>
-			<activation>
-				<activeByDefault>false</activeByDefault>
-			</activation>
+			<!-- TODO this property is not being used anywhere. It's not even automatically 
+				detectable. Remove? Or do something that the name suggests? -->
 			<properties>
 				<skip.unit.tests>true</skip.unit.tests>
 			</properties>
@@ -648,6 +509,7 @@
 					<plugin>
 						<groupId>org.apache.maven.plugins</groupId>
 						<artifactId>maven-surefire-plugin</artifactId>
+						<version>${plugin.surefire.version}</version>
 						<configuration>
 							<excludes>
 								<exclude>**/*TestPig.java</exclude>
@@ -688,7 +550,7 @@
 							</excludes>
 						</configuration>
 						<executions>
-							<!-- States that both integration-test and verify goals of the Failsafe
+							<!-- States that both integration-test and verify goals of the Failsafe 
 								Maven plugin are executed. -->
 							<execution>
 								<id>integration-tests</id>
@@ -701,97 +563,6 @@
 					</plugin>
 				</plugins>
 			</build>
-
-			<dependencies>
-
-				<dependency>
-				    <groupId>commons-logging</groupId>
-				    <artifactId>commons-logging</artifactId>
-				    <version>1.1.3</version>
-				</dependency>
-
-			        <!--
-				     For testing on my machine,
-				     I created a bigpetstore mahout jar which
-				     is compiled for 2.2.0  .  Or substitute this with
-				     the standard apache mahout-core but not sure if it
-				     will work.
-			        -->	
-				<dependency>
-					<groupId>bigpetstore</groupId>
-					<artifactId>mahout-core</artifactId>
-					<version>1.0-SNAPSHOT</version>
-					<exclusions>
-					</exclusions>
-				</dependency>
-
-				<dependency>
-				    <groupId>org.apache.mahout</groupId>
-				    <artifactId>mahout-math</artifactId>
-				    <version>0.9</version>
-				</dependency>
-
-
-				<dependency>
-					<groupId>org.slf4j</groupId>
-					<artifactId>slf4j-api</artifactId>
-					<version>LATEST</version>
-
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.commons</groupId>
-					<artifactId>commons-lang3</artifactId>
-					<version>LATEST</version>
-				</dependency>
-
-				<dependency>
-					<groupId>com.thoughtworks.xstream</groupId>
-					<artifactId>xstream</artifactId>
-					<version>LATEST</version>
-
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.lucene</groupId>
-					<artifactId>lucene-core</artifactId>
-					<version>LATEST</version>
-
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.lucene</groupId>
-					<artifactId>lucene-analyzers-common</artifactId>
-					<version>LATEST</version>
-
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.mahout.commons</groupId>
-					<artifactId>commons-cli</artifactId>
-					<version>LATEST</version>
-
-				</dependency>
-
-				<dependency>
-					<groupId>org.apache.commons</groupId>
-					<artifactId>commons-math3</artifactId>
-					<version>LATEST</version>
-				</dependency>
-
-
-				<dependency>
-					<groupId>org.apache.solr</groupId>
-					<artifactId>solr-commons-csv</artifactId>
-					<version>3.5.0</version>
-				</dependency>
-
-			</dependencies>
-
-
-
 		</profile>
-
 	</profiles>
-
 </project>

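The Maven dependencies retained in the main section above map one-for-one onto the dependencies block of build.gradle; artifacts with a Maven <classifier>, such as mrunit and pig, use Gradle's map-style notation, as in this line taken from the new build file:

    compile group: "org.apache.mrunit", name: "mrunit", version: "1.0.0", classifier: "hadoop2"
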
http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/settings.gradle
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/settings.gradle b/bigtop-bigpetstore/settings.gradle
new file mode 100644
index 0000000..85ba25d
--- /dev/null
+++ b/bigtop-bigpetstore/settings.gradle
@@ -0,0 +1 @@
+rootProject.name = 'BigPetStore'

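The one-line settings.gradle matters more than it looks: without it, Gradle derives the project name from the containing directory (bigtop-bigpetstore), whereas pinning it keeps the artifact name aligned with the Maven artifactId:

    // With rootProject.name set, `gradle build` should produce
    // build/libs/BigPetStore-0.8.0-SNAPSHOT.jar instead of a
    // bigtop-bigpetstore-*.jar derived from the directory name.
    rootProject.name = 'BigPetStore'
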
http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/setuphive.sh
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/setuphive.sh b/bigtop-bigpetstore/setuphive.sh
deleted file mode 100755
index 8dff6dd..0000000
--- a/bigtop-bigpetstore/setuphive.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-### THIS SCRIPT SETS UP HIVE AND HADOOP TARBALLS FOR YOU ###
-HIVE_TARBALL="http://archive.apache.org/dist/hive/hive-0.12.0/hive-0.12.0.tar.gz"
-HADOOP_TARBALL="https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz"
-wget $HIVE_TARBALL
-wget $HADOOP_TARBALL
-
-
-# REMEBER SO WE CAN CD BACK AT END 
-mydir=`pwd`
-
-## HADOOP SETUP
-mkdir -p /opt/bigpetstore
-cd /opt/bigpetstore
-tar -xvf hadoop-1.2.1.tar.gz
-export HADOOP_HOME=`pwd`/hadoop-1.2.1
-
-## HIVE SETUP 
-tar -xvf hive-0.12.0.tar.gz
-cp /opt/hive-0.12.0/lib/hive*.jar $HADOOP_HOME/lib
-
-## CD BACK TO ORIGINAL DIR
-cd $mydir

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreHiveIT.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreHiveIT.java b/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreHiveIT.java
deleted file mode 100644
index c3646a4..0000000
--- a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreHiveIT.java
+++ /dev/null
@@ -1,108 +0,0 @@
-/**
-* Licensed to the Apache Software Foundation (ASF) under one or more
-* contributor license agreements.  See the NOTICE file distributed with
-* this work for additional information regarding copyright ownership.
-* The ASF licenses this file to You under the Apache License, Version 2.0
-* (the "License"); you may not use this file except in compliance with
-* the License.  You may obtain a copy of the License at
-* http://www.apache.org/licenses/LICENSE-2.0
-* 
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
-*/
-package org.apache.bigtop.bigpetstore;
-
-
-import java.io.BufferedReader;
-import java.io.File;
-import java.io.InputStreamReader;
-import java.nio.charset.Charset;
-import java.util.List;
-import java.util.Map;
-
-import org.apache.bigtop.bigpetstore.ITUtils;
-import org.apache.bigtop.bigpetstore.etl.HiveViewCreator;
-import org.apache.bigtop.bigpetstore.etl.PigCSVCleaner;
-import org.apache.bigtop.bigpetstore.generator.BPSGenerator;
-import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.mapreduce.Job;
-import org.apache.pig.ExecType;
-import org.json.JSONException;
-import org.json.JSONObject;
-
-import com.google.common.base.Function;
-import com.google.common.io.Files;
-import org.junit.After;
-import org.junit.Assert;
-import org.junit.Before;
-import org.junit.Test;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- * Run this after running the @link{BigPetStorePigIT} test.
- * Duh...
- */
-public class BigPetStoreHiveIT extends ITUtils{
-    final static Logger log = LoggerFactory.getLogger(BigPetStoreHiveIT.class);
-
-    @Before
-    public void setupTest() throws Throwable {
-        super.setup();
-        try {
-            FileSystem.get(new Configuration()).delete(BPS_TEST_MAHOUT_IN);
-        } catch (Exception e) {
-            System.out.println("didnt need to delete hive output.");
-            // not necessarily an error
-        }
-    }
-
-    @Test
-    public void testPetStorePipeline() throws Exception {
-        new HiveViewCreator().run(
-                new String[]{
-                        BPS_TEST_PIG_CLEANED.toString(),
-                        BPS_TEST_MAHOUT_IN.toString()});
-
-        assertOutput(BPS_TEST_MAHOUT_IN, new Function<String, Boolean>() {
-            public Boolean apply(String x) {
-                System.out.println("Verifying "+x);
-                String[] cols = x.split(",");
-                Long.parseLong(cols[0].trim());
-                Long.parseLong(cols[1].trim());
-                Long.parseLong(cols[2].trim());
-                return true;
-            }
-        });
-    }
-
-    public static void assertOutput(Path base,
-            Function<String, Boolean> validator) throws Exception {
-        FileSystem fs = FileSystem.getLocal(new Configuration());
-
-        FileStatus[] files = fs.listStatus(base);
-        // print out all the files.
-        for (FileStatus stat : files) {
-            System.out.println(stat.getPath() + "  " + stat.getLen());
-        }
-
-        Path p = new Path(base, "000000_0");
-        BufferedReader r = new BufferedReader(new InputStreamReader(fs.open(p)));
-
-        // line:{"product":"big chew toy","count":3}
-        while (r.ready()) {
-            String line = r.readLine();
-            log.info("line:" + line);
-            System.out.println("line:" + line);
-            Assert.assertTrue("validationg line : " + line,
-                    validator.apply(line));
-        }
-    }
-}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreMahoutIT.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreMahoutIT.java b/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreMahoutIT.java
deleted file mode 100644
index 5e6f69c..0000000
--- a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStoreMahoutIT.java
+++ /dev/null
@@ -1,88 +0,0 @@
-/**
-* Licensed to the Apache Software Foundation (ASF) under one or more
-* contributor license agreements.  See the NOTICE file distributed with
-* this work for additional information regarding copyright ownership.
-* The ASF licenses this file to You under the Apache License, Version 2.0
-* (the "License"); you may not use this file except in compliance with
-* the License.  You may obtain a copy of the License at
-* http://www.apache.org/licenses/LICENSE-2.0
-* 
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
-*/
-package org.apache.bigtop.bigpetstore;
-
-import java.io.BufferedReader;
-import java.io.InputStreamReader;
-
-import org.apache.bigtop.bigpetstore.clustering.BPSRecommnder;
-import org.apache.bigtop.bigpetstore.etl.HiveViewCreator;
-import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.junit.Assert;
-import org.junit.Before;
-import org.junit.Test;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import com.google.common.base.Function;
-
-public class BigPetStoreMahoutIT extends ITUtils{
-
-    final static Logger log = LoggerFactory.getLogger(BigPetStoreHiveIT.class);
-
-    @Before
-    public void setupTest() throws Throwable {
-        super.setup();
-        try {
-            FileSystem.get(new Configuration()).delete(super.BPS_TEST_MAHOUT_OUT);
-        }
-        catch (Exception e) {
-            System.out.println("didnt need to delete mahout output.");
-        }
-    }
-
-    @Test
-    public void testPetStorePipeline() throws Exception {
-        new BPSRecommnder().run(
-                new String[]{
-                        BPS_TEST_MAHOUT_IN.toString(),
-                        BPS_TEST_MAHOUT_OUT.toString()});
-
-        assertOutput(BPS_TEST_MAHOUT_OUT, new Function<String, Boolean>() {
-            public Boolean apply(String x) {
-                System.out.println("Verifying "+x);
-                return true;
-            }
-        });
-    }
-
-    public static void assertOutput(Path base,
-            Function<String, Boolean> validator) throws Exception {
-        FileSystem fs = FileSystem.getLocal(new Configuration());
-
-        FileStatus[] files = fs.listStatus(base);
-        // print out all the files.
-        for (FileStatus stat : files) {
-            System.out.println(stat.getPath() + "  " + stat.getLen());
-        }
-
-        Path p = new Path(base, "part-r-00000");
-        BufferedReader r = new BufferedReader(new InputStreamReader(fs.open(p)));
-
-        // line:{"product":"big chew toy","count":3}
-        while (r.ready()) {
-            String line = r.readLine();
-            log.info("line:" + line);
-            System.out.println("line:" + line);
-            Assert.assertTrue("validationg line : " + line,
-                    validator.apply(line));
-        }
-    }
-}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java b/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java
deleted file mode 100644
index db766de..0000000
--- a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java
+++ /dev/null
@@ -1,165 +0,0 @@
-/**
-* Licensed to the Apache Software Foundation (ASF) under one or more
-* contributor license agreements.  See the NOTICE file distributed with
-* this work for additional information regarding copyright ownership.
-* The ASF licenses this file to You under the Apache License, Version 2.0
-* (the "License"); you may not use this file except in compliance with
-* the License.  You may obtain a copy of the License at
-* http://www.apache.org/licenses/LICENSE-2.0
-* 
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
-*/
-package org.apache.bigtop.bigpetstore;
-
-import java.io.BufferedReader;
-import java.io.File;
-import java.io.InputStreamReader;
-import java.util.Map;
-import java.util.Map.Entry;
-
-import junit.framework.Assert;
-
-import org.apache.bigtop.bigpetstore.etl.PigCSVCleaner;
-import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.pig.ExecType;
-import org.junit.Before;
-import org.junit.Test;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import com.google.common.base.Function;
-import com.google.common.collect.ImmutableMap;
-import com.google.common.io.Files;
-
-/**
-*  This is the main integration test for pig.
-*  Like all BPS integration tests, it is designed 
-*  to simulate exactly what will happen on the 
-*  actual cluster, except with a small amount of records.
-*
-*  In addition to cleaning the dataset, it also runs the BPS_analytics.pig
-*  script which BigPetStore ships with. 
-*/
-public class BigPetStorePigIT extends ITUtils{
-
-    final static Logger log = LoggerFactory.getLogger(BigPetStorePigIT.class);
-
-    /**
-     * An extra unsupported code path that we have so
-     * people can do ad hoc analytics on pig data after it is
-     * cleaned.
-     */
-    public static final Path BPS_TEST_PIG_COUNT_PRODUCTS = fs.makeQualified(
-            new Path("bps_integration_",
-                    BigPetStoreConstants.OUTPUTS.pig_ad_hoc_script.name()+"0"));
-
-    static final File PIG_SCRIPT = new File("BPS_analytics.pig");
-
-    static {
-        if(PIG_SCRIPT.exists()) {
-
-        }
-        else
-            throw new RuntimeException("Couldnt find pig script at " + PIG_SCRIPT.getAbsolutePath());
-    }
-
-    @Before
-    public void setupTest() throws Throwable {
-        super.setup();
-        try{
-            FileSystem.get(new Configuration()).delete(BPS_TEST_PIG_CLEANED);
-            FileSystem.get(new Configuration()).delete(BPS_TEST_PIG_COUNT_PRODUCTS);
-        }
-        catch(Exception e){
-            System.out.println("didnt need to delete pig output.");
-            //not necessarily an error
-        }
-    }
-
-    static Map<Path,Function<String,Boolean>> TESTS = ImmutableMap.of(
-            /**
-            * Test of the main output
-            */
-            BPS_TEST_PIG_CLEANED,
-            new Function<String, Boolean>(){
-                public Boolean apply(String x){
-                    //System.out.println("Verified...");
-                    return true;
-                }
-            },
-            //Example of how to count products
-            //after doing basic pig data cleanup
-            BPS_TEST_PIG_COUNT_PRODUCTS,
-            new Function<String, Boolean>(){
-                //Jeff'
-                public Boolean apply(String x){
-                    return true;
-                }
-            });
-
-    /**
-     * The "core" task reformats data to TSV.  lets test that first.
-     */
-    @Test
-    public void testPetStoreCorePipeline()  throws Exception {
-        runPig(
-               BPS_TEST_GENERATED,
-               BPS_TEST_PIG_CLEANED,
-               PIG_SCRIPT);
-        for(Entry<Path,Function<String,Boolean>> e : TESTS.entrySet()) {
-            assertOutput(e.getKey(),e.getValue());
-        }
-    }
-
-    public static void assertOutput(Path base,Function<String, Boolean> validator) throws Exception{
-        FileSystem fs = FileSystem.getLocal(new Configuration());
-
-        FileStatus[] files=fs.listStatus(base);
-        //print out all the files.
-        for(FileStatus stat : files){
-            System.out.println(stat.getPath() +"  " + stat.getLen());
-        }
-
-        /**
-         * Support map OR reduce outputs
-         */
-        Path partm = new Path(base,"part-m-00000");
-        Path partr = new Path(base,"part-r-00000");
-        Path p = fs.exists(partm)?partm:partr;
-
-        /**
-         * Now we read through the file and validate
-         * its contents.
-         */
-        BufferedReader r =
-                new BufferedReader(
-                        new InputStreamReader(fs.open(p)));
-
-        //line:{"product":"big chew toy","count":3}
-        while(r.ready()){
-            String line = r.readLine();
-            log.info("line:"+line);
-            //System.out.println("line:"+line);
-            Assert.assertTrue("validationg line : " + line, validator.apply(line));
-        }
-    }
-
-    Map pigResult;
-
-    private void runPig(Path input, Path output, File pigscript) throws Exception {
-
-                new PigCSVCleaner(
-                        input,
-                        output,
-                        ExecType.LOCAL,
-                        pigscript);
-    }
-}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/ITUtils.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/ITUtils.java b/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/ITUtils.java
deleted file mode 100644
index e93d9ce..0000000
--- a/bigtop-bigpetstore/src/integration/java/org/apache/bigtop/bigpetstore/ITUtils.java
+++ /dev/null
@@ -1,145 +0,0 @@
-/**
-* Licensed to the Apache Software Foundation (ASF) under one or more
-* contributor license agreements.  See the NOTICE file distributed with
-* this work for additional information regarding copyright ownership.
-* The ASF licenses this file to You under the Apache License, Version 2.0
-* (the "License"); you may not use this file except in compliance with
-* the License.  You may obtain a copy of the License at
-* http://www.apache.org/licenses/LICENSE-2.0
-* 
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
-*/
-package org.apache.bigtop.bigpetstore;
-
-import java.net.InetAddress;
-import java.nio.charset.Charset;
-import java.util.List;
-import java.util.Map;
-
-import org.apache.bigtop.bigpetstore.generator.BPSGenerator;
-import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.mapreduce.Job;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import com.google.common.io.Files;
-
-public class ITUtils {
-
-    static final Logger log = LoggerFactory.getLogger(ITUtils.class);
-
-    static FileSystem fs;
-    static{
-        try{
-            fs=FileSystem.getLocal(new Configuration());
-        }
-        catch(Throwable e)
-        {
-           String cpath = (String) System.getProperties().get("java.class.path");
-           String msg="";
-           for(String cp : cpath.split(":")) {
-               if(cp.contains("hadoop")) {
-                   msg+=cp.replaceAll("hadoop", "**HADOOP**")+"\n";
-               }
-           }
-           throw new RuntimeException("Major error:  Probably issue.   " +
-            		"Check hadoop version?  "+ e.getMessage() +" .... check these classpath elements:"
-                    +msg);
-        }
-    }
-    public static final Path BPS_TEST_GENERATED = fs.makeQualified(
-            new Path("bps_integration_",BigPetStoreConstants.OUTPUTS.generated.name())) ;
-
-    public static final Path BPS_TEST_PIG_CLEANED = fs.makeQualified(
-            new Path("bps_integration_",BigPetStoreConstants.OUTPUTS.cleaned.name()));
-
-    public static final Path BPS_TEST_MAHOUT_IN = fs.makeQualified(
-            new Path("bps_integration_",BigPetStoreConstants.OUTPUTS.MAHOUT_CF_IN.name()));
-
-    public static final Path BPS_TEST_MAHOUT_OUT = fs.makeQualified(
-            new Path("bps_integration_",BigPetStoreConstants.OUTPUTS.MAHOUT_CF_OUT.name()));
-
-    public static void main(String[] args){
-
-    }
-    //public static final Path CRUNCH_OUT = new Path("bps_integration_",BigPetStoreConstants.OUTPUT_3).makeQualified(fs);
-
-    /**
-     * Some simple checks to make sure that unit tests in local FS.
-     * these arent designed to be run against a distribtued system.
-     */
-    public static void checkConf(Configuration conf) throws Exception {
-        if(conf.get("mapreduce.jobtracker.address")==null) {
-            log.warn("Missing mapreduce.jobtracker.address???????!!!! " +
-            		"This can be the case in hive tests which use special " +
-            		"configurations, but we should fix it sometime.");
-            return;
-        }
-        if(! conf.get("mapreduce.jobtracker.address").equals("local")) {
-            throw new RuntimeException("ERROR: bad conf : " + "mapreduce.jobtracker.address");
-        }
-        if(! conf.get("fs.AbstractFileSystem.file.impl").contains("Local")) {
-            throw new RuntimeException("ERROR: bad conf : " + "mapreduce.jobtracker.address");
-        }
-        try {
-            InetAddress addr = java.net.InetAddress.getLocalHost();
-            System.out.println("Localhost = hn=" + addr.getHostName() +" / ha="+addr.getHostAddress());
-        }
-        catch (Throwable e) {
-            throw new RuntimeException(
-            " ERROR : Hadoop wont work at all  on this machine yet"+
-            "...I can't get / resolve localhost ! Check java version/ " +
-            "/etc/hosts / DNS or other networking related issues on your box" +
-            e.getMessage());
-        }
-    }
-
-
-    /**
-     * Creates a generated input data set in
-     *
-     * test_data_directory/generated.
-     * i.e.
-     *  test_data_directory/generated/part-r-00000
-     */
-    public static void setup() throws Throwable{
-        int records = 10;
-        /**
-         * Setup configuration with prop.
-         */
-        Configuration conf = new Configuration();
-
-        //debugging for jeff and others in local fs
-        //that wont build
-        checkConf(conf);
-
-        conf.setInt(BPSGenerator.props.bigpetstore_records.name(), records);
-
-        /**
-         * Only create if doesnt exist already.....
-         */
-        if(FileSystem.getLocal(conf).exists(BPS_TEST_GENERATED)){
-            return;
-        }
-
-        /**
-         * Create the data set.
-         */
-        Job createInput= BPSGenerator.createJob(BPS_TEST_GENERATED, conf);
-        createInput.waitForCompletion(true);
-
-        Path outputfile = new Path(BPS_TEST_GENERATED,"part-r-00000");
-        List<String> lines = Files.readLines(FileSystem.getLocal(conf).pathToFile(outputfile), Charset.defaultCharset());
-        log.info("output : " + FileSystem.getLocal(conf).pathToFile(outputfile));
-        for(String l : lines){
-            System.out.println(l);
-        }
-    }
-}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java b/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java
new file mode 100644
index 0000000..045a9cf
--- /dev/null
+++ b/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/BigPetStorePigIT.java
@@ -0,0 +1,148 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.bigtop.bigpetstore;
+
+import static org.apache.bigtop.bigpetstore.ITUtils.BPS_TEST_GENERATED;
+import static org.apache.bigtop.bigpetstore.ITUtils.BPS_TEST_PIG_CLEANED;
+import static org.apache.bigtop.bigpetstore.ITUtils.fs;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.InputStreamReader;
+import java.util.Map;
+import java.util.Map.Entry;
+
+import org.apache.bigtop.bigpetstore.etl.PigCSVCleaner;
+import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.pig.ExecType;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Function;
+import com.google.common.collect.ImmutableMap;
+
+/**
+ * This is the main integration test for pig. Like all BPS integration tests, it
+ * is designed to simulate exactly what will happen on the actual cluster,
+ * except with a small number of records.
+ *
+ * In addition to cleaning the dataset, it also runs the BPS_analytics.pig
+ * script which BigPetStore ships with.
+ */
+public class BigPetStorePigIT {
+
+	final static Logger log = LoggerFactory.getLogger(BigPetStorePigIT.class);
+
+	/**
+	 * An extra, unsupported code path that lets people run ad hoc
+	 * analytics on pig data after it has been cleaned.
+	 */
+	public static final Path BPS_TEST_PIG_COUNT_PRODUCTS = fs
+			.makeQualified(new Path("bps_integration_",
+					BigPetStoreConstants.OUTPUTS.pig_ad_hoc_script.name() + "0"));
+
+	static final File PIG_SCRIPT = new File("BPS_analytics.pig");
+
+	static {
+		if (!PIG_SCRIPT.exists()) {
+			throw new RuntimeException("Couldn't find pig script at " + PIG_SCRIPT.getAbsolutePath());
+		}
+	}
+
+	@Before
+	public void setupTest() throws Throwable {
+		ITUtils.setup();
+		try {
+			FileSystem.get(new Configuration()).delete(BPS_TEST_PIG_CLEANED, true);
+			FileSystem.get(new Configuration()).delete(BPS_TEST_PIG_COUNT_PRODUCTS, true);
+		} catch (Exception e) {
+			System.out.println("didn't need to delete pig output.");
+			// not necessarily an error
+		}
+	}
+
+	static Map<Path, Function<String, Boolean>> TESTS = ImmutableMap.of(
+		/** Test of the main output */
+		BPS_TEST_PIG_CLEANED, new Function<String, Boolean>() {
+			public Boolean apply(String x) {
+				// System.out.println("Verified...");
+				return true;
+			}
+		},
+		// Example of how to count products
+		// after doing basic pig data cleanup
+		BPS_TEST_PIG_COUNT_PRODUCTS, new Function<String, Boolean>() {
+			public Boolean apply(String x) {
+				return true;
+			}
+		}
+	);
+
+	/**
+	 * The "core" task reformats data to TSV. Let's test that first.
+	 */
+	@Test
+	public void testPetStoreCorePipeline() throws Exception {
+		runPig(BPS_TEST_GENERATED, BPS_TEST_PIG_CLEANED, PIG_SCRIPT);
+		for (Entry<Path, Function<String, Boolean>> e : TESTS.entrySet()) {
+			assertOutput(e.getKey(), e.getValue());
+		}
+	}
+
+	public static void assertOutput(Path base,
+			Function<String, Boolean> validator) throws Exception {
+		FileSystem fs = FileSystem.getLocal(new Configuration());
+
+		FileStatus[] files = fs.listStatus(base);
+		// print out all the files.
+		for (FileStatus stat : files) {
+			System.out.println(stat.getPath() + "  " + stat.getLen());
+		}
+
+		/**
+		 * Support map OR reduce outputs
+		 */
+		Path partm = new Path(base, "part-m-00000");
+		Path partr = new Path(base, "part-r-00000");
+		Path p = fs.exists(partm) ? partm : partr;
+
+		/**
+		 * Now we read through the file and validate its contents.
+		 */
+		BufferedReader r = new BufferedReader(new InputStreamReader(fs.open(p)));
+
+		// line:{"product":"big chew toy","count":3}
+		while (r.ready()) {
+			String line = r.readLine();
+			log.info("line:" + line);
+			// System.out.println("line:"+line);
+			Assert.assertTrue("validating line: " + line, validator.apply(line));
+		}
+	}
+
+	private void runPig(Path input, Path output, File pigscript)
+			throws Exception {
+		new PigCSVCleaner(input, output, ExecType.LOCAL, pigscript);
+	}
+}
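A note on the TESTS map above: both validators accept every line, so the pipeline test currently only proves that the output files exist and are readable. A stricter per-line check slots into the same Guava Function type. This is a minimal sketch, not part of the commit; the expectation that a cleaned line carries at least two tab-separated fields is an assumption made here purely for illustration:

    static final Function<String, Boolean> NON_EMPTY_TSV = new Function<String, Boolean>() {
        public Boolean apply(String line) {
            // hypothetical check: reject empty lines and lines without
            // at least two tab-separated fields
            return line != null && !line.isEmpty() && line.split("\t").length >= 2;
        }
    };

Swapped in for the always-true functions, this would make testPetStoreCorePipeline fail on malformed output instead of merely logging it.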

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/ITUtils.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/ITUtils.java b/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/ITUtils.java
new file mode 100644
index 0000000..df3b948
--- /dev/null
+++ b/bigtop-bigpetstore/src/integrationTest/java/org/apache/bigtop/bigpetstore/ITUtils.java
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.bigtop.bigpetstore;
+
+import java.net.InetAddress;
+import java.nio.charset.Charset;
+import java.util.List;
+
+import org.apache.bigtop.bigpetstore.generator.BPSGenerator;
+import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.Job;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.io.Files;
+
+public class ITUtils {
+
+	static final Logger log = LoggerFactory.getLogger(ITUtils.class);
+
+	static FileSystem fs;
+	static {
+		try {
+			fs = FileSystem.getLocal(new Configuration());
+		} catch (Throwable e) {
+			String cpath = (String) System.getProperties().get("java.class.path");
+			String msg = "";
+			for (String cp : cpath.split(":")) {
+				if (cp.contains("hadoop")) {
+					msg += cp.replaceAll("hadoop", "**HADOOP**") + "\n";
+				}
+			}
+			throw new RuntimeException("Major error: probably a classpath issue. Check the hadoop version? " + e.getMessage()
+					+ " ... check these classpath elements: " + msg);
+		}
+	}
+	public static final Path BPS_TEST_GENERATED = fs.makeQualified(new Path("bps_integration_",
+			BigPetStoreConstants.OUTPUTS.generated.name()));
+	public static final Path BPS_TEST_PIG_CLEANED = fs.makeQualified(new Path("bps_integration_",
+			BigPetStoreConstants.OUTPUTS.cleaned.name()));
+	public static final Path BPS_TEST_MAHOUT_IN = fs.makeQualified(new Path("bps_integration_",
+			BigPetStoreConstants.OUTPUTS.MAHOUT_CF_IN.name()));
+	public static final Path BPS_TEST_MAHOUT_OUT = fs.makeQualified(new Path("bps_integration_",
+			BigPetStoreConstants.OUTPUTS.MAHOUT_CF_OUT.name()));
+
+	public static void main(String[] args) {
+
+	}
+
+	// public static final Path CRUNCH_OUT = new
+	// Path("bps_integration_",BigPetStoreConstants.OUTPUT_3).makeQualified(fs);
+
+	/**
+	 * Some simple checks to make sure that unit tests run against the local FS.
+	 * These aren't designed to be run against a distributed system.
+	 */
+	public static void checkConf(Configuration conf) throws Exception {
+		if (conf.get("mapreduce.jobtracker.address") == null) {
+			log.warn("Missing mapreduce.jobtracker.address! " + "This can be the case in hive tests which use special "
+					+ "configurations, but we should fix it sometime.");
+			return;
+		}
+		if (!conf.get("mapreduce.jobtracker.address").equals("local")) {
+			throw new RuntimeException("ERROR: bad conf : " + "mapreduce.jobtracker.address");
+		}
+		if (!conf.get("fs.AbstractFileSystem.file.impl").contains("Local")) {
+			throw new RuntimeException("ERROR: bad conf : " + "fs.AbstractFileSystem.file.impl");
+		}
+		try {
+			InetAddress addr = java.net.InetAddress.getLocalHost();
+			System.out.println("Localhost = hn=" + addr.getHostName() + " / ha=" + addr.getHostAddress());
+		} catch (Throwable e) {
+			throw new RuntimeException("ERROR: Hadoop won't work at all on this machine yet"
+					+ "... can't get / resolve localhost! Check the java version, " + "/etc/hosts, DNS, or other networking-related issues on your box: "
+					+ e.getMessage());
+		}
+	}
+
+	/**
+	 * Creates a generated input data set in test_data_directory/generated,
+	 * i.e. test_data_directory/generated/part-r-00000.
+	 */
+	public static void setup() throws Throwable {
+		int records = 10;
+		/**
+		 * Set up the configuration with the record-count property.
+		 */
+		Configuration conf = new Configuration();
+
+		// debugging for jeff and others in local fs that won't build
+		checkConf(conf);
+
+		conf.setInt(BPSGenerator.props.bigpetstore_records.name(), records);
+
+		/**
+		 * Only create the data set if it doesn't already exist.
+		 */
+		if (FileSystem.getLocal(conf).exists(BPS_TEST_GENERATED)) {
+			return;
+		}
+
+		/**
+		 * Create the data set.
+		 */
+		Job createInput = BPSGenerator.createJob(BPS_TEST_GENERATED, conf);
+		createInput.waitForCompletion(true);
+
+		Path outputfile = new Path(BPS_TEST_GENERATED, "part-r-00000");
+		List<String> lines = Files.readLines(FileSystem.getLocal(conf).pathToFile(outputfile), Charset.defaultCharset());
+		log.info("output : " + FileSystem.getLocal(conf).pathToFile(outputfile));
+		for (String l : lines) {
+			System.out.println(l);
+		}
+	}
+}
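Usage note: any new test dropped under src/integrationTest can bootstrap its input through ITUtils. The sketch below shows the minimal shape, under the same assumptions the surrounding tests already make (JUnit 4 on the classpath, same package, local FS); the class ExampleIT itself is hypothetical:

    package org.apache.bigtop.bigpetstore;

    import org.junit.Assert;
    import org.junit.Before;
    import org.junit.Test;

    public class ExampleIT {

        @Before
        public void setup() throws Throwable {
            // generates the small test data set once; subsequent calls return early
            ITUtils.setup();
        }

        @Test
        public void generatedDataExists() throws Exception {
            // BPS_TEST_GENERATED is the qualified local-FS path populated by setup()
            Assert.assertTrue(ITUtils.fs.exists(ITUtils.BPS_TEST_GENERATED));
        }
    }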

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/clustering/BPSRecommnder.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/clustering/BPSRecommnder.java b/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/clustering/BPSRecommnder.java
deleted file mode 100644
index 748578a..0000000
--- a/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/clustering/BPSRecommnder.java
+++ /dev/null
@@ -1,83 +0,0 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.bigtop.bigpetstore.clustering;
-
-import org.apache.bigtop.bigpetstore.util.DeveloperTools;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.util.Tool;
-import org.apache.hadoop.util.ToolRunner;
-import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
-import org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob;
-import org.apache.pig.builtin.LOG;
-
-/**
- * Implement user based collab filter.
- *
- * The input set is the
- *
- * userid,productid,weight
- *
- * rows.
- */
-public class BPSRecommnder implements Tool {
-
-
-    Configuration c;
-    @Override
-    public void setConf(Configuration conf) {
-        c=conf;
-    }
-
-    @Override
-    public Configuration getConf() {
-        return c;
-    }
-
-    @Override
-    public int run(String[] args) throws Exception {
-        DeveloperTools.validate(args,"input path","output path");
-
-        Configuration conf = new Configuration();
-
-        System.out.println("Runnning recommender against : " + args[0] +" -> " + args[1]);
-
-        RecommenderJob recommenderJob = new RecommenderJob();
-        /**
-        int x = ToolRunner.run(getConf(), new BPSPreparePreferenceMatrixJob(), new String[]{
-            "--input", args[0],
-            "--output", args[1],
-            "--tempDir", "/tmp",
-          });
-        System.out.println("RETURN = " + x);
-         **/
-
-        int ret = recommenderJob.run(new String[] {
-             "--input",args[0],
-             "--output",args[1],
-             "--usersFile","/tmp/users.txt",
-             "--tempDir", "/tmp/mahout_"+System.currentTimeMillis(),
-             "--similarityClassname", "SIMILARITY_PEARSON_CORRELATION",
-             "--threshold",".00000000001",
-             "--numRecommendations", "4",
-             //"--encodeLongsAsInts",
-             //Boolean.FALSE.toString(),
-             //"--itemBased", Boolean.FALSE.toString()
-        });
-
-        System.out.println("Exit of recommender: " + ret);
-        return ret;
-    }
-}
\ No newline at end of file
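For context on the deletion above and the HiveViewCreator deletion that follows: the recommender consumed the Mahout collaborative-filtering input that HiveViewCreator produced, i.e. plain userid,productid,weight rows, with the weight fixed at 1 by the hive query. A hypothetical sample of that input, with invented id values, purely to illustrate the shape:

    10001,501,1
    10001,502,1
    10002,501,1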

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/etl/HiveViewCreator.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/etl/HiveViewCreator.java b/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/etl/HiveViewCreator.java
deleted file mode 100755
index 4fabb6f..0000000
--- a/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/etl/HiveViewCreator.java
+++ /dev/null
@@ -1,157 +0,0 @@
-/**
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.bigtop.bigpetstore.etl;
-
-import java.sql.Connection;
-import java.sql.DriverManager;
-import java.sql.ResultSet;
-import java.sql.SQLException;
-import java.sql.Statement;
-
-import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
-import org.apache.bigtop.bigpetstore.util.NumericalIdUtils;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.booleanValue_return;
-import org.apache.hadoop.io.Text;
-import org.apache.hadoop.util.Tool;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- *
- * Hive View creator is designed to read from Pigs cleaned output.
- * The basic strategy is:
- *
- * 1) store pig output as a hive table
- * 2) use "select .. as" to select a subset
- *
- * Note on running locally:
- *
- * 1) Local mode requires a hive and hadoop tarball, with HIVE_HOME and
- * HADOOP_HOME pointing to it. 2) In HADOOP_HOME, you will need to cp the
- * HIVE_HOME/lib/hive-serde*jar file into HADOOP_HOME/lib.
- *
- * Then, the below queries will run.
- *
- * The reason for this is that the hive SerDe stuff is used in the MapReduce
- * phase of things, so those utils need to be available to hadoop itself. That
- * is because the regex input/output is processed vthe mappers
- *
- */
-public class HiveViewCreator implements Tool {
-
-    static {
-        try{
-            Class.forName("org.apache.hadoop.hive.ql.exec.mr.ExecDriver");
-            System.out.println("found exec driver !!!!!!!!!!!!!!!!");
-        }
-        catch(Throwable t) {
-            throw new RuntimeException(t);
-        }
-        try{
-            //Class.forName("org.apache.hadoop.hive.ql.exec.mr.ExecDriver");
-        }
-        catch(Throwable t) {
-            throw new RuntimeException(t);
-        }
-    }
-    Configuration conf;
-    @Override
-    public void setConf(Configuration conf) {
-        this.conf=conf;
-    }
-
-    @Override
-    public Configuration getConf() {
-        return conf;
-    }
-
-    /**
-     * Input args:
-     *  Cleaned data files from pig (tsv)
-     *  Ouptut table (desired path to mahout input data set)
-     *
-     */
-    @Override
-    public int run(String[] args) throws Exception {
-        Statement stmt = getConnection();
-        stmt.execute("DROP TABLE IF EXISTS " + BigPetStoreConstants.OUTPUTS.MAHOUT_CF_IN.name());
-        System.out.println("input data " + args[0]);
-        System.out.println("output table " + args[1]);
-
-        Path inTablePath =  new Path(args[0]);
-        String inTableName = "cleaned"+System.currentTimeMillis();
-        String outTableName = BigPetStoreConstants.OUTPUTS.MAHOUT_CF_IN.name();
-
-        Path outTablePath = new Path (inTablePath.getParent(),outTableName);
-
-        final String create = "CREATE EXTERNAL TABLE "+inTableName+" ("
-                + "  dump STRING,"
-                + "  state STRING,"
-                + "  trans_id STRING,"
-                + "  lname STRING,"
-                + "  fname STRING,"
-                + "  date STRING,"
-                + "  price STRING,"
-                + "  product STRING"
-                + ") ROW FORMAT "
-                + "DELIMITED FIELDS TERMINATED BY '\t' "
-                + "LINES TERMINATED BY '\n' "
-                + "STORED AS TEXTFILE "
-                + "LOCATION '"+inTablePath+"'";
-        boolean res = stmt.execute(create);
-        System.out.println("Execute return code : " +res);
-        //will change once we add hashes into pig ETL clean
-        String create2 =
-                "create table "+outTableName+" as "+
-                "select hash(concat(state,fname,lname)),',',hash(product),',',1 "
-                + "from "+inTableName;
-
-        System.out.println("CREATE = " + create2  );
-        System.out.println("OUT PATH = " + outTablePath);
-        boolean res2 = stmt.execute(create2);
-
-        String finalOutput = String.format(
-                "INSERT OVERWRITE DIRECTORY '%s' SELECT * FROM %s",outTablePath, outTableName) ;
-
-        stmt.execute(finalOutput);
-        System.out.println("FINAL OUTPUT STORED : " + outTablePath);
-        return 0;
-    }
-
-    public static final String HIVE_JDBC_DRIVER = "org.apache.hive.jdbc.HiveDriver";
-    public static final String HIVE_JDBC_EMBEDDED_CONNECTION = "jdbc:hive2://";
-
-    final static Logger log = LoggerFactory.getLogger(HiveViewCreator.class);
-
-
-    private Statement getConnection() throws ClassNotFoundException,
-            SQLException {
-        Class.forName(HIVE_JDBC_DRIVER);
-        Connection con = DriverManager.getConnection(
-                HIVE_JDBC_EMBEDDED_CONNECTION, "", "");
-        System.out.println("hive con = " + con.getClass().getName());
-        Statement stmt = con.createStatement();
-        return stmt;
-    }
-
-    public static void main(String[] args) throws Exception {
-        new HiveViewCreator()
-            .run(args);
-    }
-}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/main/resources/hive-log4j.properties
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/main/resources/hive-log4j.properties b/bigtop-bigpetstore/src/main/resources/hive-log4j.properties
deleted file mode 100755
index 9236008..0000000
--- a/bigtop-bigpetstore/src/main/resources/hive-log4j.properties
+++ /dev/null
@@ -1,84 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Define some default values that can be overridden by system properties
-hive.log.threshold=ERROR
-hive.root.logger=ERROR,DRFA
-hive.log.dir=/tmp/${user.name}
-hive.log.file=hive.log
-
-# Define the root logger to the system property "hadoop.root.logger".
-log4j.rootLogger=${hive.root.logger}, EventCounter, console
-
-# Logging Threshold
-log4j.threshold=${hive.log.threshold}
-
-#
-# Daily Rolling File Appender
-#
-# Use the PidDailyerRollingFileAppend class instead if you want to use separate log files
-# for different CLI session.
-#
-# log4j.appender.DRFA=org.apache.hadoop.hive.ql.log.PidDailyRollingFileAppender
-
-log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
-
-log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}
-
-# Rollver at midnight
-log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
-
-# 30-day backup
-#log4j.appender.DRFA.MaxBackupIndex=30
-log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
-
-# Pattern format: Date LogLevel LoggerName LogMessage
-#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
-# Debugging Pattern format
-log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
-
-
-#
-# console
-# Add "console" to rootlogger above if you want to use this
-#
-
-log4j.appender.console=org.apache.log4j.ConsoleAppender
-log4j.appender.console.target=System.err
-log4j.appender.console.layout=org.apache.log4j.PatternLayout
-log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
-log4j.appender.console.encoding=UTF-8
-
-#custom logging levels
-#log4j.logger.xxx=DEBUG
-
-#
-# Event Counter Appender
-# Sends counts of logging messages at different severity levels to Hadoop Metrics.
-#
-log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
-
-
-log4j.category.DataNucleus=OFF
-log4j.category.Datastore=OFF
-log4j.category.Datastore.Schema=OFF
-log4j.category.JPOX.Datastore=OFF
-log4j.category.JPOX.Plugin=OFF
-log4j.category.JPOX.MetaData=OFF
-log4j.category.JPOX.Query=OFF
-log4j.category.JPOX.General=OFF
-log4j.category.JPOX.Enhancer=OFF
-

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/main/resources/hive-site.xml
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/main/resources/hive-site.xml b/bigtop-bigpetstore/src/main/resources/hive-site.xml
deleted file mode 100644
index dd96f32..0000000
--- a/bigtop-bigpetstore/src/main/resources/hive-site.xml
+++ /dev/null
@@ -1,36 +0,0 @@
-<?xml version="1.0"?>
-
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
-<configuration>
-
-<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files  -->
-<!-- that are implied by Hadoop setup variables.                                                -->
-<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive    -->
-<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
-<!-- resource).                                                                                 -->
-
-<!-- Hive Execution Parameters -->
-
-<property>
-  <name>javax.jdo.option.ConnectionURL</name>
-  <!-- value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value -->
-    <value>jdbc:derby:;databaseName=/tmp/metastore/metastore_db;create=true</value>
-  <description>JDBC connect string for a JDBC metastore</description>
-</property>
-
-<property>
-  <name>hive.metastore.warehouse.dir</name>
-  <value>/tmp</value>
-  <description>Driver class name for a JDBC metastore</description>
-</property>
-
-
-<property>
-  <name>javax.jdo.option.ConnectionDriverName</name>
-  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
-  <description>Driver class name for a JDBC metastore</description>
-</property>
-
-
-</configuration>

http://git-wip-us.apache.org/repos/asf/bigtop/blob/1a851e4f/bigtop-bigpetstore/src/test/java/org/apache/bigtop/bigpetstore/docs/TestDocs.java
----------------------------------------------------------------------
diff --git a/bigtop-bigpetstore/src/test/java/org/apache/bigtop/bigpetstore/docs/TestDocs.java b/bigtop-bigpetstore/src/test/java/org/apache/bigtop/bigpetstore/docs/TestDocs.java
index 883bb55..3292ba5 100644
--- a/bigtop-bigpetstore/src/test/java/org/apache/bigtop/bigpetstore/docs/TestDocs.java
+++ b/bigtop-bigpetstore/src/test/java/org/apache/bigtop/bigpetstore/docs/TestDocs.java
@@ -15,32 +15,23 @@
  */
 package org.apache.bigtop.bigpetstore.docs;
 
-import java.io.File;
+import static org.junit.Assert.assertTrue;
 
-import junit.framework.Assert;
+import java.io.File;
 
-import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants;
 import org.apache.bigtop.bigpetstore.util.BigPetStoreConstants.OUTPUTS;
 import org.apache.commons.io.FileUtils;
 import org.junit.Test;
 
 public class TestDocs {
 
-    @Test
-    public void testGraphViz() throws Exception{
-        //test the graphviz file
-        //by grepping out the constants.
-        String graphviz=FileUtils.readFileToString(new File("arch.dot"));
-        System.out.println(graphviz);
-
-        org.junit.Assert.assertTrue(
-                graphviz.contains(
-                        OUTPUTS.generated.name()));
-
-        org.junit.Assert.assertTrue(
-                graphviz.contains(
-                        OUTPUTS.cleaned.name()));
-
+	@Test
+	public void testGraphViz() throws Exception {
+		// test the graphviz file by grepping out the constants.
+		String graphviz = FileUtils.readFileToString(new File("arch.dot"));
+		System.out.println(graphviz);
 
-    }
+		assertTrue(graphviz.contains(OUTPUTS.generated.name()));
+		assertTrue(graphviz.contains(OUTPUTS.cleaned.name()));
+	}
 }
\ No newline at end of file

