Return-Path:
X-Original-To: apmail-gora-commits-archive@www.apache.org
Delivered-To: apmail-gora-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 74C6C17E07 for ; Mon, 29 Sep 2014 01:08:17 +0000 (UTC)
Received: (qmail 86617 invoked by uid 500); 29 Sep 2014 01:08:17 -0000
Delivered-To: apmail-gora-commits-archive@gora.apache.org
Received: (qmail 86580 invoked by uid 500); 29 Sep 2014 01:08:17 -0000
Mailing-List: contact commits-help@gora.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@gora.apache.org
Delivered-To: mailing list commits@gora.apache.org
Received: (qmail 86570 invoked by uid 99); 29 Sep 2014 01:08:17 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 29 Sep 2014 01:08:17 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 29 Sep 2014 01:07:53 +0000
Received: from eris.apache.org (localhost [127.0.0.1]) by eris.apache.org (Postfix) with ESMTP id EE78723888A6; Mon, 29 Sep 2014 01:07:51 +0000 (UTC)
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: svn commit: r1628111 - /gora/site/trunk/content/current/index.md
Date: Mon, 29 Sep 2014 01:07:51 -0000
To: commits@gora.apache.org
From: lewismc@apache.org
X-Mailer: svnmailer-1.0.9
Message-Id: <20140929010751.EE78723888A6@eris.apache.org>
X-Virus-Checked: Checked by ClamAV on apache.org

Author: lewismc
Date: Mon Sep 29 01:07:51 2014
New Revision: 1628111

URL: http://svn.apache.org/r1628111
Log:
Update GoraCI documentation

Modified:
    gora/site/trunk/content/current/index.md

Modified: gora/site/trunk/content/current/index.md
URL:
http://svn.apache.org/viewvc/gora/site/trunk/content/current/index.md?rev=1628111&r1=1628110&r2=1628111&view=diff
==============================================================================
--- gora/site/trunk/content/current/index.md (original)
+++ gora/site/trunk/content/current/index.md Mon Sep 29 01:07:51 2014
@@ -120,34 +120,25 @@ detected.
 As GoraCI is packaged with the Gora master branch source it is automatically built every time you execute
-mvn install
+ mvn install
 The maven pom file has some profiles that attempt to make it easier to run GoraCI against different Gora backends by copying the jars you need into lib. Before packaging its important to edit gora.properties and set it correctly for your datastore. To run against Accumulo do the following.
-
- vim src/main/resources/gora.properties //set Accumulo properties
-
- mvn package -Paccumulo-1.4
-
+ vim src/main/resources/gora.properties //set Accumulo properties
+ mvn package -Paccumulo-1.4
 To run against HBase, do the following.
-
- vim src/main/resources/gora.properties //set HBase properties
-
- mvn package -Phbase-0.92
-
+ vim src/main/resources/gora.properties //set HBase properties
+ mvn package -Phbase-0.92
 To run against Cassandra, do the following.
-
- vim src/main/resources/gora.properties //set Cassandra properties
-
- mvn package -Pcassandra-1.1.2
-
+ vim src/main/resources/gora.properties //set Cassandra properties
+ mvn package -Pcassandra-1.1.2
 For other datastores mentioned in gora.properties, you will need to copy the appropriate deps into lib. Feel free to update the pom with other profiles, [open
@@ -173,11 +164,8 @@ Below is a description of the Java progr
 assumes all needed jars are in the lib dir. It does not need the package name. You can just run goraci.sh Generator, below is an example.
-
- $ ./goraci.sh Generator
-
- Usage : Generator
-
+ $ ./goraci.sh Generator
+ Usage : Generator
 For Gora to work, it needs a gora.properties file on the classpath and a gora-$datastore-mapping.xml mapping file on the classpath, the contents of both are datastore specific,
@@ -186,7 +174,6 @@ and build the goraci-${version}-SN
 those and put them on the classpath through some other means.
 ####Gora and Hadoop
-
 Gora uses [Apache Avro](http://avro.apache.org) which uses a Json library that Hadoop has an old version of. The two libraries jackson-core and jackson-mapper need to be updated in $HADOOP_HOME/lib and $HADOOP_HOME/share/hadoop/lib/. Currently these are updated to
@@ -194,45 +181,36 @@ jackson-core-asl-1.4.2.jar and jackson-m
 [HADOOP-6945](https://issues.apache.org/jira/browse/HADOOP-6945).
 ####GoraCI and HBase
-
 To improve performance running read jobs such as the Verify step, enable scanner caching on the command line. For example:
-
 $ ./gorachi.sh Verify -Dhbase.client.scanner.caching=1000 \
- -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
-
+ -Dmapred.map.tasks.speculative.execution=false verify_dir 1000
 Dependent on how you have your Hadoop and HBase setup deployed, you may need to change the gorachi.sh script around some. Here is one suggestion that may help in the case where your Hadoop and HBase configuration are other than under the Hadoop and HBase home directories.
-
- diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
- index db1562a..31c3c94 100755
- --- a/org.apache.gora.goraci.sh
- +++ b/org.apache.gora.goraci.sh
- @@ -95,6 +95,4 @@ done
- #run it
- export HADOOP_CLASSPATH="$CLASSPATH"
- LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
- -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
- -
- -
- +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR} jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
-
+ diff --git a/org.apache.gora.goraci.sh b/org.apache.gora.goraci.sh
+ index db1562a..31c3c94 100755
+ --- a/org.apache.gora.goraci.sh
+ +++ b/org.apache.gora.goraci.sh
+ @@ -95,6 +95,4 @@ done
+ #run it
+ export HADOOP_CLASSPATH="$CLASSPATH"
+ LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
+ -hadoop jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
+ -
+ -
+ +CLASSPATH="${HBASE_CONF_DIR}" hadoop --config "${HADOOP_CONF_DIR} jar "$GORACI_HOME/lib/org.apache.gora.goraci-0.0.1-SNAPSHOT.jar" $CLASS -files "${HBASE_CONF_DIR}/hbase-site.xml" -libjars "$LIBJARS" "$@"
 You will need to define HBASE_CONF_DIR and HADOOP_CONF_DIR before you run your **goraci** jobs. For example:
-
- $ export HADOOP_CONF_DIR=/home/you/hadoop-conf
-
- $ export HBASE_CONF_DIR=/home/you/hbase-conf
-
- $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
-
+ $ export HADOOP_CONF_DIR=/home/you/hadoop-conf
+ $ export HBASE_CONF_DIR=/home/you/hbase-conf
+ $ PATH=/home/you/hadoop-1.0.2/bin:$PATH ./goraci.sh Generator 1000 1000000
 ####Concurrency
@@ -262,47 +240,31 @@ Below shows running a test of the test.
 in it, ensure the verifaction map reduce job notices that the node is missing. Not all output is shown, just the important parts.
-
- $ ./org.apache.gora.goraci.sh Generator 1 25000000
-
- $ ./org.apache.gora.goraci.sh Print -s 2000000000000000 -l 1
-
- 2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-
- $ ./org.apache.gora.goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
-
- 30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
-
- $ ./org.apache.gora.goraci.sh Delete 30350f9ae6f6e8f7
-
- Delete returned true
-
- $ ./org.apache.gora.goraci.sh Verify gci_verify_1 2
-
- 11/12/20 17:12:31 INFO mapred.JobClient: org.apache.gora.goraci.Verify$Counts
-
- 11/12/20 17:12:31 INFO mapred.JobClient: UNDEFINED=1
-
- 11/12/20 17:12:31 INFO mapred.JobClient: REFERENCED=24999998
-
- 11/12/20 17:12:31 INFO mapred.JobClient: UNREFERENCED=1
-
- $ hadoop fs -cat gci_verify_1/part\* 30350f9ae6f6e8f7 2000001f65dbd238
-
+ $ ./goraci.sh Generator 1 25000000
+ $ ./goraci.sh Print -s 2000000000000000 -l 1
+ 2000001f65dbd238:30350f9ae6f6e8f7:000004265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
+ $ ./goraci.sh Print -s 30350f9ae6f6e8f7 -l 1
+ 30350f9ae6f6e8f7:4867fe03de6ea6c8:000003265852:ef09f9dd-75b1-4c16-9f14-0fa84f3029b6
+ $ ./goraci.sh Delete 30350f9ae6f6e8f7
+ Delete returned true
+ $ ./goraci.sh Verify gci_verify_1 2
+ 11/12/20 17:12:31 INFO mapred.JobClient: org.apache.gora.goraci.Verify$Counts
+ 11/12/20 17:12:31 INFO mapred.JobClient: UNDEFINED=1
+ 11/12/20 17:12:31 INFO mapred.JobClient: REFERENCED=24999998
+ 11/12/20 17:12:31 INFO mapred.JobClient: UNREFERENCED=1
+ $ hadoop fs -cat gci_verify_1/part\* 30350f9ae6f6e8f7 2000001f65dbd238
 The map reduce job found the one undefined node and gave the node that referenced it. Below are some timing statistics for running Goraci on a 10 node cluster.
-
- Store | Task | Time | Undef | Unref | Ref
- ----------------+------------------------+---------+--------+-------+------------
- accumulo-1.4.0 | Generator 10 100000000 | 40m 16s | N/A | N/A | N/A
- accumulo-1.4.0 | Verify /tmp/goraci1 40 | 6m 7s | 0 | 0 | 1000000000
- hbase-0.92.1 | Generator 10 100000000 | 2h 44m | N/A | N/A | N/A
- hbase-0.92.1 | Verify /tmp/goraci2 40 | 6m 34s | 0 | 0 | 1000000000
-
+ Store | Task | Time | Undef | Unref | Ref
+ ----------------+------------------------+---------+--------+-------+------------
+ accumulo-1.4.0 | Generator 10 100000000 | 40m 16s | N/A | N/A | N/A
+ accumulo-1.4.0 | Verify /tmp/goraci1 40 | 6m 7s | 0 | 0 | 1000000000
+ hbase-0.92.1 | Generator 10 100000000 | 2h 44m | N/A | N/A | N/A
+ hbase-0.92.1 | Verify /tmp/goraci2 40 | 6m 34s | 0 | 0 | 1000000000
 HBase and Accumulo are configured differently out-of-the-box. We used the Accumulo 3G, native configuration examples in the [conf/examples](https://github.com/apache/gora/tree/master/gora-goraci/src/main/resources) directory.
@@ -310,9 +272,7 @@ HBase and Accumulo are configured differ
 To provide a comparable memory footprint, we increased the HBase jvm to "-Xmx4000m", and turned on compression for the ci table:
-
-create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
-
+ create 'ci', {NAME=>'meta', COMPRESSION=>'GZ'}
 We also turned down the replication of write-ahead logs to be comparable to Accumulo: