From: brandonwilliams@apache.org
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [2/3] git commit: Update pig readme
Message-Id: <20120526155017.D93A4182BC@tyr.zones.apache.org>
Date: Sat, 26 May 2012 15:50:17 +0000 (UTC)

Update pig readme


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2dc27a17
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2dc27a17
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2dc27a17

Branch: refs/heads/trunk
Commit: 2dc27a17567fa448aae335e74cc46ab94339eba4
Parents: db68e03
Author: Brandon Williams
Authored: Sat May 26 10:50:00 2012 -0500
Committer: Brandon Williams
Committed: Sat May 26 10:50:00 2012 -0500

----------------------------------------------------------------------
 examples/pig/README.txt |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/2dc27a17/examples/pig/README.txt
----------------------------------------------------------------------
diff --git a/examples/pig/README.txt b/examples/pig/README.txt
index 3bdbf10..57b8f57 100644
--- a/examples/pig/README.txt
+++ b/examples/pig/README.txt
@@ -1,7 +1,8 @@
 A Pig storage class that reads all columns from a given ColumnFamily, or writes
 properly formatted results into a ColumnFamily.
 
-Setup:
+Getting Started
+===============
 
 First build and start a Cassandra server with the default
 configuration and set the PIG_HOME and JAVA_HOME environment
@@ -31,7 +32,6 @@ for input and output:
 * PIG_OUTPUT_RPC_PORT : the port thrift is listening on for writing
 * PIG_OUTPUT_PARTITIONER : cluster partitioner for writing
 
-
 Then you can run it like this:
 
 examples/pig$ bin/pig_cassandra -x local example-script.pig
@@ -70,3 +70,18 @@ Which will copy the ColumnFamily. Note that the destination ColumnFamily must
 already exist for this to work.
 
 See the example in test/ to see how schema is inferred.
+
+Advanced Options
+================
+
+The following environment variables default to false but can be set to true to enable them:
+
+PIG_WIDEROW_INPUT:  this enables loading of rows with many columns without
+                    incurring memory pressure.  All columns will be in a bag and indexes are not
+                    supported.
+
+PIG_USE_SECONDARY:  this allows easy use of secondary indexes within your
+                    script, by appending every index to the schema as 'index_$name', allowing
+                    filtering of loaded rows with a statement like "FILTER rows BY index_color eq
+                    'blue'" if you have an index called 'color' defined.
+
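
As a rough illustration of the Advanced Options added in this commit, the sketch below enables secondary-index support before launching the example script. It is not part of the commit. It assumes the shell is in examples/pig, that a Cassandra server is already running, and that PIG_HOME and JAVA_HOME are set as the README describes. The guard around the launcher is an addition for illustration only.

```shell
# Sketch only: assumes the current directory is examples/pig, a Cassandra
# server is running, and PIG_HOME and JAVA_HOME are already exported.

# Ask the storage class to append each secondary index to the schema as
# 'index_$name', so a script can use e.g.: FILTER rows BY index_color eq 'blue'
export PIG_USE_SECONDARY=true

# Keep wide-row input at its documented default; the README states that
# indexes are not supported when wide-row input is enabled.
export PIG_WIDEROW_INPUT=false

# Launch the example script only if the launcher exists in this directory.
if [ -x bin/pig_cassandra ]; then
    bin/pig_cassandra -x local example-script.pig
fi
```

Both variables are left unset by default, which the README treats as false, so only the option actually needed has to be exported.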