hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/Cascading" by ChrisWensel
Date Tue, 03 Feb 2009 22:54:30 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by ChrisWensel:

New page:
[http://www.cascading.org/ Cascading] is an alternative API to Hadoop MapReduce. Under the
covers it uses MapReduce during execution, but during development users don't have to think
in MapReduce to create solutions that execute on Hadoop.

Cascading now supports reading and writing data to and from an HBase cluster.

Detailed information and access to the source code can be found on the [http://www.cascading.org/modules.html
Cascading Modules] page.

A simple example (see the GitHub repo for the more up-to-date API):

// read data from the default filesystem
// emits two fields: "offset" and "line"
Tap source = new Hfs( new TextLine(), inputFileLhs );

// store data in an HBase cluster
// accepts fields "num", "lower", and "upper"
// will automatically scope incoming fields to their proper family name, "left" or "right"
Fields keyFields = new Fields( "num" );
String[] familyNames = {"left", "right"};
Fields[] valueFields = new Fields[] { new Fields( "lower" ), new Fields( "upper" ) };
Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );

// a simple pipe assembly to parse the input into fields
// a real app would likely chain multiple Pipes together for more complex processing
Pipe parsePipe = new Each( "insert", new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), " " ) );

// "plan" a cluster executable Flow
// this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
Flow parseFlow = new FlowConnector( properties ).connect( source, hBaseTap, parsePipe );

// start the flow, and block until complete
parseFlow.complete();

// open an iterator on the HBase table we stuffed data into
TupleEntryIterator iterator = parseFlow.openSink();

// print out each tuple from HBase
while( iterator.hasNext() )
  System.out.println( "iterator.next() = " + iterator.next() );

iterator.close();


Note that the "hBaseTap" above can be used as both a sink and a source in a Flow, so another
Flow could be created to process data stored in HBase.
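As a sketch of that reverse direction, a follow-on Flow might use the same Tap as a source and copy the stored tuples back out to a text file. This is illustrative only and assumes the same HBaseTap/HBaseScheme constructors as the example above; the name "outputFile" is a hypothetical path, not something from this page:

```java
// a sketch only: read the "multitable" HBase table written above
// and copy its tuples to a TextLine file on the default filesystem
Fields keyFields = new Fields( "num" );
String[] familyNames = {"left", "right"};
Fields[] valueFields = new Fields[] { new Fields( "lower" ), new Fields( "upper" ) };

// the same Tap construction, now acting as a source
Tap hBaseSource = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ) );

// outputFile is a hypothetical output path
Tap sink = new Hfs( new TextLine(), outputFile, SinkMode.REPLACE );

// a Pipe with no operations simply passes tuples through unchanged
Pipe copyPipe = new Pipe( "copy" );

// plan and run the copy Flow, blocking until complete
Flow copyFlow = new FlowConnector( properties ).connect( hBaseSource, sink, copyPipe );
copyFlow.complete();
```

Because Taps are symmetric in Cascading, no change to the Scheme is needed when switching a Tap between sink and source roles.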
