hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/Cascading" by Misty
Date Mon, 02 Nov 2015 04:57:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/Cascading" page has been changed by Misty:
https://wiki.apache.org/hadoop/Hbase/Cascading?action=diff&rev1=4&rev2=5

- [[http://www.cascading.org/|Cascading]] is an alternative API to Hadoop MapReduce. Under the covers it uses MapReduce during execution, but during development, users don't have to think in MapReduce to create solutions for execution on Hadoop.
+ The HBase Wiki is in the process of being decommissioned. The info that used to be on this page has moved to https://hbase.apache.org/book.html#cascading. Please update your bookmarks.
  
- Cascading now has support for reading and writing data to and from an HBase cluster.
- 
- Detailed information and access to the source code can be found on the [[http://www.cascading.org/modules.html|Cascading Modules]] page. [[http://code.google.com/p/cascading/downloads/list|Cascading 1.0.1]] is required.
- 
- Here is a simple example showing how to "sink" data into an HBase cluster. Note that the exact same "hBaseTap" instance can be used to "source" data as well (as shown in the unit tests). See the GitHub repo, linked from the modules page, for the most up-to-date API.
- 
- {{{#!java
- // read data from the default filesystem
- // emits two fields: "offset" and "line"
- Tap source = new Hfs( new TextLine(), inputFileLhs );
- 
- // store data in an HBase cluster
- // accepts fields "num", "lower", and "upper"
- // will automatically scope incoming fields to their proper familyname, "left" or "right"
- Fields keyFields = new Fields( "num" );
- String[] familyNames = {"left", "right"};
- Fields[] valueFields = new Fields[] {new Fields( "lower" ), new Fields( "upper" ) };
- Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );
- 
- // a simple pipe assembly to parse the input into fields
- // a real app would likely chain multiple Pipes together for more complex processing
- Pipe parsePipe = new Each( "insert", new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), " " ) );
- 
- // "plan" a cluster executable Flow
- // this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
- Flow parseFlow = new FlowConnector( properties ).connect( source, hBaseTap, parsePipe );
- 
- // start the flow, and block until complete
- parseFlow.complete();
- 
- // open an iterator on the HBase table we stuffed data into
- TupleEntryIterator iterator = parseFlow.openSink();
- 
- while(iterator.hasNext())
-   {
-   // print out each tuple from HBase
-   System.out.println( "iterator.next() = " + iterator.next() );
-   }
- 
- iterator.close();
- }}}
- 
- Note that the "hBaseTap" above can be used as both a sink and a source in a Flow, so another Flow could be created to process data stored in HBase.
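The sink-and-source point above can be sketched as a second Flow. This is a hedged sketch only, not from the original page: the `outputFile` path is hypothetical, `properties` is assumed to be defined elsewhere exactly as in the example above, and all classes are assumed to come from Cascading 1.0.1 and its HBase module. It reuses the same tap construction, this time as a source, and copies the table contents back out to text files:

{{{#!java
// same scheme and tap construction as in the sink example above,
// but with SinkMode.KEEP since the table is only read here
Fields keyFields = new Fields( "num" );
String[] familyNames = {"left", "right"};
Fields[] valueFields = new Fields[] {new Fields( "lower" ), new Fields( "upper" ) };
Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.KEEP );

// write the tuples back out as text on the default filesystem
// "outputFile" is a hypothetical path, not part of the original example
Tap textSink = new Hfs( new TextLine(), outputFile );

// an identity pipe; a real app would transform or filter the tuples here
Pipe readPipe = new Pipe( "read" );

// connect HBase (the source Tap) to the text sink and run,
// mirroring the FlowConnector usage in the example above
Flow readFlow = new FlowConnector( properties ).connect( hBaseTap, textSink, readPipe );
readFlow.complete();
}}}

Because the tap carries both the table name and the scheme, the same `HBaseTap` value could equally be passed as the sink of one Flow and the source of the next.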
- 
