hadoop-common-user mailing list archives

From stack <st...@duboce.net>
Subject Re: map/reduce with hbase
Date Wed, 14 Nov 2007 00:40:03 GMT
I tried using streaming to dump into an hbase table.  As things are 
currently written, it unfortunately won't work.  Streaming seems to 
presume keys and values of type Text, whereas TableOutputFormat takes 
a key of type Text but expects the value to be a MapWritable (where 
the keys are column names).  Even if you could use types other than Text 
in Streaming, a MapWritable is awkward for php, python, etc., to compose.
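
To make the mismatch concrete, here is a rough Python sketch of what a streaming script can emit: one line of text per record, key and value separated by a tab. The JSON encoding of the columns and the column names ("info:name", "info:lang") are made up for illustration. Nothing in Streaming or TableOutputFormat understands this encoding, so something on the java side would still have to turn the value into a MapWritable.

```python
import json


def encode_row(row_key, columns):
    """Flatten one row into a single streaming line: key <TAB> value.

    `columns` maps column names to cell values. JSON is a hypothetical
    encoding picked for this sketch; TableOutputFormat does not parse it.
    """
    value = json.dumps(columns, sort_keys=True)
    return f"{row_key}\t{value}"


def decode_row(line):
    """Split a streaming line back into (row_key, {column: value})."""
    row_key, value = line.rstrip("\n").split("\t", 1)
    return row_key, json.loads(value)


# A streaming reducer would write this line to stdout; the framework
# would then see both the key and the value as plain text.
line = encode_row("row1", {"info:name": "billy", "info:lang": "php"})
print(line)
```

The point is only that the whole column map has to be squeezed into one text value on the script side, which is why the output format as written can't consume it directly.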

Regarding your question about how php might access hbase, at the moment 
your options are few:

+ There is the Edward Yoon patch that you've already tripped over, 
hadoop-2171.  It puts up an IPC server that fields HQL strings.  The 
server does the HQL parse and forwards the interpreted request to the 
hbase cluster.  Included is a first cut at php code that is capable of 
making the basic method call against the remote java IPC server.
+ If traffic is light and your requests are read-only, there is the HQL 
page in the master's webui.

If hbase had a REST interface, hadoop-2068, would that work for you?


Billy wrote:
> Can you show me an example of how that would be done with the command line?
> "Michael Stack" <stack@duboce.net> wrote in 
> message news:4736244E.1050804@duboce.net...
>> Billy wrote:
>>> ..
>>> What I am looking to do is get and store the input and output from/in 
>>> hbase.
>> I haven't tried it but it looks like you can specify input and output 
>> classes for streaming with -inputformat and -outputformat options.
>> Try setting these to TableInputFormat [1] and TableOutputFormat [2] 
>> respectively.
>> Usual caveats apply: These hbase classes need to be either bundled into 
>> your job jar -- awkward in this case since you are using the streaming job 
>> jar -- or they need to be on the cluster CLASSPATH (adding the 
>> hadoop*hbase.jar to the lib directory across the cluster is probably the 
>> easiest thing to do).
>> St.Ack
>> 1. 
>> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/mapred/TableInputFormat.html
>> 2. 
>> http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/org/apache/hadoop/hbase/mapred/TableOutputFormat.html
