hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/Jython" by TravisBrady
Date Wed, 06 Feb 2008 19:56:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by TravisBrady:
http://wiki.apache.org/hadoop/Hbase/Jython

New page:
== Accessing HBase from Jython ==

This page describes how to connect to HBase from Jython.  These instructions should
also help when connecting from other languages that run on the JVM, such as Scala and JRuby.
The code mostly follows the [http://wiki.apache.org/hadoop/Hbase/FAQ#1 Can someone give an
example of basic API-usage going against hbase?] example listed in the HBase FAQ.

== Setting Your Classpath ==

Working with HBase from Jython is straightforward once your CLASSPATH is set up.
The CLASSPATH is an environment variable that acts as a module search path: it lists the
jar files that contain the code you're going to import and use.
The HBase team is working on making it easy to set and get your CLASSPATH, but for now the
way to get it is to start HBase:
{{{
bin/start-hbase.sh
}}} 
and then find the classpath in the command line of the running region server process:
{{{
ps ax | grep regionserver
}}}
This prints the full command line of the region server process.  Within that text is a
-classpath option, which is followed by a long, colon-separated list of paths.
Copy that list and then run
{{{
export CLASSPATH=$THE_CLASSPATH_YOU_COPIED
}}}
In my case the CLASSPATH then contains 24 entries.
When you start Jython, it will likely print a message about processing each
of the jars listed in your CLASSPATH.
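
The copy step above can also be scripted. The following is a minimal sketch in plain Python, assuming the `ps` output line has been captured as a string and that none of the paths contain spaces. The helper names `extract_classpath` and `classpath_entries` are hypothetical and are not part of HBase or Jython.

```python
def extract_classpath(ps_line):
    """Return the value that follows -classpath (or -cp) in a ps line, or None."""
    # Assumption: whitespace separates the options and no path contains a space.
    tokens = ps_line.split()
    for i, token in enumerate(tokens[:-1]):
        if token in ("-classpath", "-cp"):
            return tokens[i + 1]
    return None


def classpath_entries(classpath):
    """Split a colon-separated classpath into its non-empty entries."""
    # Assumption: a Unix-style ':' separator, as in the ps output described above.
    return [entry for entry in classpath.split(":") if entry]
```

Calling `len(classpath_entries(extract_classpath(line)))` on the region server line gives the number of entries, which is a quick way to check that nothing was lost while copying.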

== The Code ==

Once the CLASSPATH is set, all that remains is translating the Java on the FAQ page into valid
Jython.

The code below creates a table, puts some data in it, fetches that data back out and then
deletes the table.

{{{
import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration, HBaseAdmin,\
    HTableDescriptor, HColumnDescriptor, HTable, HConstants
from org.apache.hadoop.io import Text

# First get a conf object.  This will read in the configuration 
# that is out in your hbase-*.xml files such as location of the
# hbase master node.
conf = HBaseConfiguration()

# Create a table named 'test' that has two column families,
# one named 'content:' and the other 'anchor:'.  The trailing colons
# are required for column family names.
tablename = "test"     # some things accept Strings 
tablename_text = Text(tablename)  # others accept Text


desc = HTableDescriptor(tablename)
desc.addFamily(HColumnDescriptor("content:"))
desc.addFamily(HColumnDescriptor("anchor:"))
admin = HBaseAdmin(conf)

# Drop and recreate if it exists
if admin.tableExists(tablename_text):
    admin.deleteTable(tablename_text)
admin.createTable(desc)

tables = admin.listTables()

table = HTable(conf, tablename_text)

# Add content to 'content:' on a row named 'row_x'
row = Text("row_x")
# Use `lock_id` here to avoid name collision with the Python builtin id function
lock_id = table.startUpdate(row)
table.put(lock_id, Text("content:"), "some content")
table.commit(lock_id)

# Now fetch the content just added, returns a byte[]
data = table.get(row, Text("content:"))

data_str = java.lang.String(data, "UTF8")  # decode the bytes as a UTF-8 string
print "The fetched row contains the value '%s'" % (data_str)


# Delete the table.
admin.deleteTable(desc.getName())

}}}
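
The code above spells out the trailing colon on every column family name by hand. As a small convenience, the naming rule can be wrapped in helpers. This is only a sketch: `family` and `column` are hypothetical names and are not part of the HBase API, and the `family:qualifier` form is assumed from the column names used above.

```python
def family(name):
    """Return a column family name in the 'name:' form used above."""
    # Append the trailing colon only when it is missing.
    if name.endswith(":"):
        return name
    return name + ":"


def column(family_name, qualifier=""):
    """Build a 'family:qualifier' column name; an empty qualifier gives 'family:'."""
    return family(family_name) + qualifier
```

With these helpers, `Text(column("content"))` would stand in for `Text("content:")` in the example.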
