hbase-user mailing list archives

From "John Ellis (MELBOURNE-AU)" <joh...@microsoft.com>
Subject Re: Examples of HBase and Indices?
Date Wed, 13 May 2009 05:24:01 GMT
Hi -

You could start by looking at org.apache.hadoop.hbase.client.tableindexed.IndexedTableAdmin,
IndexSpecification, and
org.apache.hadoop.hbase.HTableDescriptor.addIndex(IndexSpecification index).

I don't think you can use the shell currently to create index specifications.

Grab the source and have a look at TestIndexedTable for an example of how to create the index.
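
To illustrate why an index helps here, the sketch below contrasts a filter-style scan, which inspects every row, with a lookup through a secondary index that maps the indexed value back to base row keys. It is plain in-memory Java, not the tableindexed implementation and not the HBase API; the class and method names (SecondaryIndexSketch, put, scanByZip, lookupByZip) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a base table plus one secondary index.
public class SecondaryIndexSketch {
    // Base "table": row key -> zip value (stand-in for the zip: column).
    private final Map<String, String> baseTable = new TreeMap<String, String>();
    // Index "table": zip value -> row keys in the base table holding that value.
    private final TreeMap<String, List<String>> zipIndex = new TreeMap<String, List<String>>();

    // Write to the base table and keep the index in step with it.
    public void put(String rowKey, String zip) {
        String old = baseTable.put(rowKey, zip);
        if (old != null) {
            // The row already existed: drop its stale index entry first.
            List<String> stale = zipIndex.get(old);
            stale.remove(rowKey);
            if (stale.isEmpty()) {
                zipIndex.remove(old);
            }
        }
        List<String> keys = zipIndex.get(zip);
        if (keys == null) {
            keys = new ArrayList<String>();
            zipIndex.put(zip, keys);
        }
        keys.add(rowKey);
    }

    // Filter-style search: inspects every row, so cost grows with table size.
    public List<String> scanByZip(String zip) {
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, String> e : baseTable.entrySet()) {
            if (e.getValue().equals(zip)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Index-style search: one sorted-map lookup, then only the matching rows.
    public List<String> lookupByZip(String zip) {
        List<String> keys = zipIndex.get(zip);
        return keys == null ? new ArrayList<String>() : new ArrayList<String>(keys);
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        table.put("row1", "94086");
        table.put("row2", "94087");
        table.put("row3", "94086");
        System.out.println("scan:  " + table.scanByZip("94086"));
        System.out.println("index: " + table.lookupByZip("94086"));
    }
}
```

Both searches return the same rows; the difference is that the scan does work proportional to the whole table, while the index lookup does work proportional to the number of matches. The price is the extra write on every put to keep the index current.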

Cheers, John


On 13/05/09 6:52 AM, "Jason Buberel" <jason@buberel.org> wrote:

Yesterday I completed a basic investigative setup that involved
installing/deploying:

Hadoop v0.19.1
HBase v0.19.1

Both were deployed in a pseudo-cluster configuration (a cluster with one
node). Using the HBase shell, I created a simple table to hold real estate
data: address, city, state, zip, beds, baths, sqft, etc:

create 'listing_entry', \
    {NAME => 'mls_d'}, \
    {NAME => 'address_1'}, \
    {NAME => 'address_2'}, \
    {NAME => 'city'}, \
    {NAME => 'state'}, \
    {NAME => 'zip'}, \
    {NAME => 'date'}, \
    {NAME => 'beds'}, \
    {NAME => 'baths'}, \
    {NAME => 'sqft'}, \
    {NAME => 'lot'}, \
    {NAME => 'year_built'}

I then wrote a short program to generate ~10M sample rows, with a randomly
chosen zip value between 10000 and 99999 plus a city name randomly selected
from a list of 10 values ('SUNNYVALE', 'CUPERTINO', 'MOUNTAIN VIEW', 'PALO
ALTO', etc.).

Next, I put together a simple query application that would search the 10M
rows, looking for entries that matched by city, zip or both. The code for
the zip code search was simple:

HBaseConfiguration config = new HBaseConfiguration();
HTable table = new HTable(config, "listing_entry");

RowFilterInterface filter = new ColumnValueFilter(Bytes.toBytes("zip:"),
    ColumnValueFilter.CompareOp.EQUAL,
    Bytes.toBytes("94086"));
Scanner search = table.getScanner(
    new String[]{"address_1:", "city:", "zip:"}, "", Long.MAX_VALUE, filter);
for (RowResult result : search) {
  System.out.println("   " + result.get(Bytes.toBytes("address_1:")) + "/" +
    result.get(Bytes.toBytes("city:")) + "/" +
    result.get(Bytes.toBytes("zip:")));
}
search.close();

When this was executed against a sample database of 10K rows, the query took
about 15 seconds. So far, so good. But when that was expanded to the full
sample data set of 10M rows, the request timed out.

From there, I went searching for information on how to create and then make
use of indices, which led me to the JavaDoc for IndexedTable. After reading
through the JavaDocs on that class, it looked as though it is intended to be
used to read data from indexed tables. Poking around the HBase shell help
info, I didn't see any information specific to creating indices or indexed
tables. I also looked through the examples code, but didn't find any
information on indexing.

Is there some other bit of example code or documentation I can read through
that would help me figure out how to make my table with 10M rows queryable
with reasonable response times?  Or am I going about this all wrong, trying
to wedge my structured-query-like brain into an orthogonal solution space?

Thanks for any pointers...

jason


--
Jason L. Buberel
jason@buberel.org

