hadoop-common-dev mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: [hbase] Suggestions on hbase APIs.
Date Mon, 21 Jan 2008 17:42:19 GMT
> -----Original Message-----
> From: Mafish Liu [mailto:mafish@gmail.com]
> Sent: Monday, January 21, 2008 12:23 AM
> To: hadoop-dev@lucene.apache.org
> Subject: [hbase] Suggestions on hbase APIs.
> Hi:
> I've recently been using hbase (included in the hadoop 0.15.2
> release) to manage spatial data,
> and found two "flaws" which I think can be improved.
> First, if you fetch the column names in an hbase table using
>   Set<Text> columns = tableDes.families().keySet();
> you get a set of column names that each end with a colon,
> which I think should be removed.

A name that ends with a colon is the name of a column family,
and you can create multiple family members in an ad hoc fashion.

For example, say you have a column family named 'meta:' in which
you store data about web pages. You can create multiple family
members in the same row, such as 'meta:mime-type',
'meta:crawl-date', 'meta:encoding', etc.


HTable table = new HTable(conf, tableName);
long id = table.startUpdate(row);
// enter data in members of column family meta:
table.put(id, new Text("meta:mime-type"), data);
table.put(id, new Text("meta:crawl-date"), data);
table.put(id, new Text("meta:encoding"), data);
// enter data in column contents:
table.put(id, new Text("contents:"), data);
// commit the update so the puts take effect
table.commit(id);
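
The naming convention can be sketched in plain Java, independent of
the HBase API. ColumnNames, familyOf and memberOf are hypothetical
names made up for this illustration, and String stands in for Text.
The sketch assumes that the first colon separates the family from
the member:

```java
// Hypothetical helper, not part of the HBase API: splits a column name
// of the form "family:member" at the first colon.
final class ColumnNames {
    private ColumnNames() {}

    /** Returns the family part including the trailing colon, e.g. "meta:". */
    static String familyOf(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no family delimiter in: " + column);
        }
        return column.substring(0, colon + 1);
    }

    /** Returns the member part after the colon; empty for a bare family such as "contents:". */
    static String memberOf(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no family delimiter in: " + column);
        }
        return column.substring(colon + 1);
    }

    public static void main(String[] args) {
        String[] columns = {"meta:mime-type", "meta:crawl-date", "contents:"};
        for (String column : columns) {
            System.out.println(familyOf(column) + " / " + memberOf(column));
        }
    }
}
```

Seen this way, the trailing colon in the names returned by
families().keySet() is the delimiter that marks the name as a family
rather than a fully qualified column.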

> Second, if you read all the contents of an hbase table with the
> "HScannerInterface.next" method, you get a TreeMap<Text,
> byte[]> on every call. Returning the column names every
> time is a waste of memory and network bandwidth,
> and there should be a more efficient way to do this.

Well, you can retrieve multiple columns with a scanner,
so if the column names were not passed back, how would
you determine which column goes with which data? Consider
scanning the table in the example above:

HScannerInterface scanner = table.obtainScanner(
  new Text[] {new Text("contents:"), new Text("meta:")},
  new Text()); // empty start row = start at the beginning

Now when you call scanner.next, you need the map to
find the value for "contents:" and the (multiple)
values for "meta:".
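
To illustrate why the column names have to travel with the data, here
is a minimal sketch in plain Java. It does not use the HBase API:
RowColumns, membersOf and sampleRow are hypothetical names, String
keys stand in for Text, and the sample values are made up. The map
plays the role of one row's results as returned by a scanner:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for one row returned by a scanner.
final class RowColumns {
    private RowColumns() {}

    /** Returns the entries whose column name starts with the given family, e.g. "meta:". */
    static SortedMap<String, byte[]> membersOf(SortedMap<String, byte[]> row, String family) {
        SortedMap<String, byte[]> members = new TreeMap<>();
        for (Map.Entry<String, byte[]> entry : row.entrySet()) {
            if (entry.getKey().startsWith(family)) {
                members.put(entry.getKey(), entry.getValue());
            }
        }
        return members;
    }

    /** Builds a made-up row with one "contents:" column and two "meta:" members. */
    static SortedMap<String, byte[]> sampleRow() {
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("contents:", "<html>...</html>".getBytes(StandardCharsets.UTF_8));
        row.put("meta:mime-type", "text/html".getBytes(StandardCharsets.UTF_8));
        row.put("meta:encoding", "UTF-8".getBytes(StandardCharsets.UTF_8));
        return row;
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> row = sampleRow();
        // Without the keys there would be no way to tell these values apart.
        for (String column : membersOf(row, "meta:").keySet()) {
            System.out.println(column);
        }
    }
}
```

Different rows can carry different members of "meta:", so the set of
keys cannot be assumed from the scanner's column list alone.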

> The above two APIs are used in my program and also in Hbase
> shell program.
> I don't know if there are alternative APIs that have
> performed the improvements.
> Best regards.
> Mafish
> --
> Mafish@gmail.com
> Institute of Computing Technology, Chinese Academy of
> Sciences, Beijing.
