hbase-issues mailing list archives

From "Karthick Sankarachary (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3851) A Random-Access Column Object Model
Date Mon, 09 May 2011 16:57:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030807#comment-13030807 ]

Karthick Sankarachary commented on HBASE-3851:

Basically, the goal here is to reduce the number of round trips between the client and region
servers. For example, say we have a table of user profiles, where each profile includes the
user's interests (a set of things they like) and portfolio (a map of stock symbol to price
paid). If we store the interests (or the portfolio) in a single column, then every time we
want to add or remove an interest (or a stock), we will most likely need to read that column
before updating it. If, on the other hand, we break the interests (or portfolio) down into
multiple columns, one for each element of the set (or entry of the map), then we can add or
remove elements without reading the entire collection first.
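
To make the trade-off above concrete, here is a minimal Java sketch. It does not use the HBase
client API: the row is modelled as an in-memory map from qualifier to value, and a counter
stands in for client/region-server round trips. The class name, the "interests" qualifier, the
":" separator and the comma-joined encoding are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for one row of one column family (qualifier -> value).
// Hypothetical helper, NOT the HBase client API; it only counts simulated round trips.
class ProfileRowSketch {
    final TreeMap<String, String> columns = new TreeMap<>();
    int roundTrips = 0;

    String get(String qualifier) { roundTrips++; return columns.get(qualifier); }
    void put(String qualifier, String value) { roundTrips++; columns.put(qualifier, value); }
    void delete(String qualifier) { roundTrips++; columns.remove(qualifier); }

    // Layout A: the whole set is serialized into a single column,
    // so every update is a read-modify-write (two round trips).
    // Assumes interests never contain a comma.
    void addInterestSingleColumn(String interest) {
        String current = get("interests");
        Set<String> set = new TreeSet<>();
        if (current != null && !current.isEmpty()) {
            set.addAll(Arrays.asList(current.split(",")));
        }
        set.add(interest);
        put("interests", String.join(",", set));
    }

    // Layout B: one column per element, so an add is a blind write
    // and a remove is a blind delete (one round trip each).
    void addInterestPerColumn(String interest) {
        put("interests:" + interest, interest);
    }

    void removeInterestPerColumn(String interest) {
        delete("interests:" + interest);
    }
}
```

In this model, adding two interests costs four simulated round trips with the single-column
layout and two with the column-per-element layout.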

Having said that, I took a look at the object-mappings proposed for some of the other NoSQL
databases, and they all happen to live outside of the project proper. In light of that, I'll
do as you suggested, and put this on github. If you'd like to revisit this down the road,
please feel free to re-open this.

> A Random-Access Column Object Model
> -----------------------------------
>                 Key: HBASE-3851
>                 URL: https://issues.apache.org/jira/browse/HBASE-3851
>             Project: HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.92.0
>            Reporter: Karthick Sankarachary
>            Assignee: Karthick Sankarachary
>            Priority: Minor
>              Labels: HBase, Mapping, Object
>             Fix For: 0.92.0
>         Attachments: HBASE-3851.patch
> By design, a value in HBase is an opaque and atomic byte array. In theory, any arbitrary
type can potentially be represented in terms of such unstructured yet indivisible units. However,
as the complexity of the type increases, so does the need to access it in parts rather than
in whole. That way, one can update parts of a value without reading the whole first. This
calls for transparency in the type of data being accessed.
> To that end, we introduce here a simple object model where each part maps to a {{HTable}}
column and value thereof. Specifically, we define a {{ColumnObject}} interface that denotes
an arbitrary type comprising properties, where each property is a {{<name, value>}}
tuple of byte arrays. In essence, each property maps to a distinct HBase {{KeyValue}}. In
particular, the property's name maps to a column, prefixed by the qualifier and the object's
identifier (assumed to be unique within a column family), and the property's value maps to
the {{KeyValue#getValue()}} of the corresponding column. Furthermore, the {{ColumnObject}}
is marked as a {{RandomAccess}} type to underscore the fact that its properties can be accessed
in and of themselves.
> For starters, we provide three concrete objects - a {{ColumnMap}}, {{ColumnList}} and
{{ColumnSet}} that implement the {{Map}}, {{List}} and {{Set}} interfaces respectively. The
{{ColumnMap}} treats each {{Map.Entry}} as an object property, the {{ColumnList}} stores each
element against its ordinal position, and the {{ColumnSet}} considers each element as the
property name (as well as its value). For the sake of convenience, we also define extensions
to the {{Get}}, {{Put}}, {{Delete}} and {{Result}} classes that are aware of and know how
to deal with such {{ColumnObject}} types.
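
As a rough illustration of the mapping described in the quoted issue text, the following Java
sketch shows how the entries of a map, list and set might each become one column. It is not
taken from the attached patch: the class and method names, the ":" separator, and the use of
strings instead of byte arrays are assumptions, and the column prefix is simplified to the
object's identifier alone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that flattens a collection into column-name -> value pairs.
class ColumnLayoutSketch {
    // Assumed separator between the object's identifier and the property name.
    static final String SEP = ":";

    static String column(String objectId, String propertyName) {
        return objectId + SEP + propertyName;
    }

    // Map: each entry is a property; the key is the name, the value is the value.
    static Map<String, String> fromMap(String objectId, Map<String, String> map) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : map.entrySet()) {
            columns.put(column(objectId, e.getKey()), e.getValue());
        }
        return columns;
    }

    // List: each element is stored against its ordinal position.
    static Map<String, String> fromList(String objectId, List<String> list) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < list.size(); i++) {
            columns.put(column(objectId, Integer.toString(i)), list.get(i));
        }
        return columns;
    }

    // Set: each element serves as both the property name and its value.
    static Map<String, String> fromSet(String objectId, Set<String> set) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String element : set) {
            columns.put(column(objectId, element), element);
        }
        return columns;
    }
}
```

Because every element lands in its own column, a single element can be written or deleted by
name without fetching the rest of the collection.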

