hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-521) Improve client scanner interface
Date Mon, 24 Mar 2008 21:53:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581690#action_12581690 ]

stack commented on HBASE-521:

I suppose RowResults could easily be so big that they'd blow out memory on the client or server.
What's our defense?  That when designing your request/MR job, you take care not to select too
much?  (I suppose we've always had this problem.  This patch does not introduce it.)

This issue addresses one of the items raised in our plan for 0.2.

Should we bite the bullet and rename the scanner methods in HTable to getScanner instead
of obtainScanner, and just deprecate the old ones?  In fact, we probably should do this, since
we're breaking the methods anyway (mark the old obtainScanner methods as deprecated).

Hmmm... but next is completely different.  Maybe we should just say that HTable has changed
completely in 0.2, along with TableMap, etc.
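
To illustrate why a plain rename is awkward, here is a rough sketch of what keeping a deprecated obtainScanner next to a new getScanner could look like. Everything in it is a hypothetical stand-in, not the patch: the types are simplified (String instead of Text, no HStoreKey argument), null marking the end of the scan is an assumption, and the adapter is only one possible way to bridge the two next() shapes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class DeprecationSketch {
    /** New style (hypothetical): next() hands back one row, or null at the end (assumed). */
    interface Scanner {
        SortedMap<String, byte[]> next();
    }

    /** Old style (simplified stand-in): the caller supplies the map to be filled. */
    interface OldScanner {
        boolean next(SortedMap<String, byte[]> results);
    }

    /** Hypothetical stand-in for HTable, backed by an in-memory list of rows. */
    static class Table {
        private final List<SortedMap<String, byte[]>> rows;

        Table(List<SortedMap<String, byte[]>> rows) {
            this.rows = rows;
        }

        /** New entry point. */
        Scanner getScanner() {
            Iterator<SortedMap<String, byte[]>> it = rows.iterator();
            return () -> it.hasNext() ? it.next() : null;
        }

        /** @deprecated kept for old callers; adapts the new scanner to the old call shape. */
        @Deprecated
        OldScanner obtainScanner() {
            Scanner scanner = getScanner();
            return results -> {
                SortedMap<String, byte[]> row = scanner.next();
                if (row == null) {
                    return false; // scan exhausted
                }
                results.clear();
                results.putAll(row);
                return true;
            };
        }
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("info:name", new byte[] {1});
        Table table = new Table(Arrays.asList(row, row));

        // Old call shape: the client allocates the map and passes it in.
        OldScanner old = table.obtainScanner();
        SortedMap<String, byte[]> results = new TreeMap<>();
        int count = 0;
        while (old.next(results)) {
            count++;
        }
        System.out.println("rows seen through deprecated path: " + count);
    }
}
```

The point of the sketch is only that the two methods cannot share a return type, so deprecation means carrying an adapter like this or dropping the old methods outright.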

I like changing the name from HScannerInterface to Scanner.  Change HInternalScannerInterface
to InternalScanner too?

The change in 'Index: src/java/org/apache/hadoop/hbase/util/Migrate.java' is odd; you just
add imports?  Is that right?

For IdentityTableReduce, should the interface be <Long, BatchUpdate> rather than <Text,
BatchUpdate>?  The Long would be an index of some kind.  That seems to be the model for
identity mappers... same for the TOF/TIF (TableOutputFormat/TableInputFormat).  Should the key
be a Long rather than a duplicate of the info in RowResult/BatchUpdate?

> Improve client scanner interface
> --------------------------------
>                 Key: HBASE-521
>                 URL: https://issues.apache.org/jira/browse/HBASE-521
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.2.0
>         Attachments: 521.patch
> The current client scanner interface is pretty ugly. You need to instantiate an HStoreKey
> and SortedMap<Text, byte[]> externally and then pass them into next. This is pretty
> bad, because for starters, the client has to choose the implementation of the map when they
> create it, so it's extra brain cycles to figure that out. HStoreKey doesn't show up anywhere
> else in the entire client side API, but here it bubbles out of next as a way to get the row
> and presumably the timestamp of the columns.
> I propose that we supplant HScannerInterface with Scanner, an easier-to-use version for
> clients. Its next method would look something like:
> {code}
> public RowResult next() throws IOException;
> {code}
> This packs the data up much more cleanly, including using Cells as values instead of
> raw byte[], meaning you have much more granular timestamp information. You also don't need
> HStoreKey anymore.
> By breaking Scanner away from HScannerInterface, we can leave the internal scanning code
> completely alone (keep using HStoreKeys and such) but make the client cleaner.
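
To make the proposal in the description concrete, here is a rough, self-contained sketch of how a client loop might look with a next() that returns a whole row. The Cell and RowResult classes are hypothetical stand-ins written for this example (String replaces Text, and their fields are guesses), and returning null at the end of the scan is an assumption; only the `RowResult next() throws IOException` signature comes from the issue.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class ScannerSketch {
    /** Hypothetical stand-in for Cell: a value together with its own timestamp. */
    static final class Cell {
        final byte[] value;
        final long timestamp;

        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for RowResult: a row key plus column name -> Cell. */
    static final class RowResult {
        final String row;
        final SortedMap<String, Cell> cells;

        RowResult(String row, SortedMap<String, Cell> cells) {
            this.row = row;
            this.cells = cells;
        }
    }

    /** Client-facing shape from the issue; null at end of scan is an assumption. */
    interface Scanner extends Closeable {
        RowResult next() throws IOException;
    }

    /** In-memory scanner used only to exercise the interface. */
    static Scanner scannerOver(List<RowResult> rows) {
        final Iterator<RowResult> it = rows.iterator();
        return new Scanner() {
            @Override
            public RowResult next() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void close() {
                // nothing to release for the in-memory case
            }
        };
    }

    /** Drains a scanner and returns the row keys in the order they were seen. */
    static List<String> rowKeys(Scanner scanner) {
        List<String> keys = new ArrayList<>();
        try (Scanner s = scanner) {
            RowResult result;
            // No HStoreKey or caller-allocated map: each call yields one row.
            while ((result = s.next()) != null) {
                keys.add(result.row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }

    public static void main(String[] args) {
        SortedMap<String, Cell> cols = new TreeMap<>();
        cols.put("info:name", new Cell("alice".getBytes(StandardCharsets.UTF_8), 1L));
        List<RowResult> rows =
            Arrays.asList(new RowResult("row1", cols), new RowResult("row2", cols));
        System.out.println(rowKeys(scannerOver(rows)));
    }
}
```

Note that the client never picks a map implementation here, and each column's timestamp is carried by its Cell rather than by a single key for the whole row.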

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.