hbase-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-521) Improve client scanner interface
Date Mon, 24 Mar 2008 22:17:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581697#action_12581697 ]

Bryan Duxbury commented on HBASE-521:
-------------------------------------

>however: remove commented out code in ScannerHandler, TestScannerAPI
Done.

>I suppose RowResults could easily be so big they'd blow out memory on the client or server.
>What's our defense? That when designing your request/MR, you do not select too much? (I suppose
>we've always had this problem. This patch does not introduce it.)
We have this problem throughout our project. Our RPC framework is, well, an RPC, not a stream,
so we really can't handle alternatives. I think we should deal with oversized requests and
replies when people actually show us cases where it both makes sense and is a problem.

>Should we bite the bullet and change the name of the methods in HTable to be getScanner
>instead of 'obtainScanner' - just deprecate the old ones... In fact, we probably should do
>this since we're breaking the methods anyways (add deprecate to old obtainScanner methods).
Makes sense, we probably should do this. It is an issue that we're breaking compatibility - we
could offer a DeprecatedScanner wrapper class that converts the RowResult back, if we wanted
to. As for TableMap and friends, I don't see the point in trying to keep them backward compatible,
because they didn't work quite right in the first place. The changes I have made give you
a lot more options (BatchUpdates as values for TIF), things we actually wanted to fix in 0.2
anyway.
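
A rough sketch of what such a DeprecatedScanner wrapper might look like, not taken from the patch. Every type here is a hypothetical stand-in so the example compiles on its own: String stands in for Text, StoreKey for HStoreKey, and the old-style next(key, results) signature is an assumption based on the issue description. Because each Cell carries its own timestamp, the wrapper has to pick one for the key; using the newest one is my assumption, not something the patch specifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class DeprecatedScannerSketch {
    /** Hypothetical stand-in for Cell: a value with its own timestamp. */
    static final class Cell {
        final byte[] value;
        final long timestamp;

        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for RowResult: a row key plus column -> Cell. */
    static final class RowResult {
        final String row;
        final SortedMap<String, Cell> cells;

        RowResult(String row, SortedMap<String, Cell> cells) {
            this.row = row;
            this.cells = cells;
        }
    }

    /** New-style scanner; assumed to return null when the scan is done. */
    interface Scanner {
        RowResult next() throws IOException;
    }

    /** Hypothetical stand-in for HStoreKey, filled in by the old-style next. */
    static final class StoreKey {
        String row;
        long timestamp;
    }

    /** Adapts a new-style Scanner to the old caller-supplies-key-and-map style. */
    static final class DeprecatedScanner {
        private final Scanner delegate;

        DeprecatedScanner(Scanner delegate) {
            this.delegate = delegate;
        }

        boolean next(StoreKey key, SortedMap<String, byte[]> results) throws IOException {
            RowResult row = delegate.next();
            if (row == null) {
                return false; // scan exhausted; key and results are left untouched
            }
            results.clear();
            long newest = Long.MIN_VALUE;
            for (Map.Entry<String, Cell> e : row.cells.entrySet()) {
                results.put(e.getKey(), e.getValue().value);
                newest = Math.max(newest, e.getValue().timestamp);
            }
            key.row = row.row;
            key.timestamp = newest; // assumption: report the newest cell timestamp
            return true;
        }
    }

    /** Runs the wrapper over one in-memory row and summarizes what came back. */
    static String demo() {
        SortedMap<String, Cell> cells = new TreeMap<>();
        cells.put("info:a", new Cell(new byte[] {1}, 5L));
        cells.put("info:b", new Cell(new byte[] {2}, 9L));
        Iterator<RowResult> rows =
            Collections.singletonList(new RowResult("row1", cells)).iterator();
        DeprecatedScanner old =
            new DeprecatedScanner(() -> rows.hasNext() ? rows.next() : null);

        StoreKey key = new StoreKey();
        SortedMap<String, byte[]> results = new TreeMap<>();
        try {
            boolean first = old.next(key, results);
            String summary = key.row + "@" + key.timestamp + " columns=" + results.size();
            boolean second = old.next(key, results);
            return first + " " + summary + " more=" + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The per-cell timestamps are lost in the conversion, which is one reason a wrapper like this would only make sense as a transition aid.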

I'll change HInternalScannerInterface to InternalScanner. Much cleaner to read.

Reverted changes to Migrate.java. Changes were relevant until another patch got applied.

> For IdentityTableReduce, the interface should be <Long, BatchUpdate>, rather than
> <Text, BatchUpdate>?
Where would we generate the Long from? Why can't the Text rowkey be used as the identity attribute?
Not that it really matters - all the BatchUpdates are just going to be applied individually.
There's no merging or anything in IdentityTableReduce.

> Improve client scanner interface
> --------------------------------
>
>                 Key: HBASE-521
>                 URL: https://issues.apache.org/jira/browse/HBASE-521
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: 521.patch
>
>
> The current client scanner interface is pretty ugly. You need to instantiate an HStoreKey
> and SortedMap<Text, byte[]> externally and then pass them into next. This is pretty
> bad, because for starters, the client has to choose the implementation of the map when they
> create it, so it's extra brain cycles to figure that out. HStoreKey doesn't show up anywhere
> else in the entire client side API, but here it bubbles out of next as a way to get the row
> and presumably the timestamp of the columns.
> I propose that we supplant HScannerInterface with Scanner, an easier-to-use version for
> clients. Its next method would look something like:
> {code}
> public RowResult next() throws IOException;
> {code}
> This packs the data up much more cleanly, including using Cells as values instead of
> raw byte[], meaning you have much more granular timestamp information. You also don't need
> HStoreKey anymore.
> By breaking Scanner away from HScannerInterface, we can leave the internal scanning code
> completely alone (keep using HStoreKeys and such) but make the client cleaner.
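
To make the quoted proposal concrete, here is a minimal sketch of how a client loop might read with a next() that returns a RowResult. It is not the patch itself: Cell and RowResult are cut-down hypothetical stand-ins, String stands in for Text, the in-memory scanner exists only so the example runs without a cluster, and returning null at the end of the scan is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class ScannerSketch {
    /** Hypothetical stand-in for Cell: a value with its own timestamp. */
    static final class Cell {
        final byte[] value;
        final long timestamp;

        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for RowResult: one row key plus its columns. */
    static final class RowResult {
        final String row;                    // the real row key is a Text
        final SortedMap<String, Cell> cells; // column name -> Cell

        RowResult(String row, SortedMap<String, Cell> cells) {
            this.row = row;
            this.cells = cells;
        }
    }

    /** The proposed client-facing interface, reduced to its one method. */
    interface Scanner {
        /** Assumption: returns null once the scan is exhausted. */
        RowResult next() throws IOException;
    }

    /** Toy in-memory scanner so the sketch runs without a cluster. */
    static Scanner scannerOver(List<RowResult> rows) {
        Iterator<RowResult> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Client loop: no HStoreKey and no caller-supplied map. */
    static List<String> rowKeys(Scanner scanner) throws IOException {
        List<String> keys = new ArrayList<>();
        for (RowResult r = scanner.next(); r != null; r = scanner.next()) {
            keys.add(r.row);
        }
        return keys;
    }

    /** Scans two in-memory rows and returns their keys in scan order. */
    static List<String> demo() {
        SortedMap<String, Cell> cells = new TreeMap<>();
        cells.put("info:a", new Cell(new byte[] {1}, 5L));
        List<RowResult> rows = Arrays.asList(
            new RowResult("row1", cells),
            new RowResult("row2", cells));
        try {
            return rowKeys(scannerOver(rows));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller never picks a map implementation or touches a key object; everything it needs arrives in the returned row.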

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

