hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7826) Improve Hbase Thrift v1 to return results in sorted order
Date Wed, 17 Apr 2013 00:55:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633620#comment-13633620 ]

Jean-Daniel Cryans commented on HBASE-7826:
-------------------------------------------

I had a follow-up discussion with [~shiven] and his team. Their problem is that
they need to stream through fat rows with millions of columns, so they cannot sort client-side.
Their own testing with the original patch shows much better performance when the Thrift server
returns the data already sorted. Furthermore, it's silly that we support sorted columns
everywhere except in Thrift. Right now we're stuck with what was written years ago.

[~shiven] suggested that we try to find a way to add this functionality while keeping things
compatible. In the worst case, this could be done by adding a whole separate set of methods
that return a different RowResult object containing a list.

But here's my proposal, which should not require many duplicated methods:

- Have {{TScan}} carry a new optional {{boolean}} to specify if the user wants sorted columns
back.
- Have {{RowResult}} carry a new optional list of {{TCell}}s that will contain the sorted
KVs.
- Change {{RowResult}}'s {{columns}} map to also be optional.
- Add a new wrapper class in {{ThriftServerRunner}} that will contain both a {{ResultScanner}}
and the boolean passed in {{TScan}} and put this in {{scannerMap}} instead of the {{ResultScanner}}.
- Change {{ThriftServerRunner.scannerGetList}} methods to check the boolean from the wrapper
class to see if it should populate {{RowResult}}'s list or map.
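A minimal, self-contained Java sketch of the server-side steps above follows. It deliberately avoids the real HBase and generated Thrift classes: {{Cell}}, {{Row}}, {{RowResult}}, {{ScannerWrapper}} and {{ThriftServerSketch}} are hypothetical stand-ins, and a plain {{Iterator}} plays the role of {{ResultScanner}}. It also assumes each list entry carries its own column name, since a list, unlike the map, has no key to hold it. Treat it as an illustration of the idea, not as the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a Thrift TCell; "column" is "family:qualifier".
record Cell(String column, String value, long timestamp) {}

// Hypothetical stand-in for one row from a ResultScanner; cells are in HBase's sorted order.
record Row(String key, List<Cell> cells) {}

// Hypothetical stand-in for the generated RowResult: both fields are "optional",
// modelled here as null when not populated.
final class RowResult {
    final String row;
    Map<String, Cell> columns;   // legacy, unordered map
    List<Cell> sortedColumns;    // proposed optional list, keeps scan order
    RowResult(String row) { this.row = row; }
}

// The proposed wrapper stored in scannerMap instead of the bare scanner.
record ScannerWrapper(Iterator<Row> scanner, boolean sortColumns) {}

final class ThriftServerSketch {
    private final Map<Integer, ScannerWrapper> scannerMap = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    // sortColumns models the new optional TScan field: an old client never sets it,
    // so the server treats it as false.
    int scannerOpen(Iterator<Row> scanner, boolean sortColumns) {
        int id = nextId.incrementAndGet();
        scannerMap.put(id, new ScannerWrapper(scanner, sortColumns));
        return id;
    }

    // Checks the wrapper's flag to decide whether to fill the list or the map.
    List<RowResult> scannerGetList(int id, int nbRows) {
        ScannerWrapper w = scannerMap.get(id);
        if (w == null) {
            throw new IllegalArgumentException("No scanner with id " + id);
        }
        List<RowResult> results = new ArrayList<>();
        while (results.size() < nbRows && w.scanner().hasNext()) {
            Row row = w.scanner().next();
            RowResult rr = new RowResult(row.key());
            if (w.sortColumns()) {
                // Copy cells in the order the scanner produced them.
                rr.sortedColumns = new ArrayList<>(row.cells());
            } else {
                // Legacy behaviour: an unordered map keyed by column name.
                rr.columns = new HashMap<>();
                for (Cell c : row.cells()) {
                    rr.columns.put(c.column(), c);
                }
            }
            results.add(rr);
        }
        return results;
    }
}

class SortedScanDemo {
    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("cf:a", "1", 1L), new Cell("cf:b", "2", 1L), new Cell("cf:c", "3", 1L));
        ThriftServerSketch server = new ThriftServerSketch();
        int legacy = server.scannerOpen(List.of(new Row("r1", cells)).iterator(), false);
        int sorted = server.scannerOpen(List.of(new Row("r1", cells)).iterator(), true);
        System.out.println(server.scannerGetList(legacy, 10).get(0).columns.keySet());
        System.out.println(server.scannerGetList(sorted, 10).get(0).sortedColumns);
    }
}
```

Because an old client never sets the new {{TScan}} field, the flag stays false and that client keeps receiving the {{columns}} map, which is the compatibility property the proposal relies on.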

The end result is that existing client Thrift code won't need to be recompiled and will still get
its map, while new clients talking to a new server will be able to pass a boolean when creating
a scanner to request that results be returned in a list.

There's also the question of whether we want to change the {{getRows}} methods to take a new optional
{{boolean}}.
                
> Improve Hbase Thrift v1 to return results in sorted order
> ---------------------------------------------------------
>
>                 Key: HBASE-7826
>                 URL: https://issues.apache.org/jira/browse/HBASE-7826
>             Project: HBase
>          Issue Type: New Feature
>          Components: Thrift
>    Affects Versions: 0.94.0
>            Reporter: Shivendra Pratap Singh
>            Assignee: Shivendra Pratap Singh
>            Priority: Minor
>              Labels: Hbase, Thrift
>         Attachments: hbase_7826.patch, hbase_7826.patch
>
>
> HBase natively stores columns sorted by column qualifier, and a scan is guaranteed
> to return columns in sorted order. The Java API works fine, but the Thrift API is broken. HBase uses
> a TreeMap, which ensures that sort order is maintained. However, the HBase Thrift specification uses
> a plain map to store the data. Since a map is unordered, columns are not returned in an order
> consistent with how they are stored in HBase.
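To make the ordering problem in the description concrete, here is a small Java sketch that uses only {{java.util}}. It assumes hypothetical ASCII column names such as {{cf:q0001}}, for which {{String}} ordering matches HBase's byte ordering, and uses a {{HashMap}} as a stand-in for an unordered map like the one a Thrift client may receive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ColumnOrderDemo {
    // Hypothetical column names for one fat row, in the sorted order HBase returns them.
    static List<String> storedColumns(int n) {
        List<String> cols = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            cols.add(String.format("cf:q%04d", i));
        }
        return cols;
    }

    // Inserts the columns in the given order and reports the map's iteration order.
    static List<String> iterationOrder(Map<String, String> map, List<String> columns) {
        for (String c : columns) {
            map.put(c, "value-of-" + c);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        List<String> stored = storedColumns(1000);
        // HashMap makes no ordering promise, so this comparison is not guaranteed either way.
        System.out.println("HashMap keeps order: "
            + iterationOrder(new HashMap<>(), stored).equals(stored));
        // TreeMap always iterates in key order, whatever the insertion order.
        System.out.println("TreeMap keeps order: "
            + iterationOrder(new TreeMap<>(), stored).equals(stored));
    }
}
```

Even a Java-side ordered map on the server would not be enough on its own: the Thrift {{map}} type makes no ordering promise, and each client's bindings pick their own map implementation, which is why an explicit list is the more robust fix.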

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
