phoenix-dev mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-4932) Brainstorm more ways to avoid special SPLIT handling in Phoenix
Date Fri, 28 Sep 2018 02:20:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated PHOENIX-4932:
-----------------------------------
    Description: 
Currently Phoenix still requires special handling and retries (automated ones, and manual ones by the
client user) when SPLITs occur in HBase.

PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if we add a bit more
logic to the client like this:
 * Sorts. As we merge sort partial server results from the server scan, start a "merge bucket"
when we see the next K/V to be out of order (that can happen when HBase executes a partial scan
across the new daughter regions).
 * Aggregates. Make sure the client can deal with more than one result per scan. E.g., for
a SUM the scanner might return two results if HBase splits the scan across two regions. Similarly,
for AVG the client needs to deal with two sets of SUM/COUNT.
 * Offset. Make sure the client applies the offset. The server might return more rows. (This might
be more complicated... haven't looked too closely.)
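This is a minimal sketch of the "merge bucket" idea for sorts. It is written in Python for brevity; Phoenix itself is Java. The sketch assumes that each server scan returns rows already sorted within each region. A scan that HBase continued across daughter regions therefore looks like several sorted runs laid end to end. The client starts a new bucket whenever a row sorts before the previous row from the same scan, then k-way merges all buckets. The function names are hypothetical, not Phoenix APIs. For simplicity the sketch buffers buckets in memory instead of streaming them.

```python
from __future__ import annotations

import heapq
from typing import Any, Callable, Iterable, List


def split_into_buckets(scan_output: Iterable[Any],
                       key: Callable[[Any], Any] = lambda x: x) -> List[List[Any]]:
    """Split one scan's output into sorted runs ("merge buckets").

    A new bucket starts whenever the next row sorts before the previous one,
    which is what a scan spanning two daughter regions would produce.
    """
    buckets: List[List[Any]] = []
    current: List[Any] = []
    for row in scan_output:
        if current and key(row) < key(current[-1]):
            buckets.append(current)  # out of order: close the current bucket
            current = []
        current.append(row)
    if current:
        buckets.append(current)
    return buckets


def merge_scan_results(scan_outputs: Iterable[Iterable[Any]],
                       key: Callable[[Any], Any] = lambda x: x) -> List[Any]:
    """Merge-sort the buckets from all scans into one ordered result."""
    buckets: List[List[Any]] = []
    for output in scan_outputs:
        buckets.extend(split_into_buckets(output, key))
    return list(heapq.merge(*buckets, key=key))
```

Suppose one scan returns `[1, 4, 7, 2, 5]` (two daughter regions) and another returns `[3, 6]`. In that case `merge_scan_results` yields `[1, 2, 3, 4, 5, 6, 7]`. Splitting only where the order breaks is enough. If the next region's first row does not sort earlier, the two runs joined together are still sorted.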
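This is a sketch of the aggregate case, again in Python and with hypothetical names. Once HBase splits a scan across regions, the scan may return several partial rows per group, so the client folds them together. SUM partials are simply added. For AVG, the client keeps the partial SUM and COUNT and divides only at the end. Averaging the partial averages would be wrong whenever the partials cover different numbers of rows.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PartialAgg:
    """Hypothetical partial state for one group from one region: SUM and COUNT."""
    total: float
    count: int

    def combine(self, other: PartialAgg) -> PartialAgg:
        # SUM and COUNT are both additive, so partials combine component-wise.
        return PartialAgg(self.total + other.total, self.count + other.count)


def merge_partials(rows: Iterable[Tuple[str, PartialAgg]]) -> Dict[str, PartialAgg]:
    """Fold any number of partial results per group key into one result per key."""
    merged: Dict[str, PartialAgg] = {}
    for group, partial in rows:
        prev = merged.get(group)
        merged[group] = partial if prev is None else prev.combine(partial)
    return merged


def final_avg(p: PartialAgg) -> Optional[float]:
    """AVG = total SUM / total COUNT; None for an empty group (like SQL NULL)."""
    return p.total / p.count if p.count else None
```

Suppose group `"a"` comes back as two partials, `PartialAgg(10, 2)` and `PartialAgg(20, 3)`. The merged state is `PartialAgg(30, 5)`, so SUM is 30 and AVG is 6.0. Averaging the partial averages (5.0 and about 6.67) would instead give about 5.83.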
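For OFFSET, one possible approach is to treat any server-side skipping as best effort. The client then applies whatever part of the offset remains, over the merged and ordered rows. This sketch assumes the server reports how many rows it already skipped. `server_skipped` is a hypothetical input, assumed here only for illustration. If the server skipped nothing, the client applies the full offset.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def apply_offset_on_client(rows: Iterable[T], offset: int,
                           server_skipped: int = 0,
                           limit: Optional[int] = None) -> Iterator[T]:
    """Skip the part of OFFSET the server did not apply, then apply LIMIT.

    rows must already be in final query order (e.g. after the client merge).
    server_skipped is a hypothetical count of rows the server already dropped.
    """
    if server_skipped > offset:
        # The server dropped rows the query needs; the client cannot recover them.
        raise ValueError("server skipped more rows than OFFSET allows")
    remaining = offset - server_skipped
    stop = None if limit is None else remaining + limit
    return islice(rows, remaining, stop)
```

Take rows 0 to 9 and `OFFSET 3 LIMIT 2`, where the server skipped only rows 0 and 1. Then `list(apply_offset_on_client(range(2, 10), 3, server_skipped=2, limit=2))` returns `[3, 4]`. Letting the client own the offset keeps the result independent of where HBase split the scan. The cost is that the server may send a few extra rows.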

In summary: We should let HBase do its thing as much as possible. HBase already deals with
SPLITs: scans are restarted and continue across regions, the region cache on the client is
invalidated, etc.

Just parking this here. This is not new. The ideas are probably not new either.

[~tdsilva], FYI.

  was:
Currently Phoenix still requires special handling and retries (automated ones, and manual ones by the
client user) when SPLITs occur in HBase.

PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if we add a bit more
logic to the client like this:
* Sorts. As we merge sort partial server results from the server scan, start a "merge bucket"
when we need the next K/V to be out of order (that can happen when HBase executes a partial
scan across the new daughter regions).
* Aggregates. Make sure the client can deal with more than one result per scan. E.g., for a
SUM the scanner might return two results if HBase splits the scan across two regions. Similarly,
for AVG the client needs to deal with two sets of SUM/COUNT.
* Offset. Make sure the client applies the offset. The server might return more rows. (This might
be more complicated... haven't looked too closely.)

In summary: We should let HBase do its thing as much as possible. HBase already deals with
SPLITs: scans are restarted and continue across regions, the region cache on the client is
invalidated, etc.

Just parking this here. This is not new. The ideas are probably not new either.

[~tdsilva], FYI.


> Brainstorm more ways to avoid special SPLIT handling in Phoenix
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4932
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4932
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
