phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-3271) Distribute UPSERT SELECT across cluster
Date Fri, 27 Jan 2017 06:34:24 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842317#comment-15842317 ]

James Taylor edited comment on PHOENIX-3271 at 1/27/17 6:33 AM:
----------------------------------------------------------------

I like your long term ideas, [~enis] (JIRA please?), but I think this patch is good in the
near term. The timeouts should be prevented by our RenewLease client side impl (by [~samarthjain]).
If HBase let us renew leases on the server side, that'd be an improvement, but what we have
works.

IMHO, having a safeguard config would lead to code duplication and make maintenance harder.
I think we're ok without it (provided we do adequate testing). This patch should improve global
index build times substantially.




was (Author: jamestaylor):
I like your long term ideas, [~enis] (JIRA please?), but I think this oatch is good in the
near term. The timeouts should be prevented by our RenewLease client side impl (by [~samarthjain]).
If HBase let us renew leases on the server side, that'd be an improvement, but what we have
works.

IMHO, having a safeguard config would lead to code duplication and make maintenance harder.
I think we're ok without it (provided we do adequate testing). This patch should improve global
index build times substantially.



> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch, PHOENIX-3271_v2.patch,
>                      PHOENIX-3271_v3.patch, PHOENIX-3271_v4.patch, PHOENIX-3271_v5.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local index is
> orders of magnitude faster than creation of global indexes (17 seconds versus 10-20 minutes
> - though more data is written in the global index case). Under the covers, a global index
> is created through the running of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way
> of copying a table. In both of these cases, the data being upserted must all flow back to
> the same client, which can become a bottleneck for a large table. Instead, what can be done
> is to push each separate, chunked UPSERT SELECT call out to a different region server for
> execution there. One way we could implement this would be to have an endpoint coprocessor
> push the chunked UPSERT SELECT out to each region server and return the number of rows that
> were upserted back to the client.
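
The idea in the description above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Phoenix or HBase code: the function names (split_by_region, distributed_upsert_select), the in-memory dict standing in for the target table, and the thread pool standing in for region servers are all hypothetical. It shows the shape of the proposal: each chunk is processed where it lives, and only a row count travels back to the client.

```python
import threading
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def split_by_region(rows, split_keys):
    """Group (key, value) rows into per-region chunks.

    Hypothetical model: region i holds keys k with
    split_keys[i-1] <= k < split_keys[i].
    """
    chunks = [[] for _ in range(len(split_keys) + 1)]
    for key, value in rows:
        chunks[bisect_right(split_keys, key)].append((key, value))
    return chunks


def distributed_upsert_select(rows, split_keys, transform):
    """Run one chunked 'UPSERT SELECT' per region and sum the row counts.

    The dict 'target' stands in for the table being written (for example
    a global index); each worker thread stands in for a region server.
    """
    target = {}
    lock = threading.Lock()

    def run_chunk(chunk):
        # Stand-in for the server-side work: read the local chunk,
        # write the transformed rows, and report only how many were written.
        written = 0
        for key, value in chunk:
            new_key, new_value = transform(key, value)
            with lock:
                target[new_key] = new_value
            written += 1
        return written

    chunks = split_by_region(rows, split_keys)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        counts = list(pool.map(run_chunk, chunks))
    # The "client" only ever sees the per-chunk counts, never the row data.
    return target, sum(counts)


if __name__ == "__main__":
    rows = [("a", 1), ("c", 2), ("m", 3), ("z", 4)]
    # Build a toy "index" keyed on the value, pointing back at the row key.
    index, total = distributed_upsert_select(rows, ["g", "p"], lambda k, v: (v, k))
    print(total, index)
```

In this model the client's cost is proportional to the number of chunks rather than the number of rows, which is the bottleneck the description is trying to remove.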



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
