phoenix-dev mailing list archives

From "Lars Hofhansl (Jira)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5688) Investigate better client/server work pacing
Date Sun, 19 Jan 2020 22:29:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated PHOENIX-5688:
-----------------------------------
    Summary: Investigate better client/server work pacing  (was: Investigate better server
work pacing)

> Investigate better client/server work pacing
> --------------------------------------------
>
>                 Key: PHOENIX-5688
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5688
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> [~kozdemir] shared an intriguing idea that he used for the server-side index repair tool,
which would apply equally well to server-side deletes and server-side UPSERT/SELECT.
> The main problem with the current implementation is that we basically send a predicate
to the server: DELETE FROM <table> WHERE <condition>. The server(s) then go off and,
for each region chunk, evaluate the condition and delete whatever matches it, all in a
tight server-side loop.
> The downside is that (a) a server thread is held up arbitrarily long, (b) the server has
no way to do any fair queuing because the loop has to finish, and (c) if the server takes
too long the client will just time out.
> The alternative used to be to do the work on the client instead: issue a scan with the
condition to the server, retrieve the matching row IDs on the client, and then issue nice
chunks of deletes back to the server.
> The downside here is the extra communication overhead between the server and client (which
might be especially taxing for UPSERT/SELECT).
> Kadir's approach is a middle ground:
>  # Issue a scan from the client, and send along a chunk size (N rows) when getting the
scanner.
>  # The server does N rows' worth of work, then returns.
>  # The client keeps the scanner open and calls next.
>  # Go to step 2.
> This way we get the benefits of both approaches: (1) the work happens close to where the
data is, and (2) the client can pace the work, so the server gets a chance to schedule other work.
>  
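
The four steps above can be sketched as a small in-memory simulation. This is
only an illustration of the pacing idea, not Phoenix or HBase code: the names
PacedDeleteScanner and paced_delete are hypothetical, a plain dict stands in
for a region, and "N rows worth of work" is assumed to mean N rows examined
(not N rows deleted).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Tuple


@dataclass
class PacedDeleteScanner:
    """Hypothetical server-side scanner that deletes matching rows in chunks."""

    rows: Dict[int, str]              # stand-in for one region: row key -> value
    predicate: Callable[[str], bool]  # stand-in for the WHERE condition
    chunk_size: int                   # N, sent by the client when opening the scanner
    _keys: Iterator[int] = field(init=False)

    def __post_init__(self) -> None:
        # Snapshot the key order so rows can be deleted while we walk them.
        self._keys = iter(sorted(self.rows))

    def next(self) -> Tuple[int, bool]:
        """Examine up to chunk_size rows, delete the matches, then return.

        Returns (rows_deleted, more_work_remaining). Returning after a bounded
        amount of work is what frees the server thread between calls.
        """
        deleted = 0
        for _ in range(self.chunk_size):
            key = next(self._keys, None)
            if key is None:
                return deleted, False  # region exhausted
            if self.predicate(self.rows[key]):
                del self.rows[key]
                deleted += 1
        return deleted, True


def paced_delete(rows: Dict[int, str],
                 predicate: Callable[[str], bool],
                 chunk_size: int) -> Tuple[int, int]:
    """Client loop: open the scanner once, then call next() until it is done.

    Returns (total_rows_deleted, number_of_next_calls).
    """
    scanner = PacedDeleteScanner(rows, predicate, chunk_size)
    total, calls, more = 0, 0, True
    while more:
        deleted, more = scanner.next()  # one bounded unit of server work
        total += deleted
        calls += 1
    return total, calls
```

With ten rows, a predicate matching half of them, and a chunk size of 4, the
client makes three next() calls and only deletion counts cross the wire, never
the row IDs themselves. A real client could also sleep or yield between calls
to slow the pace further.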



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
