phoenix-dev mailing list archives

From "Junegunn Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3073) Fast path for single-key point lookups
Date Thu, 14 Jul 2016 07:27:20 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376491#comment-15376491 ]

Junegunn Choi commented on PHOENIX-3073:
----------------------------------------

I also found that the client overhead for this particular query increased greatly with a recent
change (PHOENIX-3040):

https://github.com/apache/phoenix/commit/f9420e6fb8d635572a7049527db0cc513dbeebe6#diff-8c3d3f644c66ef36d5bc604f017fabfcL144

With that change, the query started to use guideposts (isPointLookup = true, plan.isSerial()
= false), which turned out to be quite costly on the client side. For comparison, the client
CPU usage for the same workload before PHOENIX-3040 was 22.3%.

> Fast path for single-key point lookups
> --------------------------------------
>
>                 Key: PHOENIX-3073
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3073
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>         Attachments: PHOENIX-3073.patch
>
>
> While comparing the Phoenix JDBC client with the native HBase Java client, I noticed that the
> Phoenix client used significantly more CPU time on the client machine. Profiling revealed that
> most of that time was spent in {{BaseResultIterators.getParallelScans()}}. This was surprising
> to me, as I was only testing simple point lookup queries.
> Here's how I tested:
> - {{SELECT /*+ SMALL SERIAL */ ID, DOCID FROM IMAGE WHERE ID = ?}}
>     - {{IMAGE}} is a salted table with 100 salt buckets
>     - {{ID}}, the primary key, was randomly selected from a small range so that requests
>       were served without disk I/O
> - 20K requests/sec issued concurrently from 128 threads
> {{getParallelScans()}} is quite expensive because it iterates over all regions of the table,
> of which there can be many, only to return a single Scan object for this query. Since
> single-key point lookups are among the most frequent types of requests in a typical OLTP
> application, I believe it makes sense to have a fast path for them. With the patch, the
> average client CPU usage during the workload dropped from 56.7% to 18.8%.
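The cost difference described above comes down to doing per-query work proportional to the number of regions when only one region can hold the key. The Java sketch below is hypothetical and is not Phoenix's code or the attached patch: it assumes a salt byte computed as `Arrays.hashCode(key)` modulo the bucket count (Phoenix's real salting hash may differ), assumes one region per salt bucket starting at the one-byte key `{b}`, and uses made-up names (`PointLookupSketch`, `findRegion`, `findRegionLinear`). It contrasts a linear walk over every region start key, which has the same cost shape as iterating all regions, with a binary search that finds the same region in O(log n).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Phoenix's implementation: route a single-key point lookup
// on a salted table to exactly one region.
public class PointLookupSketch {

    // ASSUMPTION: the salt byte is a hash of the unsalted key modulo the bucket count.
    // Arrays.hashCode is a stand-in; Phoenix's own salting hash may differ.
    public static byte saltByte(byte[] key, int buckets) {
        return (byte) Math.floorMod(Arrays.hashCode(key), buckets);
    }

    // Physical row key = salt byte followed by the unsalted key.
    public static byte[] saltedKey(byte[] key, int buckets) {
        byte[] out = new byte[key.length + 1];
        out[0] = saltByte(key, buckets);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // Slow path: visit every region start key (O(n) in the number of regions)
    // and keep the last one that is <= rowKey. Row keys compare as unsigned bytes.
    public static int findRegionLinear(byte[][] startKeys, byte[] rowKey) {
        int found = 0;
        for (int i = 0; i < startKeys.length; i++) {
            if (Arrays.compareUnsigned(startKeys[i], rowKey) <= 0) {
                found = i;
            }
        }
        return found;
    }

    // Fast path: binary search for the same region (O(log n)).
    // startKeys must be sorted, with startKeys[0] the empty key (start of table).
    public static int findRegion(byte[][] startKeys, byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (Arrays.compareUnsigned(startKeys[mid], rowKey) <= 0) {
                lo = mid;      // region mid starts at or before the key
            } else {
                hi = mid - 1;  // region mid starts after the key
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int buckets = 100;
        // ASSUMPTION: one region per salt bucket; region b starts at the one-byte key {b}.
        byte[][] startKeys = new byte[buckets][];
        startKeys[0] = new byte[0];
        for (int b = 1; b < buckets; b++) {
            startKeys[b] = new byte[] {(byte) b};
        }

        byte[] row = saltedKey("42".getBytes(StandardCharsets.UTF_8), buckets);
        int fast = findRegion(startKeys, row);
        int slow = findRegionLinear(startKeys, row);
        System.out.println("salt=" + row[0] + " region=" + fast + " linear=" + slow);
    }
}
```

Both lookups return the same region index; only the amount of work per query differs. In a real HBase client, region locations come from the client's meta cache rather than a hand-built array, so treat this purely as an illustration of why a single-key lookup should not need to touch every region.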



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
