phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-2724) Query with large number of guideposts is slower compared to no stats
Date Wed, 29 Jun 2016 17:04:17 GMT

[ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355481#comment-15355481 ]

Samarth Jain edited comment on PHOENIX-2724 at 6/29/16 5:03 PM:
----------------------------------------------------------------

So I debugged this by adding a bunch of log lines, and it turns out the culprit here isn't the size of the stats cache. The default cache size is more than enough to fit all the guide posts in memory. The code below in BaseResultIterators is taking up the majority of the time when we have a large number of guideposts.

{code}
try {
    while (guideIndex < gpsSize
            && (currentGuidePost.compareTo(endKey) <= 0 || endKey.length == 0)) {
        Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes,
                currentGuidePostBytes, keyOffset, false);
        if (newScan != null) {
            ScanUtil.setLocalIndexAttributes(newScan, keyOffset, regionInfo.getStartKey(),
                    regionInfo.getEndKey(), newScan.getStartRow(), newScan.getStopRow());
        }
        if (newScan != null) {
            estimatedRows += gps.getRowCounts().get(guideIndex);
            estimatedSize += gps.getByteCounts().get(guideIndex);
        }
        scans = addNewScan(parallelScans, scans, newScan, currentGuidePostBytes,
                false, regionLocation);
        currentKeyBytes = currentGuidePost.copyBytes();
        currentGuidePost = PrefixByteCodec.decode(decoder, input);
        currentGuidePostBytes = currentGuidePost.copyBytes();
        guideIndex++;
    }
} catch (EOFException e) {}
{code}

Another interesting thing to note is that the first call to explain plan takes way longer than subsequent ones, and almost all of that time is spent in the above loop. That time does *not* include the time spent fetching stats from the server or fetching the table metadata. Also of note is that the first explain plan results in an EOFException which we end up ignoring. [~jamestaylor] - should we just be swallowing the exception? Does it point to a bug in our stats collection or in the prefix encoding/decoding? I am using the same phoenix connection, and the fact that the EOF is not triggered on subsequent executions tells me there is possibly some stale state that isn't getting cleared out when we close phoenix statements and/or result sets.
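
Looking at the loop shape itself, decode is invoked once more after the last guidepost has been handled, so if the stream holds exactly gpsSize entries the final decode would run off the end. That at least explains where an EOFException can come from on a perfectly normal pass (it doesn't explain why only the first execution sees it). A minimal standalone illustration of that pattern, using plain java.io rather than the actual PrefixByteCodec:

{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofLoopSketch {
    public static void main(String[] args) throws IOException {
        // A stream holding exactly gpsSize "guideposts" (plain ints stand in for encoded guide posts).
        byte[] encoded = {0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3};
        DataInputStream input = new DataInputStream(new ByteArrayInputStream(encoded));
        int gpsSize = 3;
        int guideIndex = 0;
        int currentGuidePost = input.readInt(); // prime the first value (for illustration)
        try {
            while (guideIndex < gpsSize) {
                System.out.println("building scan for guidepost " + currentGuidePost);
                // The *next* value is decoded before the loop condition is re-checked,
                // so on the final iteration this read runs past the end of the stream.
                currentGuidePost = input.readInt();
                guideIndex++;
            }
        } catch (EOFException e) {
            System.out.println("EOF swallowed after processing all " + gpsSize + " guideposts");
        }
    }
}
{code}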

+----------------------------------------------------------------------------------------------------+
|                                                PLAN                                                 |
+----------------------------------------------------------------------------------------------------+
| CLIENT 75130-CHUNK 7672485 ROWS 2775600674 BYTES SERIAL 1-WAY ROUND ROBIN FULL SCAN OVER LINEITEM   |
|     SERVER 1 ROW LIMIT                                                                              |
| CLIENT 1 ROW LIMIT                                                                                  |
+----------------------------------------------------------------------------------------------------+
3 rows selected (32.121 seconds)


Subsequent executions of the explain plan/query compilation take less time, but still on the order of seconds.


+----------------------------------------------------------------------------------------------------+
|                                                PLAN                                                 |
+----------------------------------------------------------------------------------------------------+
| CLIENT 75130-CHUNK 7672485 ROWS 2775600674 BYTES SERIAL 1-WAY ROUND ROBIN FULL SCAN OVER LINEITEM   |
|     SERVER 1 ROW LIMIT                                                                              |
| CLIENT 1 ROW LIMIT                                                                                  |
+----------------------------------------------------------------------------------------------------+
3 rows selected (4.729 seconds)
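
(Rough back-of-the-envelope: 4.729 s spread over the 75,130 chunks above works out to roughly 0.06 ms per guidepost, and the 32.121 s cold run to roughly 0.4 ms per guidepost, so even very cheap per-iteration work adds up to seconds at this guidepost count.)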

I couldn't identify one particular method call in the loop mentioned above that is taking
a long time. It is probably an accumulation of several millisecond calls. Will keep digging.
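
For the record, instead of scattering raw log lines I've switched to accumulating per-call-site timings, so that many cheap calls still show up in aggregate. This is just a generic sketch (plain System.nanoTime, nothing Phoenix-specific; the labels in the usage comment are placeholders, not measurements):

{code}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Accumulates wall-clock time per labeled call site, so cheap-but-hot calls show up in aggregate. */
public class LoopTimer {
    private final Map<String, AtomicLong> totals = new LinkedHashMap<>();

    /** Times a single call and adds the elapsed nanos to that label's running total. */
    public <T> T time(String label, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            totals.computeIfAbsent(label, k -> new AtomicLong())
                  .addAndGet(System.nanoTime() - start);
        }
    }

    /** Prints cumulative milliseconds per call site, in insertion order. */
    public void dump() {
        totals.forEach((label, nanos) ->
                System.out.println(label + ": " + nanos.get() / 1_000_000 + " ms total"));
    }
}

// Inside the guidepost loop, usage would look roughly like:
//   Scan newScan = timer.time("intersectScan",
//       () -> scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset, false));
//   ...
//   timer.dump();  // after the loop: cumulative ms spent in each call site
{code}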









> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on an 8 node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2724.patch, PHOENIX-2724_addendum.patch, PHOENIX-2724_v2.patch
>
>
> With a 1MB guidepost width on a ~900GB/500M row table, queries with a short scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
