phoenix-dev mailing list archives

From "Ankit Singhal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2724) Query with large number of guideposts is slower compared to no stats
Date Wed, 20 Apr 2016 07:28:25 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249413#comment-15249413 ]

Ankit Singhal commented on PHOENIX-2724:
----------------------------------------

[~samarthjain],

It seems we are now flattening the nested scans every time, but this will hurt the performance of
"non-order by" queries on salted and local index tables. For those tables we have to run the
queries in parallel per region for performance when phoenix.query.force.rowkeyorder is set to
true or when there is an "Order by" on the rowkey.
Alternatively, if we want to keep serialIterators completely serial, then we should change isSerial
to exclude salted and localIndex tables.

{code}
+        final List<Scan> flattenedScans = Lists.newArrayListWithExpectedSize(expectedListSize);
+        for (List<Scan> list : nestedScans) {
+            flattenedScans.addAll(list);
+        }
{code}
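
To make the concern concrete, here is a minimal sketch of the two shapes, not Phoenix code. Each String is a hypothetical stand-in for one Scan, each inner list stands in for the scans of one region or salt bucket, and the class and method names are invented for illustration. Flattening leaves a single list that one serial task walks in order, while keeping the nesting allows one task per inner list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration only: Strings stand in for Scan objects.
final class NestedScanSketch {

    // Flattened shape: all scans land in one list, walked by a single serial task.
    static List<String> flatten(List<List<String>> nestedScans) {
        List<String> flattened = new ArrayList<String>();
        for (List<String> group : nestedScans) {
            flattened.addAll(group);
        }
        return flattened;
    }

    // Nested shape: one task per inner list, so the groups can run concurrently
    // while the scans inside each group keep their order.
    static List<List<String>> runPerGroup(List<List<String>> nestedScans) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nestedScans.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<Callable<List<String>>>();
            for (final List<String> group : nestedScans) {
                // Placeholder for "run the scans of this group one after another".
                tasks.add(() -> new ArrayList<String>(group));
            }
            List<List<String>> results = new ArrayList<List<String>>();
            // invokeAll returns the futures in the same order as the submitted tasks.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> nested = Arrays.asList(
                Arrays.asList("bucket0-scan1", "bucket0-scan2"),
                Arrays.asList("bucket1-scan1"));
        System.out.println("flattened: " + flatten(nested));
        System.out.println("per group: " + runPerGroup(nested));
    }
}
```

With the flattened list there is only one unit of work, so nothing is left to run in parallel per region.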

Is there any reason why we changed this?
{code}
-            }, "Serial scanner for table: " + tableRef.getTable().getPhysicalName().getString()));
+            }, "Serial scanner for table: " + tableRef.getTable().getName().getString()));
{code}

If you are renaming the method, can you please rename it in QueryUtil.getUnusedOffset
too?
{code}
-    public Integer getUnusedOffset() {
+    public Integer getRemainingOffset() {
         return (offset - rowCount) > 0 ? (offset - rowCount) : 0;
     }
{code}
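
For reference, the expression in that method is a clamp at zero. The sketch below assumes plain int inputs and uses an invented class name; it only mirrors the arithmetic shown in the diff and is not the actual Phoenix class.

```java
// Hypothetical stand-in for the arithmetic in getRemainingOffset().
final class OffsetSketch {

    // Same result as (offset - rowCount) > 0 ? (offset - rowCount) : 0.
    static int remainingOffset(int offset, int rowCount) {
        return Math.max(offset - rowCount, 0);
    }

    public static void main(String[] args) {
        System.out.println(remainingOffset(10, 4));  // prints 6
        System.out.println(remainingOffset(10, 10)); // prints 0
        System.out.println(remainingOffset(3, 7));   // prints 0
    }
}
```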


Is this just a formatting change?
{code}
-    private void initTableValues(Connection conn) throws SQLException {
-        for (int i = 0; i < 26; i++) {
-            conn.createStatement().execute("UPSERT INTO " + tableName + " values('" + strings[i] + "'," + i + ","
-                    + (i + 1) + "," + (i + 2) + ",'" + strings[25 - i] + "')");
-        }
-        conn.commit();
-    }
-
-    private void updateStatistics(Connection conn) throws SQLException {
-        String query = "UPDATE STATISTICS " + tableName + " SET \"" + QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB
-                + "\"=" + Long.toString(500);
-        conn.createStatement().execute(query);
-    }
-
     @Test
     public void testMetaDataWithOffset() throws SQLException {
         Connection conn;
@@ -207,5 +194,19 @@ public class QueryWithOffsetIT extends BaseOwnClusterHBaseManagedTimeIT {
         ResultSetMetaData md = rs.getMetaData();
         assertEquals(5, md.getColumnCount());
     }
+    
+    private void initTableValues(Connection conn) throws SQLException {
+        for (int i = 0; i < 26; i++) {
+            conn.createStatement().execute("UPSERT INTO " + tableName + " values('" + strings[i] + "'," + i + ","
+                    + (i + 1) + "," + (i + 2) + ",'" + strings[25 - i] + "')");
+        }
+        conn.commit();
+    }
+
+    private void updateStatistics(Connection conn) throws SQLException {
+        String query = "UPDATE STATISTICS " + tableName + " SET \"" + QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB
+                + "\"=" + Long.toString(500);
+        conn.createStatement().execute(query);
+    }
{code}

> Query with large number of guideposts is slower compared to no stats
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-2724
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2724
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>         Environment: Phoenix 4.7.0-RC4, HBase-0.98.17 on a 8 node cluster
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2724.patch
>
>
> With a 1MB guidepost width on a ~900GB/500M-row table, queries with a short scan range get significantly slower.
> Without stats:
> {code}
> select * from T limit 10; // query execution time <100 msec
> {code}
> With stats:
> {code}
> select * from T limit 10; // query execution time >20 seconds
> Explain plan: CLIENT 876085-CHUNK 476569382 ROWS 876060986727 BYTES SERIAL 1-WAY FULL SCAN OVER T SERVER 10 ROW LIMIT CLIENT 10 ROW LIMIT
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
