phoenix-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3271) Distribute UPSERT SELECT across cluster
Date Tue, 10 Jan 2017 09:51:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814517#comment-15814517 ]

Hadoop QA commented on PHOENIX-3271:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12846534/PHOENIX-3271_v3.patch
  against master branch at commit d8f4594989c0b73945aaffec5649a0b62ac59724.
  ATTACHMENT ID: 12846534

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 43 warning messages.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    +        String ddl = "CREATE TABLE " + tableName1 + " (K BIGINT NOT NULL PRIMARY KEY ROW_TIMESTAMP, V VARCHAR)"
    +        ddl = "CREATE TABLE " + tableName2 + " (K BIGINT NOT NULL PRIMARY KEY ROW_TIMESTAMP, V VARCHAR)"
    +                        projectedColumns.add(column.getPosition() == i + posOff ? column : new PColumnImpl(column, i));
    +                    final QueryPlan aggPlan = new AggregatePlan(context, select, statementContext.getCurrentTable(), aggProjector, null,null, OrderBy.EMPTY_ORDER_BY, null, GroupBy.EMPTY_GROUP_BY, null);
    +    private void commitBatchWithHTable(HTable table, Region region, List<Mutation> mutations, byte[] indexUUID,
    +            long blockingMemstoreSize, byte[] indexMaintainersPtr, byte[] txState) throws IOException {
    +            // Need to add indexMaintainers for each mutation as table.batch can be distributed across servers
    +            targetHTable = new HTable(env.getConfiguration(), projectedTable.getPhysicalName().getBytes());
    +                    region.getTableDesc().getTableName().getName()) == 0 && projectedTable.getRowTimestampColPos() == -1
    +                                commit(region, mutations, indexUUID, blockingMemStoreSize, indexMaintainersPtr, txState,

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/730//testReport/
Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/730//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/730//console

This message is automatically generated.

> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch, PHOENIX-3271_v2.patch, PHOENIX-3271_v3.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local index is orders of magnitude faster than creation of global indexes (17 seconds versus 10-20 minutes - though more data is written in the global index case). Under the covers, a global index is created through the running of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a table. In both of these cases, the data being upserted must all flow back to the same client, which can become a bottleneck for a large table. Instead, what can be done is to push each separate, chunked UPSERT SELECT call out to a different region server for execution there. One way we could implement this would be to have an endpoint coprocessor push the chunked UPSERT SELECT out to each region server and return the number of rows that were upserted back to the client.
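For context, the client-driven statement the proposal wants to distribute looks like the following minimal sketch. The table and column names here are hypothetical illustrations, not taken from the patch:

```sql
-- Hypothetical tables for illustration. Without this change, Phoenix
-- executes the UPSERT SELECT on the client: every row selected from
-- SOURCE_TABLE flows back to the client before being written to
-- TARGET_TABLE.
CREATE TABLE SOURCE_TABLE (K BIGINT NOT NULL PRIMARY KEY, V VARCHAR);
CREATE TABLE TARGET_TABLE (K BIGINT NOT NULL PRIMARY KEY, V VARCHAR);

-- Copy the table. The proposal is to push each chunked piece of this
-- statement to the region server hosting that chunk, returning only
-- the upserted-row count to the client.
UPSERT INTO TARGET_TABLE SELECT K, V FROM SOURCE_TABLE;
```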



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
