phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-4701) Declare SYSTEM.LOG table as immutable with compact storage format
Date Mon, 23 Apr 2018 17:39:01 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448519#comment-16448519 ]

James Taylor edited comment on PHOENIX-4701 at 4/23/18 5:38 PM:
----------------------------------------------------------------

Also, I don't think a PK of only QUERY_ID is particularly useful. It should be part of the
PK (for uniqueness), but at the end, as we won't ever query by QUERY_ID. What would be
the most common query against the table? That'll drive what the PK should be. Perhaps a PK
of (START_TIME, TOTAL_EXECUTION_TIME, QUERY_ID). We'd want to salt the table as well to prevent
write hotspotting. This would let us efficiently query for queries that occurred within a
given time range that were slow. For example:
{code:sql}
SELECT * FROM SYSTEM.LOG
WHERE START_TIME > CURRENT_DATE() - 1.0/24.0
  AND START_TIME < CURRENT_DATE()
  AND TOTAL_EXECUTION_TIME > 1000;{code}
Without a better PK, the above would be a full table scan.
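A minimal Java sketch of why the proposed key order helps, assuming an in-memory `TreeSet` as a stand-in for HBase's sorted row keys. Salting prefixes each row key with a bucket number; here that bucket is a simple hash modulo a hypothetical `SALT_BUCKETS = 4` (Phoenix computes its own hash over the row key bytes). Within each bucket, rows are sorted by START_TIME, so the time-window query becomes one bounded range per bucket instead of a full scan. `SystemLogKeySketch`, `LogKey`, `key` and `slowQueries` are hypothetical names, not Phoenix APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.Objects;
import java.util.TreeSet;

final class SystemLogKeySketch {
    // Hypothetical bucket count; the comment does not say how many salt buckets to use.
    static final int SALT_BUCKETS = 4;

    // A SYSTEM.LOG row reduced to the proposed PK, with the salt bucket in front.
    record LogKey(int salt, long startTime, long totalExecutionTime, String queryId) {}

    // Rows matched by a range query plus the number of rows the scan had to visit.
    record ScanResult(List<LogKey> matches, int rowsVisited) {}

    // Row-key order: salt first, then START_TIME, TOTAL_EXECUTION_TIME, QUERY_ID.
    static final Comparator<LogKey> ROW_KEY_ORDER = Comparator
            .comparingInt(LogKey::salt)
            .thenComparingLong(LogKey::startTime)
            .thenComparingLong(LogKey::totalExecutionTime)
            .thenComparing(LogKey::queryId);

    // Stand-in for salting: hash the PK columns into a bucket (not Phoenix's actual hash).
    static LogKey key(long startTime, long totalExecutionTime, String queryId) {
        int bucket = Math.floorMod(Objects.hash(startTime, totalExecutionTime, queryId), SALT_BUCKETS);
        return new LogKey(bucket, startTime, totalExecutionTime, queryId);
    }

    // lowStart < START_TIME < highStart AND TOTAL_EXECUTION_TIME > minExecutionTime.
    // One bounded range per bucket covers the time window; execution time is
    // filtered row by row inside that range.
    static ScanResult slowQueries(NavigableSet<LogKey> table, long lowStart, long highStart,
                                  long minExecutionTime) {
        List<LogKey> matches = new ArrayList<>();
        int visited = 0;
        if (lowStart >= highStart) {
            return new ScanResult(matches, visited);
        }
        for (int salt = 0; salt < SALT_BUCKETS; salt++) {
            // (salt, t, Long.MIN_VALUE, "") sorts at or before every row with START_TIME == t.
            LogKey from = new LogKey(salt, lowStart + 1, Long.MIN_VALUE, "");
            LogKey to = new LogKey(salt, highStart, Long.MIN_VALUE, "");
            for (LogKey k : table.subSet(from, true, to, false)) {
                visited++;
                if (k.totalExecutionTime() > minExecutionTime) {
                    matches.add(k);
                }
            }
        }
        return new ScanResult(matches, visited);
    }

    public static void main(String[] args) {
        NavigableSet<LogKey> table = new TreeSet<>(ROW_KEY_ORDER);
        for (int i = 0; i < 100; i++) {
            // One row per second (START_TIME in epoch millis); every other query is slow.
            table.add(key(i * 1000L, i % 2 == 0 ? 500 : 1500, "q" + i));
        }
        ScanResult r = slowQueries(table, 10_000, 20_000, 1000);
        System.out.println("visited " + r.rowsVisited() + " of " + table.size() + " rows, "
                + r.matches().size() + " slow queries");
    }
}
```

With one row per second over 100 seconds and a 10-second window, `main` reports that only 9 of the 100 rows are visited. Because TOTAL_EXECUTION_TIME follows START_TIME in the key, it narrows results only inside the time window rather than narrowing the scan range itself.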

Also, START_TIME should be declared as a DATE, not a TIMESTAMP. TIMESTAMP has nanosecond granularity,
which we don't need (and wouldn't capture anyway), and it'd cause more overhead. DATE has millisecond
granularity, which is what we'd want.
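A small Java sketch of the granularity argument, assuming START_TIME is captured from a millisecond wall clock such as `System.currentTimeMillis()` (an assumption; the comment doesn't say how it is captured). A `java.sql.Timestamp` built from epoch millis never carries sub-millisecond data. Per the Phoenix data type documentation, DATE is serialized as an 8-byte long, while TIMESTAMP adds a 4-byte nanos field. `StartTimePrecision` and its methods are hypothetical names.

```java
import java.sql.Timestamp;

final class StartTimePrecision {
    // Serialized widths: DATE is an 8-byte epoch-millis long; TIMESTAMP adds a 4-byte nanos int.
    static final int DATE_BYTES = Long.BYTES;
    static final int TIMESTAMP_BYTES = Long.BYTES + Integer.BYTES;

    // A start time taken from a millisecond clock (assumed capture method).
    static Timestamp startTimeFromMillis(long epochMillis) {
        return new Timestamp(epochMillis);
    }

    // True when the nanos field holds only whole milliseconds, i.e. the extra
    // TIMESTAMP precision carries no information.
    static boolean hasNoSubMillisecondData(Timestamp ts) {
        return ts.getNanos() % 1_000_000 == 0;
    }

    // Extra raw bytes for one START_TIME value per row when using TIMESTAMP instead of DATE.
    static long extraBytes(long rows) {
        return rows * (TIMESTAMP_BYTES - DATE_BYTES);
    }

    public static void main(String[] args) {
        Timestamp ts = startTimeFromMillis(System.currentTimeMillis());
        System.out.println("sub-millisecond data present: " + !hasNoSubMillisecondData(ts));
        System.out.println("extra bytes for 1M rows: " + extraBytes(1_000_000));
    }
}
```

`extraBytes` counts raw value bytes only; it ignores HBase per-cell overhead and any compression.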


was (Author: jamestaylor):
Also, I don't think a PK of only QUERY_ID is particularly useful. It should be part of the
PK (for uniqueness), but at the end, as we won't ever query by QUERY_ID. What would be
the most common query against the table? That'll drive what the PK should be. Perhaps a PK
of (TOTAL_EXECUTION_TIME, START_TIME, QUERY_ID).

Also, START_TIME should be declared as a DATE, not a TIMESTAMP. TIMESTAMP has nanosecond granularity,
which we don't need (and wouldn't capture anyway), and it'd cause more overhead. DATE has millisecond
granularity, which is what we'd want.

> Declare SYSTEM.LOG table as immutable with compact storage format
> -----------------------------------------------------------------
>
>                 Key: PHOENIX-4701
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4701
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: James Taylor
>            Priority: Major
>             Fix For: 4.14.0, 5.0.0
>
>
> If possible, the SYSTEM.LOG table would benefit greatly (a 3-5x performance gain) from being
> declared as immutable with a column encoding of 1 byte and a storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.
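In Phoenix DDL these options correspond to the IMMUTABLE_ROWS, COLUMN_ENCODED_BYTES and IMMUTABLE_STORAGE_SCHEME table properties. The idea behind SINGLE_CELL_ARRAY_WITH_OFFSETS can be illustrated with a simplified Java sketch: a row's column values are packed into one cell, with an offset table appended so a single column can still be located directly. This is NOT Phoenix's actual byte layout; `SingleCellArraySketch`, `pack`, `get` and `keyValueBytes` are hypothetical. `keyValueBytes` follows the standard HBase `KeyValue` layout (without tags) to show the fixed per-cell cost (lengths, row key, family, qualifier, timestamp, type) that is paid once per column family instead of once per column.

```java
import java.nio.ByteBuffer;

// Simplified illustration of a "single cell with offsets" layout; NOT Phoenix's real encoding.
final class SingleCellArraySketch {

    // Layout: [value0][value1]...[valueN-1][offset0]...[offsetN-1][N]
    // Offsets and the trailing count are 4-byte big-endian ints (ByteBuffer default order).
    static byte[] pack(byte[][] values) {
        int dataLength = 0;
        for (byte[] v : values) {
            dataLength += v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(dataLength + Integer.BYTES * (values.length + 1));
        int[] offsets = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            offsets[i] = buf.position(); // where value i starts
            buf.put(values[i]);
        }
        for (int offset : offsets) {
            buf.putInt(offset);
        }
        buf.putInt(values.length);
        return buf.array();
    }

    // Reads column i directly: its offset gives the start, and the next offset
    // (or the start of the offset table, for the last column) gives the end.
    static byte[] get(byte[] cell, int i) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int count = buf.getInt(cell.length - Integer.BYTES);
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("no column " + i);
        }
        int offsetTable = cell.length - Integer.BYTES * (count + 1);
        int start = buf.getInt(offsetTable + Integer.BYTES * i);
        int end = (i + 1 < count) ? buf.getInt(offsetTable + Integer.BYTES * (i + 1)) : offsetTable;
        byte[] out = new byte[end - start];
        System.arraycopy(cell, start, out, 0, out.length);
        return out;
    }

    // Size of one HBase KeyValue (no tags): key length (4) + value length (4) + row length (2)
    // + row + family length (1) + family + qualifier + timestamp (8) + type (1) + value.
    static int keyValueBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return 4 + 4 + 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1 + valueLen;
    }

    public static void main(String[] args) {
        byte[][] columns = new byte[10][];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = ByteBuffer.allocate(Long.BYTES).putLong(i).array(); // ten 8-byte values
        }
        byte[] cell = pack(columns);
        int rowKeyLen = 20; // hypothetical row key length
        int oneCellPerColumn = columns.length * keyValueBytes(rowKeyLen, 1, 1, Long.BYTES);
        int singleCell = keyValueBytes(rowKeyLen, 1, 1, cell.length);
        System.out.println("one cell per column: " + oneCellPerColumn + " bytes");
        System.out.println("single cell:         " + singleCell + " bytes");
        System.out.println("column 3 = " + ByteBuffer.wrap(get(cell, 3)).getLong());
    }
}
```

The size comparison ignores HBase block encoding and compression, so it only indicates the direction of the savings, not the 3-5x figure in the issue description.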



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
