phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1498) Turn KEEP_DELETED_CELLS off by default
Date Sun, 07 Dec 2014 05:46:12 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237063#comment-14237063
] 

James Taylor commented on PHOENIX-1498:
---------------------------------------

Thanks for the patch, [~jeffreyz]. Couple of comments/questions:
- We need to set KEEP_DELETED_CELLS=true for SYSTEM.CATALOG, SYSTEM.SEQUENCE, and SYSTEM.STATS
- just add it to the DDL statements in QueryConstants. Otherwise, tables that want to use
KEEP_DELETED_CELLS will no longer maintain the proper versions for schema changes.
- Tell me more about the test changes, in particular why the system tables need to be deleted
after each test suite? If you do the above, is that no longer necessary?
{code}
+++ phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseClientManagedTimeIT.java
@@ -74,6 +74,6 @@ public abstract class BaseClientManagedTimeIT extends BaseTest {
     
     @AfterClass
     public static void doTeardown() throws Exception {
-        dropNonSystemTables();
+        dropAllTables();
     }
 }
{code}
- No need to initialize props using getDefaultProps(), as the overrides are combined with
the defaults. Also, any reason why these are all added here? I suppose they were getting set
on the super class before?
{code}
 public class QueryIT extends BaseQueryIT {
     
+    @BeforeClass
+    @Shadower(classBeingShadowed = BaseQueryIT.class)
+    public static void doSetup() throws Exception {
+        Map<String,String> props = getDefaultProps();
+        props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000));
+        props.put(IndexWriterUtils.HTABLE_THREAD_KEY, Integer.toString(100));
+        // Make a small batch size to test multiple calls to reserve sequences
+        props.put(QueryServices.SEQUENCE_CACHE_SIZE_ATTRIB, Long.toString(BATCH_SIZE));
+        props.put(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, "true");
+        // Must update config before starting server
+        setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
+    }
+
{code}
- The code here hardcodes that by default KEEP_DELETED_CELLS is not set. IMO, it'd be a little
cleaner to declare in QueryServicesOptions a public static final boolean DEFAULT_KEEP_DELETED_CELLS
= false and then just set the value like this: columnDesc.setKeepDeletedCells(props.getBoolean(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB,
QueryServicesOptions.DEFAULT_KEEP_DELETED_CELLS));
{code}
diff --git phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java
     private HColumnDescriptor generateColumnFamilyDescriptor(Pair<byte[],Map<String,Object>>
family, PTableType tableType) throws SQLException {
         HColumnDescriptor columnDesc = new HColumnDescriptor(family.getFirst());
         if (tableType != PTableType.VIEW) {
-            columnDesc.setKeepDeletedCells(true);
+            if(props.get(QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB) != null){
+                columnDesc.setKeepDeletedCells(props.getBoolean(
+                    QueryServices.DEFAULT_KEEP_DELETED_CELLS_ATTRIB, false));
+            }
{code}
- Is it necessary to set the queue size depth here?
{code}
public class GroupByIT extends BaseQueryIT {
 
+    @BeforeClass
+    @Shadower(classBeingShadowed = BaseQueryIT.class)
+    public static void doSetup() throws Exception {
+        Map<String,String> props = getDefaultProps();
+        props.put(QueryServices.QUEUE_SIZE_ATTRIB, Integer.toString(5000));
{code}

> Turn KEEP_DELETED_CELLS off by default
> --------------------------------------
>
>                 Key: PHOENIX-1498
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1498
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 5.0.0
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>         Attachments: PHOENIX-1498-v2.patch, PHOENIX-1498.patch
>
>
> Phoenix table is created with "KEEP_DELETED_CELLS" enabled by default, this is only used
to allow for flashback queries to work correctly. While flashback query isn't used often in
field and we found that query performance degraded with the option on. This is likely a hbase
scan issue though(will create a JIRA once having more info). 
> Anyway Keeping deleted cells will add performance penalty and it's not used often. Therefore,
I'm suggesting to set it off by default. 
> We have a test where a table is loaded with > 5m rows and then some are deleted/reinserted.
The count ( * ) performance became worse & worse:
> {code}
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (33.273 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (174.771 seconds)
> +------------+
> |  COUNT(1)  |
> +------------+
> | 5078242    |
> +------------+
> 1 row selected (458.251 seconds)
> {code}
> I think we can provide a table property in CREATE TABLE & ALTER TABLE statement for
people to enable KEEP_DELETED_CELLS if there is a need but by default it should be turned
off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message