phoenix-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-1578) Support explicit storage of null values
Date Tue, 13 Jan 2015 19:10:34 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated PHOENIX-1578:
----------------------------------
    Attachment: PHOENIX-1578-docs.patch
                PHOENIX-1578.2.patch

Thanks for the pointers [~jamestaylor].

Here's an updated patch (as well as a documentation patch for the website) with the MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP
incremented and the addition of the STORE_NULLS column as part of the auto-upgrade path.

{quote}One addition that would be nice IMO to your patch is to provide a config property that
controls the default value of STORE_NULLS. In that way, a new installation could set that
to true and not have to remember to always include it in CREATE TABLE calls, and existing
installations could adopt it also without calling ALTER TABLE on all existing tables. Perhaps
the config property would just control the value that gets set in PTableImpl by default for
storeNulls?{quote}

In this patch I've added a config parameter (phoenix.table.default.store.nulls) to set the
default value of the STORE_NULLS flag at table creation time. 

However, if I'm understanding your suggestion correctly, you were saying that it would be
good to have this setting alter the behavior of existing tables as well, which sounds to me
like it could be problematic. If this is purely a config setting (and not strictly set in
the catalog table), then all it would take is one person connecting with an out-of-sync config
file and setting a field to null, and that would wipe out the history of that field for good.
It seems better (or at least acceptable) to me that existing installations would need to explicitly
issue an ALTER TABLE statement in order to adopt this behavior, as opposed to making sure
that this setting is synced over all config files. What do you think?
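
To make the concern concrete, here is a toy model in Python. It is only a sketch, not Phoenix or HBase code: the function names (write_null, major_compact, run) are hypothetical, and compaction is reduced to the assumption that a delete marker removes itself and every older version of the field. It compares a flag read from each client's local config against a flag stored once in the catalog.

```python
# Toy model (not Phoenix code): who decides whether a null is stored or deleted?

def write_null(history, ts, store_nulls):
    """Record a null write: an explicit NULL cell if store_nulls, else a delete marker."""
    history.append((ts, "NULL" if store_nulls else "DELETE"))


def major_compact(history):
    """Assumed simplification: a delete marker drops itself and all older cells."""
    markers = [ts for ts, value in history if value == "DELETE"]
    if not markers:
        return list(history)
    newest = max(markers)
    return [(ts, value) for ts, value in history if ts > newest]


def run(flag_source, client_configs, catalog_flag=True):
    """Each client sets the field to null once, then a major compaction runs."""
    history = [(1, "a"), (2, "b")]  # two earlier versions of the field
    for ts, client_default in enumerate(client_configs, start=3):
        # "config": every client trusts its own config file.
        # "catalog": every client uses the flag stored with the table.
        store_nulls = client_default if flag_source == "config" else catalog_flag
        write_null(history, ts, store_nulls)
    return major_compact(history)


# The second client has an out-of-sync config (store nulls disabled).
config_driven = run("config", [True, False])
catalog_driven = run("catalog", [True, False])

print(config_driven)   # []
print(catalog_driven)  # [(1, 'a'), (2, 'b'), (3, 'NULL'), (4, 'NULL')]
```

In this model a single client with the wrong local default is enough to erase the field's history, while a catalog-level flag keeps every version regardless of what each client has configured.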


> Support explicit storage of null values
> ---------------------------------------
>
>                 Key: PHOENIX-1578
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1578
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: PHOENIX-1578-docs.patch, PHOENIX-1578.2.patch, PHOENIX-1578.patch
>
>
> Null values are currently represented implicitly by a lack of a KeyValue for a given
> field. This is implemented by using an HBase delete to remove cells when a given field is
> set to null via an upsert statement.
> However, this method of setting values to null causes all previous versions of the given
> field to be removed on the next major compaction, which prevents doing flashback queries for
> the given field.
> One workaround for this is to enable KEEP_DELETED_CELLS on the underlying HBase table
> -- however, this means that SQL deletes (i.e. DELETE FROM TABLE) will never actually remove
> the data.
> This ticket is to propose a flag (defined at table level) which specifies that null values
> are to be explicitly stored in HBase. This flag should not change the behavior of a SQL {{DELETE}}
> statement, i.e. a SQL {{DELETE}} will still cause a record to be permanently deleted (including
> historical data).
> The use of this flag in combination with KEEP_DELETED_CELLS=false and VERSIONS=unlimited
> will allow Phoenix to provide true row-level versioning.
> Additional background in this mail thread: http://s.apache.org/kwz
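
The difference between implicit and explicit nulls described in the issue can be illustrated with a small Python simulation. This is a hedged sketch rather than a model of HBase internals: ToyColumn, TOMBSTONE, EXPLICIT_NULL and flashback_after_compaction are hypothetical names, versions are unlimited, KEEP_DELETED_CELLS is assumed to be false, and reads are only performed after a major compaction.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

TOMBSTONE = object()      # stands in for an HBase delete marker
EXPLICIT_NULL = object()  # stands in for an explicitly stored null value


@dataclass
class ToyColumn:
    """One field of one row, with every version kept (toy model only)."""
    store_nulls: bool
    cells: List[Tuple[int, Any]] = field(default_factory=list)

    def upsert(self, ts: int, value: Optional[str]) -> None:
        if value is None:
            # The table-level flag decides how a null is written.
            marker = EXPLICIT_NULL if self.store_nulls else TOMBSTONE
            self.cells.append((ts, marker))
        else:
            self.cells.append((ts, value))

    def major_compact(self) -> None:
        # Assumed simplification: the newest delete marker removes itself
        # and every older version of the field.
        tombstones = [ts for ts, v in self.cells if v is TOMBSTONE]
        if tombstones:
            newest = max(tombstones)
            self.cells = [(ts, v) for ts, v in self.cells if ts > newest]

    def value_as_of(self, ts: int) -> Optional[str]:
        """Flashback read: the newest version written at or before ts."""
        visible = [(t, v) for t, v in self.cells if t <= ts]
        if not visible:
            return None
        _, latest = max(visible, key=lambda cell: cell[0])
        if latest is TOMBSTONE or latest is EXPLICIT_NULL:
            return None
        return latest


def flashback_after_compaction(store_nulls: bool) -> List[Optional[str]]:
    col = ToyColumn(store_nulls=store_nulls)
    col.upsert(10, "alice")
    col.upsert(20, None)   # field set to null
    col.upsert(30, "bob")
    col.major_compact()
    return [col.value_as_of(ts) for ts in (15, 25, 35)]


implicit = flashback_after_compaction(store_nulls=False)
explicit = flashback_after_compaction(store_nulls=True)

print(implicit)  # [None, None, 'bob']
print(explicit)  # ['alice', None, 'bob']
```

With the flag off, the value written at timestamp 10 is gone after compaction, so a flashback read at timestamp 15 finds nothing. With the flag on, the same read still returns the earlier value, and the null at timestamp 20 is visible as a null.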



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
