phoenix-dev mailing list archives

From "Thomas D'Silva (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue
Date Fri, 15 Jul 2016 02:14:20 GMT


Thomas D'Silva commented on PHOENIX-2565:


I didn't add a design doc because we were planning on enabling this feature only for immutable
tables and using the existing array serialization format, so the implementation seemed straightforward.

All column values for a given column family are stored in a single KeyValue. A new StorageScheme
value, COLUMNS_STORED_IN_SINGLE_CELL, denotes a table whose columns are stored in this format
(StorageScheme itself is to be added as part of PHOENIX-1598). Existing tables will have a
StorageScheme of NON_ENCODED_COLUMN_NAMES and will work as before. Once a table uses the
COLUMNS_STORED_IN_SINGLE_CELL storage scheme, it can no longer be transitioned to/from being immutable.
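
The rule above can be sketched as follows. This is a hypothetical illustration, not Phoenix code:
StorageSchemeSketch, TableInfo and setImmutableRows are made-up names, and only the two scheme
names come from this comment. It assumes single-cell storage is only allowed for immutable tables
and that legacy tables may still change their immutability.

```java
import java.util.Objects;

/** Hypothetical sketch of the storage-scheme rules; not Phoenix's actual classes. */
final class StorageSchemeSketch {

    /** Scheme names taken from the comment; the enum itself is illustrative. */
    enum StorageScheme {
        NON_ENCODED_COLUMN_NAMES,      // existing tables: one KeyValue per column
        COLUMNS_STORED_IN_SINGLE_CELL  // new format: one KeyValue per column family per row
    }

    /** Minimal stand-in for table metadata (hypothetical). */
    static final class TableInfo {
        final StorageScheme scheme;
        final boolean immutableRows;

        TableInfo(StorageScheme scheme, boolean immutableRows) {
            this.scheme = Objects.requireNonNull(scheme);
            // Assumption: the single-cell format is only enabled for immutable tables.
            if (scheme == StorageScheme.COLUMNS_STORED_IN_SINGLE_CELL && !immutableRows) {
                throw new IllegalArgumentException(
                    "COLUMNS_STORED_IN_SINGLE_CELL requires IMMUTABLE_ROWS=true");
            }
            this.immutableRows = immutableRows;
        }
    }

    /**
     * Returns a copy of the table with the requested immutability, rejecting
     * any change of immutability for single-cell tables.
     */
    static TableInfo setImmutableRows(TableInfo table, boolean immutableRows) {
        if (table.scheme == StorageScheme.COLUMNS_STORED_IN_SINGLE_CELL
                && table.immutableRows != immutableRows) {
            throw new IllegalStateException(
                "Cannot transition a COLUMNS_STORED_IN_SINGLE_CELL table to/from being immutable");
        }
        return new TableInfo(table.scheme, immutableRows);
    }
}
```

A NON_ENCODED_COLUMN_NAMES table can still be switched in either direction, while a single-cell
table fails fast instead of leaving rows whose layout no longer matches the table metadata.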

The existing serialization format used to store arrays (see PArrayDataType) will be used to
serialize multiple columns into a single byte[]. An ArrayConstructorExpression will be built
with the column values as LiteralExpressions and evaluated to generate the byte array.
A new column expression, ArrayColumnExpression, which stores the index of the column within
the array, will be used instead of KeyValueColumnExpression. The getEncodedColumnQualifier()
method of PColumn (to be added as part of PHOENIX-1598) will supply that index.
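
To make the idea of addressing a column by its position concrete, here is a simplified sketch.
It deliberately does NOT reproduce PArrayDataType's real layout; SingleCellCodec, pack and get
are hypothetical names, and the [count][offsets][data] layout is an assumption chosen for clarity.
The index passed to get plays the role of the encoded column qualifier.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Simplified stand-in for packing several column values into one byte[].
 * NOT PArrayDataType's actual format; it only illustrates positional access.
 */
final class SingleCellCodec {

    /** Layout: [int count][int offset x count][value bytes...]; offsets are relative to the data section. */
    static byte[] pack(byte[][] values) {
        int dataLength = 0;
        for (byte[] v : values) {
            dataLength += v.length;
        }
        int headerLength = Integer.BYTES * (1 + values.length);
        ByteBuffer buf = ByteBuffer.allocate(headerLength + dataLength);
        buf.putInt(values.length);
        int offset = 0;
        for (byte[] v : values) {
            buf.putInt(offset);          // start of this value within the data section
            offset += v.length;
        }
        for (byte[] v : values) {
            buf.put(v);
        }
        return buf.array();
    }

    /** What an ArrayColumnExpression-like evaluator would do: read the value at {@code index}. */
    static byte[] get(byte[] cell, int index) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int count = buf.getInt(0);
        if (index < 0 || index >= count) {
            return null;                 // no value at this position in this row's cell
        }
        int dataStart = Integer.BYTES * (1 + count);
        int start = buf.getInt(Integer.BYTES * (1 + index));
        int end = (index + 1 < count)
                ? buf.getInt(Integer.BYTES * (2 + index))
                : cell.length - dataStart;
        return Arrays.copyOfRange(cell, dataStart + start, dataStart + end);
    }
}
```

For example, packing "a", an empty value and "xyz" yields a 20-byte cell (a 16-byte header plus
4 data bytes), and get(cell, 3) returns null, which is one plausible way to treat a column that
was added to the table after the row was written.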

The remaining changes involve handling the new ArrayColumnExpression wherever we previously
used only a KeyValueColumnExpression (for example, in WhereCompiler.setScanFilter()). Currently,
when a column is deleted we don't remove its entry from the array, as this would involve rewriting
all KeyValues. We are considering whether the deleted column values could instead be removed
from the array during compaction.
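
One possible shape for that compaction step is sketched below; it is an assumption, not a
decided design. DeletedColumnCleanup and cleanup are hypothetical names. Because
ArrayColumnExpression addresses columns by index, the sketch blanks deleted positions rather
than removing them (removal would shift every later column), and only trims deleted positions
at the end of the array, where no surviving column depends on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical compaction-time cleanup for a single-cell row: values of deleted
 * columns are dropped without shifting the positions of the remaining columns.
 */
final class DeletedColumnCleanup {

    private static final byte[] EMPTY = new byte[0];

    /**
     * @param values          decoded column values, indexed by encoded column qualifier
     * @param deletedIndexes  encoded qualifiers of columns that have been dropped
     * @return values to re-serialize; deleted positions become empty and trailing
     *         deleted positions are trimmed, so surviving indexes are unchanged
     */
    static List<byte[]> cleanup(List<byte[]> values, Set<Integer> deletedIndexes) {
        List<byte[]> result = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            // Blank rather than remove: removing would shift every later column's index.
            result.add(deletedIndexes.contains(i) ? EMPTY : values.get(i));
        }
        // Trailing deleted slots can be dropped because no surviving column follows them.
        int last = result.size() - 1;
        while (last >= 0 && deletedIndexes.contains(last)) {
            result.remove(last);
            last--;
        }
        return result;
    }
}
```

The rewritten values would then be re-serialized into a single KeyValue as before; only rows
touched by the compaction pay that cost, rather than every KeyValue at DROP COLUMN time.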

[~jamestaylor], what do you think about allowing users to specify a subset of columns that
are stored together in a single KeyValue?


> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>                 Key: PHOENIX-2565
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>             Fix For: 4.9.0
>         Attachments: PHOENIX-2565-wip.patch
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never update a column
> value, it'd be more efficient to store all column values for a row in a single KeyValue.
> We could use the existing format we have for variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, you'd no
> longer be allowed to transition an existing table to/from being immutable. I think the best
> approach would be to introduce a new IMMUTABLE keyword and use it like this:
> {code}
> {code}
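
Much of the efficiency argument in the quoted description comes from per-cell overhead: every
HBase KeyValue repeats the row key, column family, qualifier, timestamp and type. The rough size
model below is a hypothetical helper (KeyValueSizeEstimate and its methods are made-up names)
based on the KeyValue serialization layout; it ignores tags, block encoding and compression,
all of which change the real on-disk numbers.

```java
/**
 * Rough size model for HBase KeyValues (tags, block encoding and compression ignored),
 * used to compare one KeyValue per column against one KeyValue per row.
 */
final class KeyValueSizeEstimate {

    // keyLength(4) + valueLength(4) + rowLength(2) + familyLength(1) + timestamp(8) + type(1)
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Serialized size of one KeyValue. */
    static long cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    /** Current layout: the row key, family and timestamp are repeated for every column. */
    static long onePerColumn(int rowLen, int familyLen, int[] qualifierLens, int[] valueLens) {
        if (qualifierLens.length != valueLens.length) {
            throw new IllegalArgumentException("need one qualifier length per value length");
        }
        long total = 0;
        for (int i = 0; i < valueLens.length; i++) {
            total += cellSize(rowLen, familyLen, qualifierLens[i], valueLens[i]);
        }
        return total;
    }

    /** Single-cell layout: packedValueLen must include whatever header the array format adds. */
    static long singleCell(int rowLen, int familyLen, int qualifierLen, int packedValueLen) {
        return cellSize(rowLen, familyLen, qualifierLen, packedValueLen);
    }
}
```

For example, with a 10-byte row key, a 1-byte family, and five columns with 4-byte qualifiers
and 8-byte values, the per-column layout costs 5 x 43 = 215 bytes, while a single cell with a
1-byte qualifier costs 32 bytes plus the packed value (96 bytes if the packed value, header
included, is 64 bytes).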

This message was sent by Atlassian JIRA
