phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue
Date Wed, 04 Jan 2017 01:23:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796808#comment-15796808 ]

James Taylor commented on PHOENIX-2565:
---------------------------------------

Also, this format is optimized for dense data. See PHOENIX-3559. I'm not sure we'll find a
serialization format that's good for both dense and sparse storage. IMHO, it's OK to optimize
for dense storage, provided we support plugging in other storage formats optimized along other
dimensions.
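
To make the dense-vs-sparse trade-off concrete, here's a rough size model in Java. It's a hypothetical sketch, not the encoding in the attached patches: it assumes a dense layout keeps a 4-byte offset slot for every declared column (null or not), while a sparse layout would store a 4-byte column index plus a 4-byte offset only for the columns that are actually set. The class and method names are made up for illustration.

```java
// Hypothetical size model, NOT Phoenix's actual encoding: a "dense" layout keeps one
// fixed-width offset slot for every declared column (even null ones), while a "sparse"
// layout stores an (column index, offset) pair only for the columns that are present.
public class DenseVsSparse {
    static final int OFFSET_BYTES = 4; // assumed width of one offset entry
    static final int INDEX_BYTES = 4;  // assumed width of one column-index entry

    // Layout metadata in the dense form: one slot per declared column.
    static long denseOverhead(int declaredColumns) {
        return (long) declaredColumns * OFFSET_BYTES;
    }

    // Layout metadata in the sparse form: one (index, offset) pair per non-null column.
    static long sparseOverhead(int presentColumns) {
        return (long) presentColumns * (INDEX_BYTES + OFFSET_BYTES);
    }

    public static void main(String[] args) {
        // Dense row: all 100 of 100 columns set.
        System.out.println(denseOverhead(100) + " vs " + sparseOverhead(100)); // 400 vs 800
        // Sparse row: only 3 of 100 columns set.
        System.out.println(denseOverhead(100) + " vs " + sparseOverhead(3));   // 400 vs 24
    }
}
```

Under these assumptions the dense layout wins when every column is populated and loses badly when only a few are, which is why one format is unlikely to be best for both.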

bq.  I'm not sure why we can't just concatenate the bytes with a delimiter (including special
encoding for null and tracking of a length of fixed width datatype by schema).
This is more or less the format we use for the bytes that make up the row key. It has limitations:
a VARBINARY or an ARRAY may only appear at the end of the row key, since there's no delimiter
byte that we can count on not appearing in the data. You'd also need to walk through the bytes
to reach the start of a column's data, which gets slower as the number of columns increases. The
new format lets you find a column's byte offset with a single array lookup, so it's fast, and
it doesn't need to store any separator bytes.
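
A minimal Java sketch of the two lookup strategies, assuming a 0x00 separator for the delimited layout and a plain int[] of start offsets for the offset layout. Both are simplifications for illustration; neither is the actual row key encoding nor the format in the patch. Finding column i in the delimited layout means scanning past i separators, while the offset layout needs two array reads regardless of i.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: neither layout is Phoenix's actual encoding.
public class ColumnLookup {
    static final byte SEPARATOR = 0x00; // assumed delimiter byte

    // Delimited layout: col0 SEP col1 SEP col2 ...
    static byte[] encodeDelimited(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) out.write(SEPARATOR);
            byte[] b = values[i].getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Linear scan: skip `col` separators to find the column start, so the cost grows
    // with the column's position. Assumes col < number of columns and that no value
    // contains the separator byte (which a VARBINARY cannot promise).
    static byte[] getDelimited(byte[] row, int col) {
        int start = 0;
        for (int c = 0; c < col; c++) {
            while (row[start] != SEPARATOR) start++;
            start++; // step over the separator
        }
        int end = start;
        while (end < row.length && row[end] != SEPARATOR) end++;
        return Arrays.copyOfRange(row, start, end);
    }

    // Offset layout: values concatenated with no separators, plus the start offset of
    // each column. Locating a column is two array reads, independent of its position.
    static byte[] getByOffset(byte[] data, int[] offsets, int col) {
        int start = offsets[col];
        int end = col + 1 < offsets.length ? offsets[col + 1] : data.length;
        return Arrays.copyOfRange(data, start, end);
    }

    public static void main(String[] args) {
        byte[] row = encodeDelimited("a", "bb", "ccc");
        System.out.println(new String(getDelimited(row, 2), StandardCharsets.UTF_8)); // ccc

        byte[] data = "abbccc".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 1, 3}; // starts of "a", "bb", "ccc"
        System.out.println(new String(getByOffset(data, offsets, 1), StandardCharsets.UTF_8)); // bb
    }
}
```

Note that getDelimited would return the wrong bytes if a value contained 0x00, which is the same reason a variable-length binary value can only go at the end of a delimited key.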


> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2565-v2.patch, PHOENIX-2565-wip.patch, PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never update a
> column value, it'd be more efficient to store all column values for a row in a single KeyValue.
> We could use the existing format we have for variable-length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, you'd no
> longer be allowed to transition an existing table to/from being immutable. I think the best
> approach would be to introduce a new IMMUTABLE keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}
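
As a back-of-the-envelope check on the efficiency argument, the sketch below models the HBase KeyValue serialized layout (without tags), where every cell repeats the row key, family, qualifier, timestamp and type. The example table shape and the 4-byte per-column cost of the packed format are assumptions chosen for illustration, not measurements.

```java
// Back-of-the-envelope model of the HBase KeyValue layout (without tags):
// 4-byte key length + 4-byte value length + 2-byte row length + row
// + 1-byte family length + family + qualifier + 8-byte timestamp + 1-byte type + value.
public class KeyValueOverhead {
    static final int FIXED = 4 + 4 + 2 + 1 + 8 + 1; // 20 bytes of framing per KeyValue

    static long keyValueSize(int rowLen, int famLen, int qualLen, int valueLen) {
        return FIXED + rowLen + famLen + qualLen + valueLen;
    }

    // One KeyValue per column: row key, family, timestamp etc. are repeated N times.
    static long perColumn(int columns, int rowLen, int famLen, int qualLen, int valueLen) {
        return columns * keyValueSize(rowLen, famLen, qualLen, valueLen);
    }

    // All columns packed into one KeyValue; packedOverheadPerColumn is a hypothetical
    // per-column cost of the packed format (e.g. one offset entry).
    static long singleKeyValue(int columns, int rowLen, int famLen, int qualLen,
                               int valueLen, int packedOverheadPerColumn) {
        return keyValueSize(rowLen, famLen, qualLen,
                columns * (valueLen + packedOverheadPerColumn));
    }

    public static void main(String[] args) {
        // 20 columns, 16-byte row key, 1-byte family, 2-byte qualifiers, 8-byte values.
        System.out.println(perColumn(20, 16, 1, 2, 8));         // 20 * 47 = 940
        System.out.println(singleKeyValue(20, 16, 1, 2, 8, 4)); // 20 + 19 + 240 = 279
    }
}
```

With these numbers, storing the row as separate KeyValues costs 940 bytes versus 279 bytes packed into one, before any block encoding or compression is applied.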



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
