phoenix-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue
Date Wed, 04 Jan 2017 21:51:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799459#comment-15799459
] 

Enis Soztutar commented on PHOENIX-2565:
----------------------------------------

From the experience of trying to use this for billions of rows and hundreds of columns
(where the schema is a regular RDBMS one), the array encoding has a few problems
in terms of packing data efficiently. 
 - Array encoding uses all three of separators, offsets / lengths, and nullability
encoding. This means that there is a lot of unnecessary overhead from representing redundant
information. 
 - The run-length-encoding-like null representation gets really expensive if you have data like
{{a, <null>, b, <null>, c, <null>}}. A simple bitset is easier and more
efficient. Or, if you are already encoding the offsets, you do not have to re-encode nullability:
if offset_i and offset_i+1 are equal, the field is null. 
 - The offsets are fixed length (4 or 2 bytes) rather than varint encoded. This makes a difference
for the majority of data, where the expected number of columns is <128. 
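
To make the last two points concrete, here is a minimal Python sketch (not Phoenix code; the
function names are made up for illustration). It assumes an unsigned LEB128-style varint, as
used by Protocol Buffers, and a plain one-bit-per-column null bitset.

```python
def encode_varint(value: int) -> bytes:
    """Unsigned LEB128-style varint: 7 payload bits per byte, high bit means 'more follows'."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # more bytes follow
        else:
            out.append(low7)          # final byte, high bit clear
            return bytes(out)


def null_bitset(values: list) -> bytes:
    """One bit per column, set when the column is null (bit i % 8 of byte i // 8)."""
    bits = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is None:
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


row = [b"a", None, b"b", None, b"c", None]
print(len(null_bitset(row)))                             # six alternating columns fit in one byte
print(len(encode_varint(100)), len(encode_varint(300)))  # values below 128 need a single byte
```

With a bitset the cost of nullability is fixed at one bit per column, however the nulls are
distributed, and any varint value below 128 takes one byte instead of 2 or 4.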

I think array encoding is this way because arrays can be part of the row key. However, for
packing column values, we do not need the lexicographic sortable guarantee, meaning that we
can do a way better job than the array encoding. The way forward for this I think is to leave
the array encoding as it is, but instead do a PStructDataType that implements the new scheme.
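
As a generic illustration of why sortability constrains an encoding (this is not the actual
Phoenix array format, and both helper functions are hypothetical), compare a separator-terminated
value with a length-prefixed one:

```python
def terminated(s: bytes) -> bytes:
    """Payload followed by a zero separator (assumes no 0x00 inside the payload)."""
    return s + b"\x00"


def length_prefixed(s: bytes) -> bytes:
    """One-byte length, then the payload (assumes len(s) < 256)."""
    return bytes([len(s)]) + s


a, b = b"ab", b"b"
print(a < b)                                    # raw byte order of the values
print(terminated(a) < terminated(b))            # same order as the raw values
print(length_prefixed(a) < length_prefixed(b))  # order now depends on the length byte
```

The length-prefixed form is more compact to navigate but no longer compares byte-wise in the
same order as the values, which is acceptable for column values that never appear in the row key.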


This is the exact problem that Avro, Protocol Buffers (PB) and Thrift encodings solve already. However, the
requirements are a little different for Phoenix. 
 - First, we have to figure out how we are going to deal with schema evolution. 
 - We need an efficient way to access individual fields within the byte array without deserializing
the whole byte[] (although notice that it is already read from disk and in-memory).
 - Nullability support. 
Looking at this, I think something like FlatBuffers / Cap'n Proto looks more like the direction
(especially with the requirement that we do not want to deserialize the whole thing). 

If we want to do a custom format with the given encodings, I think we can do something like
this: 
{code}
<format_id><column_1><column_2>...<column_n> <offset_1><offset_2>...<offset_n><offset_start>
{code}
where 
 - {{format_id}}: a single byte identifying the format of the data. 
 - {{column_n}}: column data, with NO separators. 
 - {{offset_n}}: byte offset of the nth column. It can be a varint, if we can cache
this data. Otherwise, we can make this 1/2/4 bytes and encode that width at the tail. 
 - {{offset_start}}: the offset of <offset_1>. The reader can find and cache
how many columns there are in the encoded data by reading all of the offsets. Notice that
we can only add columns to an existing table, and the schema is still in the catalog table.
Columns not used anymore are always null. 
To read a column, you would find the offset of the column, and the length would be {{offset_n+1}}
- {{offset_n}}. If a column is null, it is always encoded as 0 bytes, and {{offset_n+1}} would
be equal to {{offset_n}}. 
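
A minimal Python sketch of this layout follows. It is only an illustration under stated
assumptions, not a proposed implementation: offsets are fixed 4-byte big-endian integers measured
from the start of the buffer, {{offset_start}} doubles as the end of the last column, and the
names {{FORMAT_ID}}, {{encode_row}} and {{read_column}} are hypothetical.

```python
import struct

FORMAT_ID = 1      # hypothetical format identifier
OFFSET_WIDTH = 4   # assumption: fixed 4-byte big-endian offsets


def encode_row(columns):
    """Pack values (bytes or None) as <format_id><data...><offsets...><offset_start>."""
    body = bytearray([FORMAT_ID])
    offsets = []
    for value in columns:
        offsets.append(len(body))      # byte offset where this column starts
        if value is not None:
            body += value              # a null contributes zero bytes
    offset_start = len(body)           # the offset table begins right after the data
    for off in offsets:
        body += struct.pack(">I", off)
    body += struct.pack(">I", offset_start)
    return bytes(body)


def read_column(buf, index):
    """Read one column without decoding the others; None if null or not present."""
    (offset_start,) = struct.unpack_from(">I", buf, len(buf) - OFFSET_WIDTH)
    num_columns = (len(buf) - OFFSET_WIDTH - offset_start) // OFFSET_WIDTH
    if index >= num_columns:
        return None                    # column added after this row was written
    (start,) = struct.unpack_from(">I", buf, offset_start + index * OFFSET_WIDTH)
    if index + 1 < num_columns:
        (end,) = struct.unpack_from(">I", buf, offset_start + (index + 1) * OFFSET_WIDTH)
    else:
        end = offset_start             # last column ends where the offset table starts
    return buf[start:end] if end > start else None


row = encode_row([b"abc", None, b"de"])
print(read_column(row, 0), read_column(row, 1), read_column(row, 2), read_column(row, 3))
```

Note that in this sketch a zero-length value cannot be told apart from a null, which follows
directly from encoding null as 0 bytes, and a column index beyond the stored offsets reads as
null, which matches the "columns can only be added" assumption.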




 




> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2565-v2.patch, PHOENIX-2565-wip.patch, PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never update a
column value, it'd be more efficient to store all column values for a row in a single KeyValue.
We could use the existing format we have for variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, you'd no
longer be allowed to transition an existing table to/from being immutable. I think the best
approach would be to introduce a new IMMUTABLE keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
