hbase-dev mailing list archives

From Ryan Blue <rb...@cloudera.com>
Subject Re: [common type encoding breakout] Re: HBase Hackathon @ Salesforce 05/06/2014 notes
Date Thu, 15 May 2014 18:54:19 GMT
On 05/15/2014 09:32 AM, James Taylor wrote:
> @Ryan & Jon - thanks again for pursuing this - I think it'll be a big
> improvement.
> IMHO, it'd be good to add a Requirements section to the doc. If the
> current Phoenix type system meets those requirements, then why not just
> go with that?

Good idea. Part of the problem has been that we don't all have a clear 
picture of the goals. Places where I think we need to come up with answers:

1. Are we targeting a backward-compatible encoding that can be used on 
existing tables?

   My answer: No, because this would dramatically increase the required 
size of implementations. Supporting existing Phoenix tables (and the 
UNSIGNED types) should be a separate issue. Also: as the experts in 
using the current Phoenix encoding, what would you like to fix?

2. Are we going to include choices for encoding for specific types, or 
are we going to choose one?

   My answer: Choose one. This is what the DataType (or similar) APIs 
are for. This is just one encoding spec and there can be more.

Let's talk about these today, as well as some of the trade-offs of the 
Phoenix encoding, to figure out those requirements. The Phoenix encoding 
is very similar to the proposed encoding, except that VARCHAR and BINARY 
are treated differently and the additional tracking bytes in the key are 
type ordinals rather than field position-based tags. Basically, can we 
live with variable-length binary only at the end of the key, or do we 
need a requirement that it can be any field?
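
To make the last question concrete, here is a minimal Python sketch of why a variable-length field in the middle of a compound key needs extra bytes to keep memcmp order. The escape-and-terminate scheme and the function names (`encode_var`, `naive_key`, `escaped_key`) are hypothetical and chosen only for illustration; this is neither the proposed spec nor the Phoenix encoding.

```python
def encode_var(field: bytes) -> bytes:
    # Hypothetical scheme: escape each 0x00 as 0x00 0xFF, then terminate
    # with 0x00 0x00 so a shorter value sorts before any longer value
    # that starts with it.
    return field.replace(b"\x00", b"\x00\xff") + b"\x00\x00"


def naive_key(name: bytes, n: int) -> bytes:
    # Variable-length field followed directly by a fixed-width big-endian
    # integer (0 <= n < 2**32), with nothing marking where the field ends.
    return name + n.to_bytes(4, "big")


def escaped_key(name: bytes, n: int) -> bytes:
    # Same layout, but the variable-length field is escaped and terminated.
    return encode_var(name) + n.to_bytes(4, "big")


# Sample rows: (variable-length binary, unsigned 32-bit integer).
rows = [
    (b"a", 0x63000000),
    (b"ab", 0),
    (b"a\x00", 5),
    (b"", 1),
    (b"b", 2),
]

logical_order = sorted(rows)                               # field-by-field comparison
naive_order = sorted(rows, key=lambda r: naive_key(*r))    # byte order of naive keys
escaped_order = sorted(rows, key=lambda r: escaped_key(*r))
```

With these sample rows, the byte order of the naive keys differs from the field-by-field order because the integer's bytes are compared against the tail of a longer binary value. The escaped keys sort in logical order, at the cost of two terminator bytes per field plus one extra byte per embedded zero. Restricting variable-length binary to the end of the key avoids that overhead.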

> I think we need a binary serialization spec that includes compound keys
> in the row key plus all the SQL primitive data types that we want to

I'm not sure I understand. What does the current spec not support that 
it should?

> support (minimally all the SQL types that Phoenix currently supports).

I agree. The current spec supports all of the current Phoenix types, 
minus the backward-compatible types based on Bytes. If there are types 
missing from the list at the end of the doc, please add them or tell me 
which ones so that I can.

I also clarified in the doc why there are few memcmp encodings, but this 
does not limit the types in the spec. Is this clear enough?

For the UNSIGNED Bytes types, I'm fine adding them if we need to for 
backward compatibility. This comes down to whether this encoding is 
going to be used alongside existing data in the same table or if it 
will be a new table format.
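
As background on why the Bytes-based types are a special case: a big-endian two's-complement integer, which is the layout `Bytes.toBytes(int)` produces, only sorts correctly under memcmp for non-negative values. The Python sketch below illustrates this, with a sign-bit flip as one common way to make signed values order correctly. The function names are hypothetical, and the sketch is not taken from the spec document.

```python
import struct


def raw_int(v: int) -> bytes:
    # Big-endian two's-complement 32-bit integer.
    return struct.pack(">i", v)


def flipped_int(v: int) -> bytes:
    # Same four bytes with the sign bit inverted, so negative values
    # compare lower than non-negative ones byte by byte.
    return struct.pack(">I", (v & 0xFFFFFFFF) ^ 0x80000000)


values = [-2147483648, -1, 0, 1, 2147483647]

raw_order = sorted(values, key=raw_int)
flipped_order = sorted(values, key=flipped_int)
```

Sorting by `raw_int` places the negative values after the positive ones, while sorting by `flipped_int` matches numeric order. Data already written in the raw layout cannot be reinterpreted under the flipped layout without rewriting it, which is why sharing a table with existing data matters here.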


Ryan Blue
Software Engineer
Cloudera, Inc.
