cassandra-commits mailing list archives

From "Louay Kamel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-15096) [RFC CQL v4+] cql_extension: wide range of unset_values.
Date Mon, 22 Apr 2019 22:17:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Louay Kamel updated CASSANDRA-15096:
------------------------------------
    Description: 
 
 *Problem*

The current implementation of unset_value has the shortcomings described under Issues.
We need a new unset_value(s) mechanism that is robust and works well for v4+ protocols.

*Issues*

+1- A client has to encode unset_value for every unset column in the bound values of a prepared INSERT.+

Example: INSERT INTO table(pkey,ckey,col1,col2,col3,col4) values(?,?,?,?,?,?);

An EXECUTE request has to unset the columns one by one, encoding each unset_value as "int(-2)":

a- binded-values = (pkey_value, ckey_value, col1_value, unset_value, unset_value, unset_value)
or
b- binded-values = (pkey_value, ckey_value, unset_value, unset_value, unset_value, col4_value)
etc.

This increases the size of the EXECUTE request's binary buffer, which in turn increases bandwidth
usage and latency for both request and response.
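
As a rough illustration of the cost described above, the sketch below encodes the bound values of case (a) the way protocol v4 encodes a [value]: a 4-byte big-endian length, followed by the payload when the length is non-negative, with -1 meaning null and -2 meaning not set. The helper names (`encode_value`, `encode_values`, `UNSET`) and the two-byte payloads are made up for the example, and the value-count prefix of the EXECUTE body is left out.

```python
import struct


class _Unset:
    """Sentinel type for a bound value that the client leaves unset."""

    def __repr__(self):
        return "UNSET"


UNSET = _Unset()  # hypothetical sentinel, not a driver API


def encode_value(value):
    """Encode one bound value as a 4-byte big-endian length plus payload."""
    if value is UNSET:
        return struct.pack(">i", -2)  # "not set": length marker only
    if value is None:
        return struct.pack(">i", -1)  # null: length marker only
    return struct.pack(">i", len(value)) + value


def encode_values(values):
    """Concatenate the encoded bound values of one EXECUTE request."""
    return b"".join(encode_value(v) for v in values)


# Case (a): three real values followed by three unset columns.
row_a = [b"pk", b"ck", b"c1", UNSET, UNSET, UNSET]
encoded_a = encode_values(row_a)

# Every unset column costs its own 4-byte marker.
unset_bytes = 4 * sum(v is UNSET for v in row_a)
print(len(encoded_a), unset_bytes)
```

With wide tables where most columns are unset, the per-column marker is what dominates the request size.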

+2- The buffer returned for SELECT queries does not differentiate between null and unset_value
for a subset of the given rows.+

Example:
Imagine a dataset in the table where each row of the SELECT response has different
unset/null columns. Consider the following query with page_size = 3 rows:
 SELECT * FROM table WHERE pkey = pkey_value;

 
||pkey||ckey||col1||col2||col3||col4||
|pkey_value|ckey_value|col1_value|null/unset_value|null/unset_value|null/unset_value|
|pkey_value|ckey_value|null/unset_value|null/unset_value|null/unset_value|col4_value|
|pkey_value|ckey_value|null/unset_value|null/unset_value|col3_value|null/unset_value|

 

*Proposed solution*

Instead of having only null(-1) and unset_value(-2), extend the unset_value(s)
to a range from unset_(-2) to unset_(-2,147,483,648), where:
 unset_value = unset_(-2)
 unset_rest = unset_(-2,147,483,648)
 anything in between is unset_(neg_integer).
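
One possible reading of this range, sketched in Python: the examples in this proposal suggest that unset_(-k) covers k-1 consecutive columns (so unset_(-2) is a single column and unset_(-4) is three), while unset_rest covers whatever remains in the row. That mapping is an interpretation, and the function name `unset_span` is hypothetical.

```python
INT32_MIN = -2_147_483_648  # unset_rest in the proposal


def unset_span(marker, remaining):
    """Return how many columns a negative marker leaves unset.

    `remaining` is the number of columns not yet decoded in the row.
    Assumed reading: unset_(-k) covers k - 1 columns, unset_rest covers
    every remaining column.
    """
    if marker == INT32_MIN:
        return remaining
    if INT32_MIN < marker <= -2:
        span = -marker - 1
        if span > remaining:
            raise ValueError("marker skips past the end of the row")
        return span
    raise ValueError("not an unset marker: %d" % marker)
```

Under this reading, null(-1) keeps its current meaning and is rejected by the function, since it is not part of the unset range.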

+Solution for issue_1:+

a- binded-values = (pkey_value, ckey_value, col1_value, unset_rest)
 b- binded-values = (pkey_value, ckey_value, unset_(-4), col4_value)
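
A minimal sketch of how a client might produce these compact bound values, assuming that a run of k unset columns is written as a single unset_(-(k+1)) marker and that a run reaching the end of the row is written as unset_rest. The names `encode_compact` and `UNSET` and the two-byte payloads are illustrative only and are not taken from any driver.

```python
import struct

UNSET = object()            # hypothetical sentinel for an unset column
INT32_MIN = -2_147_483_648  # unset_rest in the proposal


def encode_compact(values):
    """Encode bound values, collapsing each run of unset columns into one marker."""
    out = []
    i, n = 0, len(values)
    while i < n:
        v = values[i]
        if v is not UNSET:
            if v is None:
                out.append(struct.pack(">i", -1))           # null
            else:
                out.append(struct.pack(">i", len(v)) + v)   # length + payload
            i += 1
            continue
        # Find the end of the run of unset columns starting at i.
        j = i
        while j < n and values[j] is UNSET:
            j += 1
        if j == n:
            out.append(struct.pack(">i", INT32_MIN))        # unset_rest
        else:
            out.append(struct.pack(">i", -(j - i) - 1))     # run of k -> unset_(-(k+1))
        i = j
    return b"".join(out)


# Cases (a) and (b) from the proposal.
encoded_a = encode_compact([b"pk", b"ck", b"c1", UNSET, UNSET, UNSET])
encoded_b = encode_compact([b"pk", b"ck", UNSET, UNSET, UNSET, b"c4"])
print(len(encoded_a), len(encoded_b))
```

In both cases the three unset columns cost one 4-byte marker instead of three, and the saving grows with the length of the run.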

+Solution for issue_2:+

This works with all SELECT responses, prepared or unprepared.

row1 buffer -> pkey_value, ckey_value, col1_value, unset_rest.
 The unset_rest marker lets the decoder move on to the next row.

row2 buffer -> pkey_value, ckey_value, unset_(-4), col4_value.
 The unset_(-4) marker lets the decoder skip |-4 + 1| = 3 columns of the column metadata and
resume decoding at col4 for the next cell_value in the row.

row3 buffer -> pkey_value, ckey_value, unset_(-3), col3_value, unset_rest.
 This buffer combines the techniques of row1 and row2.
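
The decoding side of the three row buffers can be sketched as follows. This assumes the driver knows the column count from the result metadata, that unset_(-k) skips k-1 columns, and that unset_rest fills the remainder of the row. The function name `decode_row` and the `UNSET` placeholder are hypothetical.

```python
import struct

UNSET = "<unset>"           # hypothetical placeholder for an unset cell
INT32_MIN = -2_147_483_648  # unset_rest in the proposal


def decode_row(buf, offset, column_count):
    """Decode one row starting at `offset`; return (row, next_offset)."""
    row = []
    while len(row) < column_count:
        (n,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        if n >= 0:
            row.append(buf[offset:offset + n])  # ordinary cell value
            offset += n
        elif n == -1:
            row.append(None)                    # null cell
        elif n == INT32_MIN:
            # unset_rest: fill the remaining columns and finish the row.
            row.extend([UNSET] * (column_count - len(row)))
        else:
            span = -n - 1                       # unset_(-k) skips k - 1 columns
            if span > column_count - len(row):
                raise ValueError("marker skips past the end of the row")
            row.extend([UNSET] * span)
    return row, offset
```

Calling `decode_row` repeatedly with the returned offset walks through a page row by row, since each call stops as soon as `column_count` cells have been produced.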

This solution is not limited to unset_(neg_integer); it can also be applied to null cells in
responses to reduce the bandwidth between CQL and the client.

To stay compatible with all current v4+ CQL drivers, the client should be required to send
a flag with the SELECT request (either in the frame header or somewhere in the CQL statement),
and for the returned buffer we could use the Rows flags (e.g. has_unset_values?: boolean) to
let the driver know whether such markers exist in the page.

*Benefits*

-Implementing this will enable apps to design complex data models with up to 2 billion columns
without trading off anything.

-It greatly reduces the number of prepared write statements needed in data models with millions
of columns.

-It has a large impact on bandwidth and CPU cycles.

-It is easy to implement on the client side.

*Record of votes*

+1 Louay Kamel


> [RFC CQL v4+] cql_extension: wide range of unset_values.
> --------------------------------------------------------
>
>                 Key: CASSANDRA-15096
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15096
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Interpreter, CQL/Semantics
>            Reporter: Louay Kamel
>            Priority: High



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
