kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Delete row by partial key
Date Thu, 23 Feb 2017 07:18:41 GMT
On Thu, Feb 16, 2017 at 8:36 AM, Paul Brannan <paul.brannan@thesystech.com> wrote:

> Thank you for the quick response!
> I do understand the desire to not present in the API the appearance of a
> feature that isn't really there.  As an example, I was reading KUDU-1291
> yesterday and realized that since predicates are specified by name and not
> column index, it's easy to accidentally construct an inefficient scan by
> omitting the first component of the key.  It's not surprising given how
> kudu works, but it's not obvious just by looking at the API.
> I would like to contribute a patch, though I would like to learn more
> about kudu before I feel comfortable.  In this case, for example, my naive
> solution may work, but isn't very robust -- since it has to switch on the
> data type of each cell, it will fail when new data types are added.

I don't think that's too bad - we have lots of places where we have to add
a new case label when a new type is added, so what's one more? :)

>   I have a hunch it's possible to copy row_data_ directly somehow, but I
> have no idea how to correctly set isset_bitmap_ or owned_strings_bitmap_.

"isset_bitmap_" should probably be set based on the projected columns of
the scanner.
"owned_strings_bitmap_" indicates which strings in the row are "owned" by
the PartialRow rather than "referenced". I think to make this API safe,
we'd probably want to clone all of the string data, which would imply
setting owned_strings_bitmap_ to 1 bits for those strings.
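
To make the two bitmaps concrete, here is a minimal self-contained sketch of
the idea. It is a toy model and not Kudu's actual classes: ToyCell,
ToyRowResult, ToyPartialRow and CopyProjectedCells are hypothetical names made
up for illustration, and only int32 and string cells are modelled. It marks
every projected column as set, and clones string data so that the "owned" bit
can safely be set for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Kudu's real classes.
enum class ToyType { kInt32, kString };

struct ToyCell {
  ToyType type = ToyType::kInt32;
  int32_t i32 = 0;
  std::string str;
};

// A scanned row: cells[i] holds the value of table column projection[i].
struct ToyRowResult {
  std::vector<size_t> projection;
  std::vector<ToyCell> cells;
};

// A partial row laid out over the full table schema.
struct ToyPartialRow {
  explicit ToyPartialRow(size_t num_cols)
      : cells(num_cols), isset(num_cols, false), owned_strings(num_cols, false) {}

  std::vector<ToyCell> cells;
  std::vector<bool> isset;          // stands in for isset_bitmap_
  std::vector<bool> owned_strings;  // stands in for owned_strings_bitmap_
};

// Copies every projected column of src into dst.
void CopyProjectedCells(const ToyRowResult& src, ToyPartialRow* dst) {
  for (size_t i = 0; i < src.projection.size(); ++i) {
    const size_t col = src.projection[i];
    dst->cells[col] = src.cells[i];  // std::string assignment clones the bytes
    dst->isset[col] = true;          // set because the column was projected
    dst->owned_strings[col] = (src.cells[i].type == ToyType::kString);
  }
}
```

Because the string bytes are cloned, the copied row stays valid after the
scan batch it came from is gone, at the cost of one extra copy per string
cell.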

Hope that helps

> On Wed, Feb 15, 2017 at 1:32 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> Hi Paul,
>> You're correct that there is no predicate-based delete currently
>> available, so you have to scan and then feed the results of the scan back
>> into your desired mutations/deletes. This is intentional, since right now
>> we don't have multi-row transactional capabilities, and a "delete by
>> predicate" API would probably give the false impression that it is
>> transactional.
>> I also think you're right that there isn't a nice way of propagating a
>> RowResult into a PartialRow as you need to do here. It seems you're a C++
>> programmer -- any interest in contributing a patch to make this
>> transformation a bit easier?
>> -Todd
>> On Wed, Feb 15, 2017 at 10:27 AM, Paul Brannan <
>> paul.brannan@thesystech.com> wrote:
>>> I want to delete all rows that match a particular partial key.  For
>>> example, if my schema includes columns "foo", "bar", and "baz" in its
>>> primary key, I want to be able to delete all rows with "foo=16" and
>>> "bar=32", regardless of the value of baz.  If I attempt to apply a
>>> KuduDelete without specifying "baz", I get an error "Illegal state: Key not
>>> specified".
>>> The best I have come up with so far is to do a scan and copy the data
>>> cell-by-cell from the RowPtr returned by the scan into the KuduPartialRow
>>> used by the delete; I don't see any good way in the interface to copy row
>>> data from one to the other without copying cell-by-cell.  The code looks
>>> something like:
>>>   for (auto idx : primary_key_column_indexes) {
>>>     switch(schema.Column(idx).type()) {
>>>     case KuduColumnSchema::INT16: // GetInt16/SetInt16
>>>     case KuduColumnSchema::INT32: // GetInt32/SetInt32
>>>     case KuduColumnSchema::STRING: // GetString/SetString
>>>     // and so on...
>>>     }
>>>   }
>>> Is there a better way?
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera

Todd Lipcon
Software Engineer, Cloudera
