kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Brannan <paul.bran...@thesystech.com>
Subject Re: Delete row by partial key
Date Fri, 24 Feb 2017 17:17:33 GMT
I think this makes sense for isset_bitmap_ and owned_strings_bitmap_.  I
have the source checked out from github and will try to build and run the
regression tests so I can play with this idea.

In general though it seems I only want to copy the primary keys (any given
RowPtr might have other columns).  I think I can come up with a way to do
that but it will take some more learning...

Thanks for the help!


On Thu, Feb 23, 2017 at 2:18 AM, Todd Lipcon <todd@cloudera.com> wrote:

> On Thu, Feb 16, 2017 at 8:36 AM, Paul Brannan <paul.brannan@thesystech.com
> > wrote:
>
>> Thank you for the quick response!
>>
>> I do understand the desire to not present in the API the appearance of a
>> feature that isn't really there.  As an example, I was reading KUDU-1291
>> yesterday and realized that since predicates are specified by name and not
>> column index, it's easy to accidentally construct an inefficient scan by
>> omitting the first component of the key.  It's not surprising given how
>> kudu works, but it's not obvious just by looking at the API.
>>
>> I would like to contribute a patch, though I would like to learn more
>> about kudu before I feel comfortable.  In this case, for example, my naive
>> solution may work, but isn't very robust -- since it has to switch on the
>> data type of each cell, it will fail when new data types are added.
>>
>
> I don't think that's too bad - we have lots of places where we have to add
> a new case label when a new type is added, so what's one more? :)
>
>
>>   I have a hunch it's possible to copy row_data_ directly somehow, but I
>> have no idea how to correctly set isset_bitmap_ or owned_strings_bitmap_.
>>
>
> "isset_bitmap_" should probably be set based on the projected columns of
> the scanner.
> "owned_stirngs_bitmap_" indicates which strings in the row are "owned" by
> the PartialRow rather than "referenced". I think to make this API safe,
> we'd probably want to clone all of the string data, which would imply
> setting owned_strings_bitmap_ to 1 bits for those strings.
>
> Hope that helps
> -Todd
>
>
>>
>> On Wed, Feb 15, 2017 at 1:32 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> Hi Paul,
>>>
>>> You're correct that there is no predicate-based delete currently
>>> available, so you have to scan and then feed the results of the scan back
>>> into your desired mutations/deletes. This is intentional, since right now
>>> we don't have multi-row transactional capabilities, and a "delete by
>>> predicate" API would probably give the false impression that it is
>>> transactional.
>>>
>>> I also think you're right that there isn't a nice way of propagating a
>>> RowResult into a PartialRow as you need to do here. It seems you're a C++
>>> programmer -- any interest in contributing a patch to make this
>>> transformation a bit easier?
>>>
>>> -Todd
>>>
>>>
>>>
>>> On Wed, Feb 15, 2017 at 10:27 AM, Paul Brannan <
>>> paul.brannan@thesystech.com> wrote:
>>>
>>>> I want to delete all rows that match a particular partial key.  For
>>>> example, if my schema includes columns "foo", "bar", and "baz" in its
>>>> primary key, I want to be able to delete all rows with "foo=16" and
>>>> "bar=32", regardless of the value of baz.  If I attempt to apply a
>>>> KuduDelete without specifying "baz", I get an error "Illegal state: Key not
>>>> specified".
>>>>
>>>> The best I have come up with so far is to do a scan and copy the data
>>>> cell-by-cell from the RowPtr returned by the scan into the KuduPartialRow
>>>> used by the delete; I don't see any good way in the interface to copy row
>>>> data from one to the other without copying cell-by-cell.  The code looks
>>>> something like:
>>>>
>>>>   for (auto idx : primary_key_column_indexes) {
>>>>     switch(schema.Column(idx).type()) {
>>>>     case KuduColumnSchema::INT16: // GetInt16/SetInt16
>>>>     case KuduColumnSchema::INT32: // GetInt32/SetInt32
>>>>     case KuduColumnSchema::STRING: // GetString/SetString
>>>>     // and so on...
>>>>     }
>>>>   }
>>>>
>>>> Is there a better way?
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Mime
View raw message