kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adar Lieber-Dembo <a...@cloudera.com>
Subject Re: Kudu silently failing to set nullable column
Date Mon, 09 Apr 2018 17:59:03 GMT
This sounds like an Impala bug to me.
http://issues.cloudera.org/browse/IMPALA-5217 seems relevant.

Since you're using Cloudera's distribution of Impala, I took a look at
Cloudera's release notes, and that bug was fixed in CDH 5.12.0,
5.11.2, and 5.10.2. So it makes sense that your Impala would be
affected  (CDH 5.11.1).

If upgrading to an unaffected release doesn't help, maybe try asking
on an Apache Impala mailing list? See
https://impala.apache.org/community.html for more information.


On Thu, Apr 5, 2018 at 5:30 PM, Jeff Jones <jjones@adaptivebiotech.com> wrote:
> Environment:
>
> impalad version 2.8.0-cdh5.11.1 RELEASE (build
> 3382c1c488dff12d5ca8d049d2b59babee605b4e)
>
> Built on Thu Jun  1 10:36:12 PDT 2017
>
>
>
> kudu 1.1.0
>
> revision 3f64b68724ded320a31b69467c5a5f7744b07a50
>
> build type RELEASE
>
> built by jenkins at 17 Nov 2016 00:34:09 PST on
> impala-ec2-pkg-centos-7-0fe9.vpc.cloudera.com
>
> build id 2016-11-17_00-00-22
>
>
>
> I have a Kudu table that I’m trying to update through JDBC which fails
> silently when trying to update a nullable column to null. When running the
> update through the impala-shell it gives me a warning about violating a
> nullability constraint. The table description is below. The column in
> question is the sample_catalog_tags column at the bottom.
>
>
>
> Query: describe samples_v3_1_dj
>
> +--------------------------------------------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
>
> | name                                       | type   | comment |
> primary_key | nullable | default_value | encoding      | compression
> | block_size |
>
> +--------------------------------------------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
>
> | sequence_result_id                         | string |         | true
> | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_id                                  | string |         | false
> | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_name                                | string |         | false
> | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | species                                    | string |         | false
> | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | locus                                      | string |         | false
> | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | product_subtype                            | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | counting_method                            | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | primer_set                                 | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | productive_entropy                         | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | productive_clonality                       | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_entropy                             | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_clonality                           | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_amount_ng                           | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_cells_mass_estimate                 | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | fraction_productive_of_cells_mass_estimate | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_cells                               | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | fraction_productive_of_cells               | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | max_productive_frequency                   | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | max_frequency                              | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_templates                            | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | productive_templates                       | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | outofframe_templates                       | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | stop_templates                             | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | dj_templates                               | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_rearrangements                       | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | productive_rearrangements                  | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | outofframe_rearrangements                  | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | stop_rearrangements                        | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | dj_rearrangements                          | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_reads                                | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_productive_reads                     | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_outofframe_reads                     | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_stop_reads                           | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_dj_reads                             | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sequence_result_status                     | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | release_date                               | bigint |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | upload_date                                | bigint |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | fraction_productive                        | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | order_name                                 | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | kit_id                                     | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | kit_pool                                   | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | owning_workspace_id                        | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | min_rearrangement_length                   | int    |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | est_productive_receptors                   | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | est_total_receptors                        | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_properties                          | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_templates_agg                        | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_t_cells                              | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | total_t_cells_new                          | double |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> | sample_catalog_tags                        | string |         | false
> | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0
> |
>
> +--------------------------------------------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
>
> Fetched 50 row(s) in 0.02s
>
>
>
> When I run the update statement in the impala-shell I see the warning about
> nullability and a subsequent select shows the column was not updated.
>
>
>
>> select sample_id, sequence_result_id, sample_catalog_tags from
>> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899');
>
> Query: select sample_id, sequence_result_id, sample_catalog_tags from
> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899')
>
> Query submitted at: 2018-04-06 00:12:21 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=d344cbbba783c2ce:b3be430e00000000
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> | sample_id                            | sequence_result_id
> | sample_catalog_tags                     |
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 280df143-b893-4e2b-b1e7-dfb9940aaf10 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> d10517c0-0aeb-4d73-96a6-db8ab9ca60d7 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 19154086-868e-484f-9c7d-4d62434ca571 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> fbdc1f5e-5aa8-40e2-bc64-b97af03b4c66 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> Fetched 4 row(s) in 0.23s
>
>> UPDATE samples_v3_1_dj SET sample_catalog_tags=null WHERE
>> sequence_result_id IN ('280df143-b893-4e2b-b1e7-dfb9940aaf10');
>
> Query: update samples_v3_1_dj SET sample_catalog_tags=null WHERE
> sequence_result_id IN ('280df143-b893-4e2b-b1e7-dfb9940aaf10')
>
> Query submitted at: 2018-04-06 00:13:13 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=f8427448b43445db:aa28650800000000
>
> WARNINGS: Row with null value violates nullability constraint on table
> 'impala::immunoseq_analyzer.samples_v3_1_dj'.
>
>
>
> Modified 0 row(s), 1 row error(s) in 0.34s
>
>> select sample_id, sequence_result_id, sample_catalog_tags from
>> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899');
>
> Query: select sample_id, sequence_result_id, sample_catalog_tags from
> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899')
>
> Query submitted at: 2018-04-06 00:13:18 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=e44ab2131f30dc65:eef1e66400000000
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> | sample_id                            | sequence_result_id
> | sample_catalog_tags                     |
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 280df143-b893-4e2b-b1e7-dfb9940aaf10 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> fbdc1f5e-5aa8-40e2-bc64-b97af03b4c66 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> d10517c0-0aeb-4d73-96a6-db8ab9ca60d7 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 19154086-868e-484f-9c7d-4d62434ca571 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> Fetched 4 row(s) in 0.24s
>
>
>
>
>
> As you can see above the column is nullable but attempting to set the column
> to null does not work. Below shows that there are other rows with null
> values for this column.
>
>
>
>> select sample_id, sequence_result_id, sample_catalog_tags from
>> samples_v3_1_dj where sample_catalog_tags is null limit 1;
>
> Query: select sample_id, sequence_result_id, sample_catalog_tags from
> samples_v3_1_dj where sample_catalog_tags is null limit 1
>
> Query submitted at: 2018-04-06 00:20:01 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=604afe3b4fcb9c0c:364161d100000000
>
> +--------------------------------------+--------------------------------------+---------------------+
>
> | sample_id                            | sequence_result_id
> | sample_catalog_tags |
>
> +--------------------------------------+--------------------------------------+---------------------+
>
> | b76eaccd-da3f-49de-b04a-28ad3cb112d5 |
> 0008df4c-01f0-4e7f-9493-954a1e0530bd | NULL                |
>
> +--------------------------------------+--------------------------------------+---------------------+
>
> Fetched 1 row(s) in 0.33s
>
>
>
> Now if I attempt an upsert you can see that it does in fact work.
>
>
>
>> upsert into table samples_v3_1_dj (sequence_result_id, sample_id,
>> sample_name, species, locus, product_subtype, sample_catalog_tags) values
>> ('280df143-b893-4e2b-b1e7-dfb9940aaf10',
>> '1e6a74f5-8a2e-4806-b7bf-4cf1dd947899', 'SAMPLE-4', 'Human', 'TCRB',
>> 'Survey', null);
>
> Query: upsert into table samples_v3_1_dj (sequence_result_id, sample_id,
> sample_name, species, locus, product_subtype, sample_catalog_tags) values
> ('280df143-b893-4e2b-b1e7-dfb9940aaf10',
> '1e6a74f5-8a2e-4806-b7bf-4cf1dd947899', 'SAMPLE-4', 'Human', 'TCRB',
> 'Survey', null)
>
> Query submitted at: 2018-04-06 00:21:44 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=b942ea9871581947:b363f6d000000000
>
> Modified 1 row(s), 0 row error(s) in 0.11s
>
>> select sample_id, sequence_result_id, sample_catalog_tags from
>> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899');
>
> Query: select sample_id, sequence_result_id, sample_catalog_tags from
> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899')
>
> Query submitted at: 2018-04-06 00:21:51 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=f3485ee6580ae44f:b8c5242100000000
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> | sample_id                            | sequence_result_id
> | sample_catalog_tags                     |
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 280df143-b893-4e2b-b1e7-dfb9940aaf10 | NULL
> |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> d10517c0-0aeb-4d73-96a6-db8ab9ca60d7 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 19154086-868e-484f-9c7d-4d62434ca571 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> fbdc1f5e-5aa8-40e2-bc64-b97af03b4c66 | Tissue
> Source:Blood,Diagnosis:Diagnosis |
>
> +--------------------------------------+--------------------------------------+-----------------------------------------+
>
> Fetched 4 row(s) in 0.33s
>
>
>
> I suppose I could change my update to an upsert…select…from but it’s still
> bugging me that the update isn’t working.
>
>
>
>> upsert into table samples_v3_1_dj (sequence_result_id, sample_id,
>> sample_name, species, locus, product_subtype, sample_catalog_tags) select
>> sequence_result_id, sample_id, sample_name, species, locus, product_subtype,
>> null from samples_v3_1_dj where sample_id =
>> '1e6a74f5-8a2e-4806-b7bf-4cf1dd947899';
>
> Query: upsert into table samples_v3_1_dj (sequence_result_id, sample_id,
> sample_name, species, locus, product_subtype, sample_catalog_tags) select
> sequence_result_id, sample_id, sample_name, species, locus, product_subtype,
> null from samples_v3_1_dj where sample_id =
> '1e6a74f5-8a2e-4806-b7bf-4cf1dd947899'
>
> Query submitted at: 2018-04-06 00:28:28 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=8d422b3c7ff44356:69fa874300000000
>
> Modified 4 row(s), 0 row error(s) in 0.44s
>
>> select sample_id, sequence_result_id, sample_catalog_tags from
>> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899');
>
> Query: select sample_id, sequence_result_id, sample_catalog_tags from
> samples_v3_1_dj WHERE sample_id IN ('1e6a74f5-8a2e-4806-b7bf-4cf1dd947899')
>
> Query submitted at: 2018-04-06 00:28:34 (Coordinator: http://...:25000)
>
> Query progress can be monitored at:
> http://...:25000/query_plan?query_id=447bcd28973cd84:b76c1b2b00000000
>
> +--------------------------------------+--------------------------------------+---------------------+
>
> | sample_id                            | sequence_result_id
> | sample_catalog_tags |
>
> +--------------------------------------+--------------------------------------+---------------------+
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> fbdc1f5e-5aa8-40e2-bc64-b97af03b4c66 | NULL                |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> d10517c0-0aeb-4d73-96a6-db8ab9ca60d7 | NULL                |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 19154086-868e-484f-9c7d-4d62434ca571 | NULL                |
>
> | 1e6a74f5-8a2e-4806-b7bf-4cf1dd947899 |
> 280df143-b893-4e2b-b1e7-dfb9940aaf10 | NULL                |
>
> +--------------------------------------+--------------------------------------+---------------------+
>
> Fetched 4 row(s) in 0.17s
>
>
>
> Any thoughts on how I could get the update to work?
>
>
>
> Thanks,
> Jeff
>
>
>
> This message (and any attachments) is intended only for the designated
> recipient(s). It
> may contain confidential or proprietary information, or have other
> limitations on use as
> indicated by the sender. If you are not a designated recipient, you may not
> review, use,
> copy or distribute this message. If you received this in error, please
> notify the sender by
> reply e-mail and delete this message.

Mime
View raw message