drill-dev mailing list archives

From François Méthot <fmetho...@gmail.com>
Subject Wrong result in select with multiple identical UDF call
Date Thu, 14 Apr 2016 17:20:44 GMT
I was able to reproduce this on Drill 1.5 running on a cluster
and on Drill 1.6 in embedded mode.

Within a single select, if I call the same udf(value) multiple times,
the columns may return different results for the same row.

ex:
select name, ilike(name, 'jack'), ilike(name, 'jack'), ilike(name, 'jack'),
ilike(name, 'jack'), ilike(name, 'jack') from hdfs.`/data/` where
ilike(name, 'jack');

I get

jack | false | true | false
jack | true | true | true
jack | true | true | false
.....
most of them are jack | true | true | true

I observed this on Parquet files as well as CSV files. If I restart Drill
and run the query, it happens, though sometimes it does not!
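
To make the symptom easy to check on a larger result set, here is a minimal
sketch in Python (not part of Drill) that flags rows where the repeated
ilike columns disagree. The function name inconsistent_rows is hypothetical,
and the rows are the three sample rows from the output above, with the
result assumed to be already loaded as tuples.

```python
def inconsistent_rows(rows):
    """Return the rows whose boolean columns (all but the first) differ."""
    bad = []
    for row in rows:
        _name, *flags = row
        # Identical UDF calls on the same value should give one distinct result.
        if len(set(flags)) > 1:
            bad.append(row)
    return bad


# Sample rows copied from the output above: (name, ilike, ilike, ilike)
rows = [
    ("jack", False, True, False),
    ("jack", True, True, True),
    ("jack", True, True, False),
]

bad = inconsistent_rows(rows)
print(len(bad), "of", len(rows), "rows disagree")
```

Any row this returns is one where the same call gave different answers
within a single select.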



If I do
select count(1) from hdfs.`/data/` where ilike(name, 'jack') = true;
or
select count(1) from hdfs.`/data/` where ilike(name, 'jack') = true and
like(name, 'jack') = true and like(name, 'jack') = true and like(name,
'jack') = true;

The count is always the same, which is good. The problem seems to be
confined to the select list; the where clause behaves consistently.

Francois
P.S. I ended up doing these weird tests because I was getting the same
inconsistent results from my own UDF. At some point I started testing the
built-in UDFs in Drill for my own sanity, because I could not see what
could be wrong with my code...
