drill-dev mailing list archives

From Jinfeng Ni <...@apache.org>
Subject Re: [DRILL HANGOUT] Topics for 5/16/2017
Date Tue, 16 May 2017 17:01:31 GMT
We will start hangout shortly.

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc


On Mon, May 15, 2017 at 9:53 PM, Jinfeng Ni <jni@apache.org> wrote:
> My feeling is that either a temp table or putting the 100k values into a
> separate parquet file makes more sense than putting 100k values in an
> IN list.  Although for such a long IN list the Drill planner will convert it
> into a JOIN (the same as the temp table / parquet table solutions),
> there is a big difference in terms of what the query plan looks like.
> An IN list with 100k values has to be serialized / de-serialized
> before the plan can be executed. I guess that would create a huge
> serialized plan, which is not the best solution one could use.
>
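> For illustration only (the paths and column names below are made up), the
> rewrite looks roughly like this in Drill SQL: the 100k values live in their
> own parquet table and the filter becomes an IN subquery that the planner
> turns into a join, so the literal values never have to be serialized into
> the plan:
>
>   -- hypothetical names: dfs.tmp.`id_list` holds the 100k lookup values
>   SELECT o.*
>   FROM dfs.`/data/orders` o
>   WHERE o.cust_id IN (SELECT k.cust_id FROM dfs.tmp.`id_list` k);
>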
> Also, putting 100k values in an IN list may not be very typical. RDBMSs
> typically impose limits on the number of values in an IN list; for
> instance, Oracle sets the limit to 1000 [1].
>
> 1. http://docs.oracle.com/database/122/SQLRF/Expression-Lists.htm#SQLRF52099
>
> On Mon, May 15, 2017 at 7:11 PM,  <jasbir.sing@accenture.com> wrote:
>> Hi,
>>
>> I am stuck on a problem where an instance of Apache Drill stops working. My topic of
>> discussion will be:
>>
>> In this scenario, I have 25 parquet files with around 400K-500K records and around
>> 10 columns. My select query is such that the IN clause on one column contains around 100K values.
>> When I run these queries in parallel, the Apache Drill instance hangs and then shuts down.
>> So, how should the select queries be designed so that Drill can handle them?
>> The solutions that we are trying are:
>> a - Create a temp table of the 100K values and then use it in an inner query. But as far as I
>> know, we can't create a temp table at run time from Java code; it needs some data source, either
>> parquet or something else, to create the temp table from.
>> b - Create a separate parquet file of all 100K values and use an inner query against it instead
>> of listing all the values directly in the main query.
>>
>> Is there a better way around this problem, or can it be solved
>> with simple configuration changes?
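>>
>> As a rough sketch of both options (assuming a Drill version with CTTAS,
>> i.e. 1.10+, a writable dfs.tmp workspace, and made-up table/column names),
>> the lookup values could be materialized once and then referenced from an
>> inner query:
>>
>>   -- option a (Drill 1.10+): session-scoped temporary table
>>   CREATE TEMPORARY TABLE id_list AS
>>   SELECT cust_id FROM dfs.`/staging/id_values.parquet`;
>>
>>   -- option b: plain CTAS into a writable workspace (written as parquet by default)
>>   CREATE TABLE dfs.tmp.`id_list` AS
>>   SELECT cust_id FROM dfs.`/staging/id_values.parquet`;
>>
>>   -- either way the main query stays small; reference the temporary table by
>>   -- name, or the CTAS table as dfs.tmp.`id_list`
>>   SELECT o.*
>>   FROM dfs.`/data/orders` o
>>   WHERE o.cust_id IN (SELECT k.cust_id FROM id_list k);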
>>
>> Regards,
>> Jasbir Singh
>>
>>
>> -----Original Message-----
>> From: Jinfeng Ni [mailto:jni@apache.org]
>> Sent: Tuesday, May 16, 2017 2:29 AM
>> To: dev <dev@drill.apache.org>; user <user@drill.apache.org>
>> Subject: [DRILL HANGOUT] Topics for 5/16/2017
>>
>> Hi All,
>>
>> Our bi-weekly Drill hangout is tomorrow (5/16/2017, 10AM PDT). Please respond with
>> suggestions for discussion topics. We will also collect topics at the beginning of the hangout
>> tomorrow.
>>
>> Thanks,
>>
>> Jinfeng
>>
