drill-issues mailing list archives

From "Julian Hyde (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3838) Ability to use UDFs in the directory pruning process
Date Fri, 25 Sep 2015 19:43:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908563#comment-14908563 ]

Julian Hyde commented on DRILL-3838:

I like the idea of making directory scanning a relational operation. (Hive suffers, I think,
because its operations on the metastore are neither true queries nor metadata operations,
but somewhere in between. The number of partitions can be truly huge, so those operations
would benefit from being optimized and executed as if they were a query on a novel data
source, namely the metastore. Scanning the file system is analogous to scanning the metastore.)

Once directory scanning is a relational operation, the usual relational optimizations follow:
pushing down filters, and "sideways information passing" join optimizations like bloom filters.
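
To make the filter pushdown concrete, here is a minimal sketch in Python, not Drill code.
It assumes a year/month directory layout, and the names directory_scan, keep_dir and
make_demo_tree are made up for illustration. The point is only that a predicate on the
first directory level (what Drill exposes as dir0) can be evaluated inside the scan, so
rejected subtrees are never listed.

```python
import os
import tempfile
from typing import Callable, Iterator


def directory_scan(root: str,
                   keep_dir: Callable[[str], bool] = lambda rel: True) -> Iterator[str]:
    """Yield paths of files under root, relative to root.

    keep_dir is the pushed-down filter (a hypothetical name): it sees each
    subdirectory's path relative to root, and rejected subtrees are never listed.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into
        # the directories we drop, which is the actual pruning step.
        dirnames[:] = sorted(
            d for d in dirnames
            if keep_dir(os.path.relpath(os.path.join(dirpath, d), root))
        )
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), root)


def make_demo_tree(root: str) -> None:
    """Create a tiny year/month layout with one empty file per directory."""
    for rel in ("2014/12", "2015/08", "2015/09"):
        leaf = os.path.join(root, *rel.split("/"))
        os.makedirs(leaf)
        open(os.path.join(leaf, "data.csv"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        # Rough equivalent of WHERE dir0 = '2015', applied during the scan
        for path in directory_scan(root, lambda rel: rel.split(os.sep)[0] == "2015"):
            print(path)
```

Without the filter the scan lists all three files; with it, the 2014 subtree is skipped
before any of its contents are read.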

So that would mean modeling a table scan either as (1) having a parameter, the name of the
current file, which is set by a nested loop join above it that is fed by a directory scan,
or (2) having an input, which is a stream of file names.
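
A rough sketch of the two shapes, in Python rather than anything resembling Drill's
operator API. Everything here is hypothetical: FILES stands in for the file system, and
parameterized_scan, nested_loop_join and scan_with_input are illustrative names only.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (file name, record)

# Hypothetical in-memory stand-in for a file system: file name -> records
FILES: Dict[str, List[str]] = {
    "2015/08/a.csv": ["r1", "r2"],
    "2015/09/b.csv": ["r3"],
}


def directory_scan() -> Iterator[str]:
    """The relational 'directory scan': a stream of file names."""
    yield from sorted(FILES)


# Option (1): the table scan has a parameter, the current file name,
# and a nested loop join above it sets that parameter per outer row.
def parameterized_scan(file_name: str) -> Iterator[Row]:
    for record in FILES[file_name]:
        yield (file_name, record)


def nested_loop_join(outer: Iterable[str],
                     inner: Callable[[str], Iterator[Row]]) -> Iterator[Row]:
    for file_name in outer:
        # The inner side is restarted once for each file name from the outer side
        yield from inner(file_name)


# Option (2): the table scan takes an input, a stream of file names.
def scan_with_input(file_names: Iterable[str]) -> Iterator[Row]:
    for file_name in file_names:
        for record in FILES[file_name]:
            yield (file_name, record)


if __name__ == "__main__":
    plan_1 = list(nested_loop_join(directory_scan(), parameterized_scan))
    plan_2 = list(scan_with_input(directory_scan()))
    print(plan_1 == plan_2)
```

Both plans produce the same rows in this sketch; the difference is where the loop over
files lives, in a join operator above the scan or inside the scan itself.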

> Ability to use UDFs in the directory pruning process
> ----------------------------------------------------
>                 Key: DRILL-3838
>                 URL: https://issues.apache.org/jira/browse/DRILL-3838
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>            Reporter: Stefán Baxter
> This feature request is about allowing UDFs to participate in the Directory/Partition
> pruning process at runtime rather than at planning/optimization time.
> For this a UDF needs:
>  - filename
>  - full path (not just dirN)
>  - to be able to throw an IgnoreFile exception
>  - to be able to throw an IgnoreDirectory exception
> I think the naming is pretty self-explanatory and hopefully this brief description is
> _Stefan 

This message was sent by Atlassian JIRA