drill-issues mailing list archives

From "Tomer Shiran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3474) Filename should be an available column when querying a directory
Date Sat, 26 Sep 2015 17:10:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909350#comment-14909350 ]

Tomer Shiran commented on DRILL-3474:

I think we should have virtual fields that are not returned by default as part of a SELECT *.

I also think that instead of dir0, dir1, etc. we should have path as a [virtual] map which
has properties like:

path.parts[0], path.parts[1], etc.

In other words, path can be treated like a map:

  {
    "parts": ["foo", "bar", "baz.csv"],
    "name": "baz.csv",
    "suffix": "csv"
  }

(I borrowed the property names from https://docs.python.org/3/library/pathlib.html)
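
To make the proposed shape concrete, here is a minimal Python sketch that builds the same map with pathlib. The helper name `path_map` is hypothetical and is not part of Drill; the sketch also assumes POSIX-style relative paths. Note that pathlib's own `suffix` keeps the leading dot (".csv"), so the sketch strips it to match the example above.

```python
from pathlib import PurePosixPath


def path_map(relative_path: str) -> dict:
    """Hypothetical helper: build the proposed 'path' map for one file."""
    p = PurePosixPath(relative_path)
    return {
        # Every path component, directories first, file name last.
        "parts": list(p.parts),
        # The final component, i.e. the file name.
        "name": p.name,
        # pathlib returns ".csv"; drop the dot to match the proposal.
        "suffix": p.suffix.lstrip("."),
    }


print(path_map("foo/bar/baz.csv"))
# {'parts': ['foo', 'bar', 'baz.csv'], 'name': 'baz.csv', 'suffix': 'csv'}
```

With this shape, path.parts[0] and path.parts[1] carry the same information as dir0 and dir1 for a file two directories deep.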

This approach provides the flexibility that people are looking for and adheres to the JSON-oriented
nature of Drill. In addition to partition pruning, users will be able to query only the CSV
files in a file system subtree (WHERE path.suffix = 'csv').
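
As a rough illustration of the two kinds of filter mentioned here, the Python sketch below applies them to an in-memory list of paths. The file listing and the helper names `select_by_suffix` and `select_by_part` are made up for the example; this only mimics the predicates and says nothing about how Drill would plan or prune such a query.

```python
from pathlib import PurePosixPath

# Hypothetical listing of files under the queried directory.
files = [
    "2015/q1/sales.csv",
    "2015/q1/notes.json",
    "2015/q2/sales.csv",
    "2014/q4/sales.parquet",
]


def select_by_suffix(paths, suffix):
    """Analogue of WHERE path.suffix = '<suffix>' (suffix without the dot)."""
    return [p for p in paths if PurePosixPath(p).suffix.lstrip(".") == suffix]


def select_by_part(paths, index, value):
    """Analogue of WHERE path.parts[index] = '<value>' (a dir0-style filter)."""
    selected = []
    for p in paths:
        parts = PurePosixPath(p).parts
        if index < len(parts) and parts[index] == value:
            selected.append(p)
    return selected


csv_files = select_by_suffix(files, "csv")
files_2015 = select_by_part(files, 0, "2015")
print(csv_files)   # ['2015/q1/sales.csv', '2015/q2/sales.csv']
print(files_2015)  # ['2015/q1/sales.csv', '2015/q1/notes.json', '2015/q2/sales.csv']
```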

P.S. Maybe we also need to distinguish between relative and absolute paths in our case.

> Filename should be an available column when querying a directory
> ----------------------------------------------------------------
>                 Key: DRILL-3474
>                 URL: https://issues.apache.org/jira/browse/DRILL-3474
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.1.0
>            Reporter: Jim Scott
>            Assignee: Jacques Nadeau
> I could not find another ticket which talks about this ...
> The file name should be a column which can be selected or filtered when querying a directory
> just like dir0, dir1 are available.

This message was sent by Atlassian JIRA