drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacques Nadeau <jacq...@dremio.com>
Subject Re: select from table with options
Date Wed, 21 Oct 2015 00:35:56 GMT
Some more options:

** (1) add options to WITH clause (more declarative) **
  WITH MyTable
  AS
    (select * mydb.mytable)
    USING OPTIONS  (type = 'text' AND lineDelimiter = '\n')

** (2) add options to WITH (json literal based) **
  WITH MyTable
  AS
  (select * mydb.mytable)
  USING OPTIONS {  type: "text", linedDelimiter = "\n"}

** (3) Enhance the Calcite EXTEND clause to support an OPTIONS clause **
  SELECT FROM emp EXTEND [optional columns list] OPTIONS (type 'text',
lineDelimiter);

** (4) Solve with specific syntax for the most common scenarios (very
declarative) **
  select * FROM
  mydb.mytable
    TREAT AS TEXT
      USING
        LINE DELIMITER '\n'
        AND FIELD DELIMITER ','
        AND TREAT FIRST ROW AS HEADERS


I'm actually most inclined to the fourth. It seems like the most user
friendly. From a grammar perspective, I think you need to figure out a way
to use TREAT AS as a grammar switch so we can avoid protecting these
expressions in all contexts. What is nice about this pattern is that is
understandable by non-technical users and fits sql. Hiding things in a
table function makes things more complex.

I think we should think about this in the context of ALTER TABLE ASCRIBE
METADATA as well.

For example:
ALTER TABLE mydb.mytable
  ASCRIBE METADATA
    TREAT AS TEXT
      USING
        LINE DELIMITER '\n'
        AND FIELD DELIMITER ','
        AND TREAT FIRST ROW AS HEADERS

If we use table functions, I'm not sure how we make the ascribe metadata
operation have the same syntax.


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Oct 20, 2015 at 11:32 AM, Julian Hyde <jhyde@apache.org> wrote:

> +1 to use table functions
>
> In Calcite (and I presume Drill) a “table function” may actually function
> more like a (Lisp) macro. The function gets called at prepare time to yield
> a RelNode (say a TableScan). So a table function is every bit as efficient
> as using a table, but it allows extra parameters.
>
> If the table function has a lot of parameters it might be nice to support
> named parameters:
>
> select * from table(disitributedFile(path => ‘/path/to/something.psv’,
> delimiter => ‘|’));
>
> Named parameters are in the SQL standard but are not supported by
> Calcite’s parser currently. Parameters can be specified in any order, and
> those not specified have a default value.
>
> Julian
>
>
> > On Oct 19, 2015, at 5:18 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> > Wouldn't a table function be a better option?
> >
> > Something like this perhaps?
> >
> > select * from
> > delimitedFile(dfs.`default`.`/path/to/file/something.psv`, '|')
> >
> > ?
> >
> > Or how about fake-o parameters that the delimited record scanner knows
> how
> > to push down into the scanning of the data? That would look like this:
> >
> > select * from
> > dfs.`default`.`/path/to/file/something.psv`
> > where magicFieldDelimiter = '|';
> >
> >
> >
> > On Mon, Oct 19, 2015 at 2:28 PM, Julien Le Dem <julien@dremio.com>
> wrote:
> >
> >> I'm looking into passing information on how to interpret a file through
> the
> >> select clause in Drill.
> >> Something along the lines of:
> >> *select * from
> >> dfs.`default`.`/path/to/file/something.psv?type=text&delimiter=|`;*
> >> (In this example, we want to specify a specific delimiter, but that
> would
> >> apply to any *type* of format)
> >>
> >> Which would allow to read a file without having to centrally configure
> >> formats: https://drill.apache.org/docs/querying-plain-text-files/
> >> Which makes it easier to try to read an existing file.
> >> Typically once the user has found the proper settings, they would update
> >> the central configuration.
> >>
> >> thoughts?
> >>
> >> --
> >> Julien
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message