drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-5204) Extend mock data source to use table specs from SQL
Date Tue, 16 May 2017 19:37:04 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers resolved DRILL-5204.
--------------------------------
    Resolution: Fixed

Not sure why this was not closed earlier. Feature has been checked into Master.

Set up the mock data source. Then:

{code}
SELECT id_i, name_s50 FROM `mock`.`customers_1M`
{code}

The column and table names are arbitrary; only the suffix matters. For columns, "_i"
means integer, "_sx" means a string of length x (so "_s50" is a string of length 50), and
so on. For tables, a suffix of "x" means x rows, "xK" means x thousand rows, and "xM" means
x million rows (so "customers_1M" has one million rows).
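
To make the suffix convention concrete, here is a minimal Java sketch that decodes it. The class {{MockNameSketch}} and its methods are hypothetical names made up for illustration; this is not Drill's parser, and it handles only the suffixes spelled out above ("_i", "_sx", and the plain, "K" and "M" row counts).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Drill: decodes the naming convention
// described in this message. Only "_i" and "_s<len>" column suffixes are
// handled, because those are the only ones the message spells out.
final class MockNameSketch {
    // <anything>_<digits>, optionally followed by K (thousand) or M (million)
    private static final Pattern TABLE = Pattern.compile("^.+_(\\d+)([KM]?)$");
    // <anything>_i or <anything>_s<digits>
    private static final Pattern COLUMN = Pattern.compile("^.+_(i|s(\\d+))$");

    /** Row count encoded in a table name such as "customers_1M". */
    static long rowCount(String tableName) {
        Matcher m = TABLE.matcher(tableName);
        if (!m.matches()) {
            throw new IllegalArgumentException("No row-count suffix: " + tableName);
        }
        long n = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "K": return n * 1_000L;
            case "M": return n * 1_000_000L;
            default:  return n;
        }
    }

    /** Type description for a column name such as "id_i" or "name_s50". */
    static String columnType(String columnName) {
        Matcher m = COLUMN.matcher(columnName);
        if (!m.matches()) {
            throw new IllegalArgumentException("No known type suffix: " + columnName);
        }
        if (m.group(1).equals("i")) {
            return "integer";
        }
        return "string(" + m.group(2) + ")";
    }

    public static void main(String[] args) {
        System.out.println(columnType("id_i"));       // integer
        System.out.println(columnType("name_s50"));   // string(50)
        System.out.println(rowCount("customers_1M")); // 1000000
    }
}
```

Applied to the query above, the sketch reads {{id_i}} as an integer column, {{name_s50}} as a string of length 50, and {{customers_1M}} as a table of one million rows.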

See the {{ExampleTest}} class for details.

> Extend mock data source to use table specs from SQL
> ---------------------------------------------------
>
>                 Key: DRILL-5204
>                 URL: https://issues.apache.org/jira/browse/DRILL-5204
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> DRILL-5152 provided a simple way to generate mock data from SQL:
> {code}
> SELECT colName_type FROM `mock`.`tableName_size` ...
> {code}
> The fix in that release encoded types and record counts directly in the SQL, which is
> very handy for many simple cases.
> The original mock data source has another feature: it lets you create multiple mock blocks
> of data that can be read in multiple threads. Later additions made it easy to repeat a column
> definition (to generate, say, a table with 1000 columns), to choose the data generator class,
> etc. All of this was available only when writing physical plans by hand and encoding the
> definition in the sub scan for the mock data source.
> This enhancement extends the SQL feature to allow the definitions to appear in a JSON
> file easily referenced from SQL. The JSON file must be somewhere on the class path (typically
> in a resources directory). Then:
> {code}
> SELECT red, blue, green FROM `mock`.`foo/colors.json` ...
> {code}
> This is interpreted to mean, "the file colors.json defines a mock data source, perhaps with
> repeated columns, perhaps with multiple fragments. From that mock data source, select the
> three columns red, blue and green."
> With this change, tests can include quite sophisticated mock data sources, simplifying
> debugging of plans with multiple fragments and/or more complex table structures.
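
Since the description requires the JSON definition file to be on the class path, a test could confirm that before running the query. The sketch below is a hypothetical pre-flight check, not part of Drill; it assumes the name used in the SQL ({{foo/colors.json}}) is also the class-path resource path, which the message does not state explicitly.

```java
// Hypothetical pre-flight check, NOT part of Drill: reports whether a
// resource such as "foo/colors.json" is visible on the class path.
final class ClassPathCheckSketch {

    /** Returns true if the named resource can be found on the class path. */
    static boolean isOnClassPath(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        // Assumption: the table name in the SQL doubles as the resource path.
        String path = args.length > 0 ? args[0] : "foo/colors.json";
        System.out.println(path + " on class path: " + isOnClassPath(path));
    }
}
```

If the check reports false, the file is probably not in a directory that is packaged as resources for the test.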



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
