drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5152) Enhance the mock data source: better data, SQL access
Date Thu, 29 Dec 2016 04:39:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784490#comment-15784490

ASF GitHub Bot commented on DRILL-5152:

Github user cgivre commented on the issue:

    Hi Paul,
    Is the mock data source actually in Drill 1.9?  I tried executing this query and it threw
    > On Dec 27, 2016, at 21:54, Paul Rogers <notifications@github.com> wrote:
    > Provides an enhanced version of the mock data source. See the JIRA entry for motivation
    > and package-info.java for details of operation.
    > Allows tests to write queries of the form:
    > select id_i, name_s50 from `mock`.`employee_1K` ...
    > Where id_i is a field of random, uniformly distributed integers and name_s50 is a
    > VARCHAR column of width 50 of randomly generated strings. The _1K suffix says to generate
    > 1000 rows. The names are just for convenience; the suffixes tell the mock data source what
    > to generate.
    > Examples of use will appear in a later commit that includes a revised test framework.
    > Existing tests that use the physical plan version of the mock data source work as before.
    > You can view, comment on, or merge this pull request online at:
    >   https://github.com/apache/drill/pull/708
    > Commit Summary
    > DRILL-5152: Enhance the mock data source: better data, SQL access
    > File Changes
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractRecordReader.java <https://github.com/apache/drill/pull/708/files#diff-0> (2)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePlugin.java <https://github.com/apache/drill/pull/708/files#diff-1>
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java <https://github.com/apache/drill/pull/708/files#diff-2> (2)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistryImpl.java <https://github.com/apache/drill/pull/708/files#diff-3> (24)
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/ColumnDef.java <https://github.com/apache/drill/pull/708/files#diff-4>
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/DateGen.java <https://github.com/apache/drill/pull/708/files#diff-5>
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/DoubleGen.java <https://github.com/apache/drill/pull/708/files#diff-6>
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/ExtendedMockRecordReader.java <https://github.com/apache/drill/pull/708/files#diff-7> (149)
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/FieldGen.java <https://github.com/apache/drill/pull/708/files#diff-8>
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/IntGen.java <https://github.com/apache/drill/pull/708/files#diff-9>
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockGroupScanPOP.java <https://github.com/apache/drill/pull/708/files#diff-10> (127)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockRecordReader.java <https://github.com/apache/drill/pull/708/files#diff-11> (8)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockScanBatchCreator.java <https://github.com/apache/drill/pull/708/files#diff-12> (8)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorageEngine.java <https://github.com/apache/drill/pull/708/files#diff-13> (79)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorageEngineConfig.java <https://github.com/apache/drill/pull/708/files#diff-14> (9)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockStorePOP.java <https://github.com/apache/drill/pull/708/files#diff-15> (3)
    > M exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockSubScanPOP.java <https://github.com/apache/drill/pull/708/files#diff-16> (20)
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MoneyGen.java <https://github.com/apache/drill/pull/708/files#diff-17>
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/StringGen.java <https://github.com/apache/drill/pull/708/files#diff-18>
    > A exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/package-info.java <https://github.com/apache/drill/pull/708/files#diff-19> (130)
    > M exec/java-exec/src/test/java/org/apache/drill/exec/TestOpSerialization.java <https://github.com/apache/drill/pull/708/files#diff-20>
    > M exec/java-exec/src/test/java/org/apache/drill/exec/fn/interp/ExpressionInterpreterTest.java <https://github.com/apache/drill/pull/708/files#diff-21> (4)
    > Patch Links:
    > https://github.com/apache/drill/pull/708.patch
    > https://github.com/apache/drill/pull/708.diff

> Enhance the mock data source: better data, SQL access
> -----------------------------------------------------
>                 Key: DRILL-5152
>                 URL: https://issues.apache.org/jira/browse/DRILL-5152
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
> Drill provides a mock data storage engine that generates random data. The mock engine
> is used in some older unit tests that need a volume of data but are not too particular
> about the details of the data.
> The mock data source remains useful even for modern tests. For example, the work
> in the external storage batch requires tests with varying amounts of data, where the exact form
> of the data is not important, just the quantity. For instance, if we want to ensure that spilling
> happens at various trigger points, we need to read the right amount of data for that trigger.
> The existing mock data source has two limitations:
> 1. It generates only "black/white" (alternating) values, which is awkward for use in
> 2. The mock generator is accessible only from a physical plan, but not from SQL queries.
> This enhancement proposes to fix both limitations:
> 1. Generate a uniform, randomly distributed set of values.
> 2. Provide an encoding that lets a SQL query specify the data to be generated.
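
To make the contrast in the first item concrete, here is a minimal Java sketch of the two styles of integer data. It is not the code from the pull request: the class name `MockValueSketch`, the method names, and the use of 0 and 1 as the two alternating values are all assumptions made for illustration, since the message does not say which values the existing source alternates between.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; not the generator code in Drill's mock data source.
class MockValueSketch {

  // "Black/white" data: two fixed values in strict alternation.
  // The values 0 and 1 are stand-ins chosen for this example.
  static int[] alternating(int rows) {
    int[] out = new int[rows];
    for (int i = 0; i < rows; i++) {
      out[i] = i % 2;
    }
    return out;
  }

  // Uniformly distributed data: each value is drawn independently
  // from the full int range. The seed makes runs repeatable.
  static int[] uniform(int rows, long seed) {
    Random rand = new Random(seed);
    int[] out = new int[rows];
    for (int i = 0; i < rows; i++) {
      out[i] = rand.nextInt();
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(alternating(6)));
    System.out.println(Arrays.toString(uniform(6, 42L)));
  }
}
```

The seed parameter is a choice made for this sketch so that a test could reproduce its data; whether the real generator is seeded is not stated in this thread.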
> Example SQL query:
> {code}
> SELECT id_i, name_s50 FROM `mock`.employee_10K;
> {code}
> The above says to generate two fields: INTEGER (the "_i" suffix) and VARCHAR(50) (the
> "_s50" suffix); and to generate 10,000 rows (the "_10K" suffix on the table).
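
The suffix encoding described above can be illustrated with a short Java sketch that decodes table and column names. This is a hypothetical illustration, not the parser in the mock storage plugin: the class `MockNameSketch` and its methods are invented here, it covers only the `_i`, `_s<width>`, and `_<n>K` forms mentioned in this thread, and treating a table suffix without `K` as a plain row count is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the parser used by Drill's mock data source.
class MockNameSketch {

  // Table names end in "_<n>" with an optional "K" multiplier, e.g. "employee_10K".
  // Accepting a bare number (no "K") is an assumption of this sketch.
  private static final Pattern TABLE = Pattern.compile(".*_(\\d+)(K?)");

  // Column names end in "_i" (INTEGER) or "_s<width>" (VARCHAR), e.g. "name_s50".
  private static final Pattern COLUMN = Pattern.compile(".*_(i|s(\\d+))");

  // Returns the number of rows encoded in the table name suffix.
  static int rowCount(String tableName) {
    Matcher m = TABLE.matcher(tableName);
    if (!m.matches()) {
      throw new IllegalArgumentException("No row-count suffix: " + tableName);
    }
    int n = Integer.parseInt(m.group(1));
    return m.group(2).isEmpty() ? n : n * 1000;
  }

  // Returns a SQL type name for the type encoded in the column name suffix.
  static String columnType(String columnName) {
    Matcher m = COLUMN.matcher(columnName);
    if (!m.matches()) {
      throw new IllegalArgumentException("No type suffix: " + columnName);
    }
    if (m.group(1).equals("i")) {
      return "INTEGER";
    }
    return "VARCHAR(" + m.group(2) + ")";
  }

  public static void main(String[] args) {
    System.out.println(rowCount("employee_10K"));   // prints 10000
    System.out.println(columnType("id_i"));         // prints INTEGER
    System.out.println(columnType("name_s50"));     // prints VARCHAR(50)
  }
}
```

Because only the suffix carries meaning, the prefix (`employee`, `id`, `name`) can be anything readable, which matches the remark that the names are just for convenience.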

This message was sent by Atlassian JIRA
