Date: Sat, 7 Jan 2017 04:32:58 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5152) Enhance the mock data source: better data, SQL access
Mailing-List: contact issues-help@drill.apache.org; run by ezmlm
archived-at: Sat, 07 Jan 2017 04:33:00 -0000

    [ https://issues.apache.org/jira/browse/DRILL-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806752#comment-15806752 ]

ASF GitHub Bot commented on DRILL-5152:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/708#discussion_r95047914

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockGroupScanPOP.java ---
    @@ -104,44 +102,46 @@ public String toString() {
       }

       @JsonInclude(Include.NON_NULL)
    -  public static class MockColumn{
    +  public static class MockColumn {
         @JsonProperty("type") public MinorType minorType;
         public String name;
         public DataMode mode;
         public Integer width;
         public Integer precision;
         public Integer scale;
    -
    +    public String generator;
    +    public Integer repeat;

         @JsonCreator
    -    public MockColumn(@JsonProperty("name") String name, @JsonProperty("type") MinorType minorType, @JsonProperty("mode") DataMode mode, @JsonProperty("width") Integer width, @JsonProperty("precision") Integer precision, @JsonProperty("scale") Integer scale) {
    --- End diff --

    Same here - for overloading constructor.


> Enhance the mock data source: better data, SQL access
> -----------------------------------------------------
>
>                 Key: DRILL-5152
>                 URL: https://issues.apache.org/jira/browse/DRILL-5152
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Drill provides a mock data storage engine that generates random data. The mock engine is used in some older unit tests that need a volume of data, but that are not too particular about the details of the data.
> The mock data source remains useful even for modern tests. For example, the work in the external storage batch requires tests with varying amounts of data, but the exact form of the data is not important, just the quantity. For instance, if we want to ensure that spilling happens at various trigger points, we need to read the right amount of data for that trigger.
> The existing mock data source has two limitations:
> 1. It generates only "black/white" (alternating) values, which is awkward for use in sorting.
> 2. The mock generator is accessible only from a physical plan, not from SQL queries.
> This enhancement proposes to fix both limitations:
> 1. Generate a uniform, randomly distributed set of values.
> 2. Provide an encoding that lets a SQL query specify the data to be generated.
> Example SQL query:
> {code}
> SELECT id_i, name_s50 FROM `mock`.employee_10K;
> {code}
> The above says to generate two fields: INTEGER (the "_i" suffix) and VARCHAR(50) (the "_s50" suffix); and to generate 10,000 rows (the "_10K" suffix on the table).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
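
For readers of the archive, the name encoding described in the quoted issue ("_i" for INTEGER, "_s50" for VARCHAR(50), "_10K" for 10,000 rows) can be illustrated with a minimal sketch. This is not Drill's parser: the class MockNameSketch, its methods, and the regular expressions are hypothetical, and it covers only the three suffixes named in the issue text. Treating "K" as a multiplier of 1,000 follows from the "10,000 rows" example; rejecting a width on "_i" or a missing width on "_s" is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the suffix encoding from DRILL-5152.
// Not Drill code; handles only the "_i", "_s<width>" and "_<n>K" forms
// that appear in the issue description.
final class MockNameSketch {
  // Column: base name, underscore, type letter, optional width digits.
  private static final Pattern COLUMN = Pattern.compile("(\\w+)_([is])(\\d*)");
  // Table: base name, underscore, row count, optional "K" (assumed x1000).
  private static final Pattern TABLE = Pattern.compile("(\\w+)_(\\d+)(K?)");

  /** Maps a column name such as "name_s50" to a SQL type string. */
  static String columnType(String column) {
    Matcher m = COLUMN.matcher(column);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized column name: " + column);
    }
    String width = m.group(3);
    if (m.group(2).equals("i")) {
      // Assumption: an integer column takes no width.
      if (!width.isEmpty()) {
        throw new IllegalArgumentException("Unexpected width on integer column: " + column);
      }
      return "INTEGER";
    }
    // Assumption: a string column must state its width explicitly.
    if (width.isEmpty()) {
      throw new IllegalArgumentException("Missing width on string column: " + column);
    }
    return "VARCHAR(" + Integer.parseInt(width) + ")";
  }

  /** Maps a table name such as "employee_10K" to a row count. */
  static int rowCount(String table) {
    Matcher m = TABLE.matcher(table);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized table name: " + table);
    }
    int count = Integer.parseInt(m.group(2));
    // multiplyExact throws instead of silently overflowing on huge counts.
    return m.group(3).isEmpty() ? count : Math.multiplyExact(count, 1000);
  }
}
```

Under these assumptions, columnType("id_i") gives INTEGER, columnType("name_s50") gives VARCHAR(50), and rowCount("employee_10K") gives 10000, matching the reading of the example query in the issue.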