drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: Mock Data Source Question
Date Fri, 31 Mar 2017 20:40:22 GMT
Hi Charles,

The mock reader has existed in Drill for some time (thanks to the original authors!), and
we recently extended it a bit. There are now three ways to use it.

* Via a physical plan (the original method). See [1]
* Via a specially-coded SQL query (as Boaz explained).
* Via a SQL query that references a JSON file. See [2]

The first step for either of the SQL queries is to configure the mock storage plugin. I always
do this from code, so I’m not exactly sure of the steps from the web UI. But, basically,
you need to create a plugin definition called “mock” (can really be anything) that is
an instance of the “mock” storage plugin type. No configuration parameters are needed.
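As a sketch, the plugin definition you create in the web UI would look something like the following. (This is my assumption of the minimal form — the plugin name and whether any extra fields get filled in by default may differ; the only essential part is the "mock" type.)

```json
{
  "type": "mock",
  "enabled": true
}
```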

Then, for the SQL, use the steps that Boaz explained. This gives randomly-distributed mock
data for a few supported types (int, double, boolean, and float).
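To illustrate the simple-SQL form: the table name and column names are made up, with the table name encoding a row count and each column name encoding its type via a suffix, as Boaz describes below. The specific names here are my own invention, purely for illustration:

```sql
-- 10K rows; an int column, a double column, and a 20-char varchar column
SELECT id_i, price_d, name_s20
FROM `mock`.`sample_10K`;
```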

In the simple-SQL form, the mock acts as a single row group and your query will have only
one slice. Using the JSON definition, you can create multiple row groups, more complex schemas
and have custom control over the data generator. For that, see [3].

All of this could use better explanation. Ask questions where we have gaps and I’ll go ahead
and fill in any needed information.

Thanks,

- Paul

[1] https://github.com/paul-rogers/drill/wiki/Testing-with-Physical-Plans-and-Mock-Data
[2] https://github.com/paul-rogers/drill/wiki/The-Mock-Record-Reader
[3] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/package-info.java

> On Mar 31, 2017, at 12:20 PM, Boaz Ben-Zvi <bben-zvi@mapr.com> wrote:
> 
> Hi Charles,
> 
>     Below is an example for using the mock storage; I use this now for testing my new code (Hash Aggregation spilling; so this specific test will not work for you now …).
> The query below  -  "SELECT empid_s17, dept_i, branch_i, AVG(salary_i) FROM `mock`.`employee_1200K` GROUP BY empid_s17, dept_i, branch_i";
> shows that you just make up the names for the table and the columns, followed by the size (for the table) and the column type (“i” for integer, “d” for float, “s<size>” for a varchar + size).
> Not sure if all the used imports are in 1.10 ; else you’d need the latest code.
> 
>     Boaz
> 
> 
> package org.apache.drill.exec.physical.impl.agg;
> 
> import ch.qos.logback.classic.Level;
> import org.apache.drill.BaseTestQuery;
> import org.apache.drill.exec.ExecConstants;
> import org.apache.drill.exec.physical.impl.aggregate.HashAggTemplate;
> import org.apache.drill.exec.planner.physical.PlannerSettings;
> import org.apache.drill.exec.proto.UserBitShared;
> import org.apache.drill.test.ClientFixture;
> import org.apache.drill.test.ClusterFixture;
> import org.apache.drill.test.FixtureBuilder;
> import org.apache.drill.test.LogFixture;
> import org.apache.drill.test.ProfileParser;
> import org.apache.drill.test.QueryBuilder;
> import org.junit.Ignore;
> import org.junit.Test;
> 
> import java.util.List;
> 
> import static org.junit.Assert.assertEquals;
> import static org.junit.Assert.assertTrue;
> 
> /**
> *  Test spilling for the Hash Aggr operator (using the mock reader)
> */
> public class TestHashAggrSpill extends BaseTestQuery {
> 
>    private void runAndDump(ClientFixture client, String sql, long expectedRows, long spillCycle, long spilledPartitions) throws Exception {
>        String plan = client.queryBuilder().sql(sql).explainJson();
> 
>        QueryBuilder.QuerySummary summary = client.queryBuilder().sql(sql).run();
>        if ( expectedRows > 0 ) {
>            assertEquals(expectedRows, summary.recordCount());
>        }
>        System.out.println(String.format("======== \n Results: %,d records, %d batches, %,d ms\n ========", summary.recordCount(), summary.batchCount(), summary.runTimeMs() ) );
> 
>        System.out.println("Query ID: " + summary.queryIdString());
>        ProfileParser profile = client.parseProfile(summary.queryIdString());
>        profile.print();
>        List<ProfileParser.OperatorProfile> ops = profile.getOpsOfType(UserBitShared.CoreOperatorType.HASH_AGGREGATE_VALUE);
> 
>        assertTrue( ! ops.isEmpty() );
>        // check for the first op only
>        ProfileParser.OperatorProfile hag = ops.get(0);
>        long opCycle = hag.getMetric(HashAggTemplate.Metric.SPILL_CYCLE.ordinal());
>        assertEquals(spillCycle, opCycle);
>        long op_spilled_partitions = hag.getMetric(HashAggTemplate.Metric.SPILLED_PARTITIONS.ordinal());
>        assertEquals(spilledPartitions, op_spilled_partitions);
>    }
> 
>    /**
>     * Test "normal" spilling: Only 2 partitions (out of 4) would require spilling
>     * ("normal spill" means spill-cycle = 1 )
>     *
>     * @throws Exception
>     */
>    @Test
>    public void testHashAggrSpill() throws Exception {
>        LogFixture.LogFixtureBuilder logBuilder = LogFixture.builder()
>            .toConsole()
>            .logger("org.apache.drill.exec.physical.impl.aggregate", Level.WARN)
>            ;
> 
>        FixtureBuilder builder = ClusterFixture.builder()
>            .configProperty(ExecConstants.HASHAGG_MAX_MEMORY_KEY,"46000kB")
>            .configProperty(ExecConstants.HASHAGG_NUM_PARTITIONS_KEY,16)
>            // .sessionOption(PlannerSettings.EXCHANGE.getOptionName(), true)
>            .maxParallelization(2)
>            .saveProfiles()
>            //.keepLocalFiles()
>            ;
>        try (LogFixture logs = logBuilder.build();
>             ClusterFixture cluster = builder.build();
>             ClientFixture client = cluster.clientFixture()) {
>            String sql = "SELECT empid_s17, dept_i, branch_i, AVG(salary_i) FROM `mock`.`employee_1200K` GROUP BY empid_s17, dept_i, branch_i";
>            runAndDump(client, sql, 1_200_000, 1, 2);
>        }
>    }
> }
> 
> 
> 
> 
> On 3/31/17, 7:59 AM, "Charles Givre" <cgivre@gmail.com> wrote:
> 
>    Hello there,
>    Is there any documentation for the new mock storage engine?  It looks
>    really useful.
>    Thanks,
>    - Charles
> 
> 
