hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
Date Wed, 05 Oct 2016 13:51:21 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548763#comment-15548763 ]

Hive QA commented on HIVE-14474:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831727/HIVE-14474.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10656 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1403/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1403/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1403/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831727 - PreCommit-HIVE-Build

> Create datasource in Druid from Hive
> ------------------------------------
>
>                 Key: HIVE-14474
>                 URL: https://issues.apache.org/jira/browse/HIVE-14474
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-14474.01.patch, HIVE-14474.02.patch, HIVE-14474.03.patch, HIVE-14474.04.patch, HIVE-14474.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In the initial implementation proposed in this issue, we will write the results of the query to HDFS (or to the location specified in the CTAS statement) and submit a HadoopIndexing task to the Druid overlord. The task will contain the path where the data was stored; it will read the data and create the segments in Druid. Once this is done, the results are removed from Hive.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS <input_query>;
> {code}
> This statement stores the results of query <input_query> in a Druid datasource named 'my_query_based_datasource'. One of the columns of the query needs to be the time dimension, which is mandatory in Druid. In particular, we follow the same convention that is used by Druid: the result of the executed query needs to contain a column named '__time', which will act as the time dimension column in Druid. Currently, the time dimension column needs to be of 'timestamp' type.
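> For illustration only, a hypothetical CTAS following this convention could look like the sketch below; the source table 'sales' and its columns are made up for the example, and only the '__time' naming convention and the storage handler class come from this issue:
> {code:sql}
> -- Hypothetical example: the table 'sales' and its columns are illustrative only.
> -- The first projected column is aliased to __time and cast to timestamp,
> -- since Druid requires a time dimension column with exactly that name.
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS
> SELECT CAST(sale_ts AS timestamp) AS `__time`,
>        store_state,
>        product_category,
>        SUM(net_amount) AS total_amount
> FROM sales
> GROUP BY sale_ts, store_state, product_category;
> {code}
> Under this convention, a query whose result contains no '__time' column would not satisfy the Druid requirement and the statement would be expected to fail.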
> This initial implementation interacts with the Druid API as it is currently exposed to the user. In a follow-up issue, we should propose an implementation that integrates more tightly with Druid. In particular, we would like to store segments directly in Druid from Hive, thus avoiding the overhead of writing Hive results to HDFS and then launching an MR job that essentially reads them again to create the segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
