hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15277) Teach Hive how to create/delete Druid segments
Date Mon, 28 Nov 2016 19:42:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702905#comment-15702905 ]

ASF GitHub Bot commented on HIVE-15277:
---------------------------------------

GitHub user b-slim opened a pull request:

    https://github.com/apache/hive/pull/120

    HIVE-15277 Druid storage handler

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/b-slim/hive rebase_druid_record_writer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/120.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #120
    
----
commit 9025d4a33348faa007c17f2c7ff5dee4f3a87318
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-10-26T23:55:34Z

    adding druid record writer
    
    bump guava version to 16.0.1
    
    moving out the injector

commit be2e29dcba5617db478eefa75a5478a77512e090
Author: Jesus Camacho Rodriguez <jcamacho@apache.org>
Date:   2016-11-02T03:21:59Z

    Druid time granularity partitioning, serializer and necessary extensions

commit df4036f7f76294dc5599d29cdb760336b0ee9a4f
Author: Jesus Camacho Rodriguez <jcamacho@apache.org>
Date:   2016-11-02T19:59:52Z

    Recognition of dimensions and metrics
    
    patch 1

commit ea76f0ddfa33990d92e061676123c45920ed6dce
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-02T21:18:00Z

    adding file schema support

commit 010701be7cf939f6854c9ee113ccf40b20aed32a
Author: Jesus Camacho Rodriguez <jcamacho@apache.org>
Date:   2016-11-04T19:48:43Z

    native storage
    
    new fixes

commit 3d8496299d1d151da59bb6f547ebbc475c329197
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-09T17:57:03Z

    using segment output path

commit 2b10b26eb7a5d9a6058c9e1f206c599e54ec88b2
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-16T00:16:10Z

    adding check for existing datasource and implement drop table

commit e18b716a438e8b38155d4ab31b7070ae1945f1e4
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-19T00:53:10Z

    adding UTs and refactor some code

commit 3b31d16dcb9fd5cdb9eb6d1c994cb3f0c8cd8a33
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-23T23:49:28Z

    fix druid version

commit 4b447e56389aab1f45e9b48192068d1a0257a14c
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-28T19:32:02Z

    ignore record writer test

commit a7b4f792a5e28b0772addbc0d5ea52d5b44d9d91
Author: Slim Bouguerra <slim.bouguerra@gmail.com>
Date:   2016-11-28T19:38:25Z

    format code

----


> Teach Hive how to create/delete Druid segments 
> -----------------------------------------------
>
>                 Key: HIVE-15277
>                 URL: https://issues.apache.org/jira/browse/HIVE-15277
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>         Attachments: HIVE-15277.2.patch, HIVE-15277.patch, file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation, Hive generates Druid segment files and inserts the metadata to signal the handoff to Druid.
> The syntax is as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `__time`, `dimension1`, `dimension2`, `metric1`, `metric2`....>;
> {code}
> This statement stores the results of the SELECT query in a Druid datasource named 'datasourcename'. One of the columns of the query needs to be the time dimension, which is mandatory in Druid. In particular, we follow the same convention that Druid uses: the result of the executed query must contain a column named '__time', which acts as the time dimension column in Druid. Currently, the time dimension column needs to be a 'timestamp' type column.
> Metrics can be of type long, double, or float, while dimensions are strings. Keep in mind that Druid has a clear separation between dimensions and metrics; therefore, if a Hive column contains numbers and needs to be presented as a dimension, use the cast operator to cast it to string.
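> A minimal sketch of the cast described above, assuming a hypothetical Hive table `source_events` with a timestamp column `event_time`, a numeric column `user_id`, a string column `country`, and a double column `revenue`. None of these names, nor the table and datasource names, come from the patch; they are for illustration only:
> {code:sql}
> -- Hypothetical source table and column names; only the handler class,
> -- the "druid.datasource" property, and the `__time` convention come from this issue.
> CREATE TABLE druid_table_2
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename_2")
> AS
> SELECT
>   `event_time` AS `__time`,                 -- timestamp column, becomes the Druid time dimension
>   CAST(`user_id` AS STRING) AS `user_id`,   -- numeric column cast to string so it is a dimension
>   `country`,                                -- string column, a dimension
>   `revenue`                                 -- double column, a metric
> FROM source_events;
> {code}
> Without the cast, `user_id` would presumably be recognized as a metric rather than a dimension, given the type-based separation described above.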
> This initial implementation interacts with the Druid metadata storage to add/remove the table in Druid. The user needs to supply the metadata config as --hiveconf hive.druid.metadata.password=XXX --hiveconf hive.druid.metadata.username=druid --hiveconf hive.druid.metadata.uri=jdbc:mysql://host/druid
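> As a sketch only: the same three properties might also be set inside a Hive session with SET statements before running the CTAS. This assumes the properties are not restricted from being changed at runtime, which this issue does not confirm; the values are the placeholders from the description above:
> {code:sql}
> -- Assumption: session-level SET is honored for these properties.
> -- The --hiveconf form above is the one given in this issue.
> SET hive.druid.metadata.username=druid;
> SET hive.druid.metadata.password=XXX;
> SET hive.druid.metadata.uri=jdbc:mysql://host/druid;
> {code}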



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
