falcon-dev mailing list archives

From "Ajay Yadava (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-36) Ability to ingest data from databases
Date Tue, 04 Aug 2015 09:04:07 GMT

    [ https://issues.apache.org/jira/browse/FALCON-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653314#comment-14653314 ]

Ajay Yadava commented on FALCON-36:
-----------------------------------

[~me.venkatr] I am suggesting that we use a type attribute rather than making each of them a
top-level entity. This is completely in line with consumer requirements. Every need and use case,
including listing all databases, can be achieved trivially by filtering on the type attribute.
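
A minimal sketch of the filtering idea, written in Python rather than taken from Falcon's
code. The `Datasource` class, its fields, the `list_by_type` helper, and the entity names are
all hypothetical and only illustrate the proposal:

```python
from dataclasses import dataclass


# Hypothetical model: one top-level "datasource" entity carrying a type attribute.
@dataclass(frozen=True)
class Datasource:
    name: str
    type: str  # e.g. "database" or "kafka"


def list_by_type(entities, wanted_type):
    """Return the datasources whose type attribute equals wanted_type."""
    return [e for e in entities if e.type == wanted_type]


# Illustrative entities only; these names do not come from Falcon.
entities = [
    Datasource("orders-db", "database"),
    Datasource("clicks-topic", "kafka"),
    Datasource("users-db", "database"),
]

# "List all databases" becomes a filter on the type attribute.
databases = list_by_type(entities, "database")
```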

I disagree with the current thinking that top-level entities of type database, rather than
datasource, are better from a usability standpoint. They are worse. I have already given one
case of confusion: streaming feeds vs. Kafka entities.

It is far easier to understand and use if we say that datasources are the sources for importing
and exporting data, and that database, Kafka, etc. are the types of datasource supported. By
contrast, it is confusing to say that we have one entity of type database, which has x, y, z,
and then we also have Kafka. What is the purpose of each of them? Can we reuse Kafka entities
and treat them as feeds? Are Kafka entities schedulable? All of these questions will need to be
answered for each new type of entity. Another point is that users need to remember all the
types, because they have to specify them in various commands, and it is easier to remember just
one type, "datasource", than both "database" and "kafka". There are several examples like that.

From a code maintainability standpoint, it also helps a lot to classify them as a single
entity. For example, what is the load order of entities? What about validity? It is much easier
to say that entities of type datasource load at a given position in the order than to specify
this for each type of datasource.
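
A rough sketch of the load-order point, again in Python and not Falcon's actual
implementation. The position of "datasource" in `LOAD_ORDER`, the tuple layout, and the entity
names are assumptions made for illustration:

```python
# Hypothetical load order keyed by top-level entity type. Placing "datasource"
# right after "cluster" is an assumption, not something stated in the thread.
LOAD_ORDER = ["cluster", "datasource", "feed", "process"]


def sort_for_load(entities):
    """Order (name, entity_type, subtype) tuples by the rank of their entity type.

    The subtype (e.g. "database", "kafka") plays no role in ordering, so adding
    a new kind of datasource needs no new entry in LOAD_ORDER.
    """
    return sorted(entities, key=lambda e: LOAD_ORDER.index(e[1]))


# Illustrative entities only.
entities = [
    ("daily-report", "process", None),
    ("orders-db", "datasource", "database"),
    ("primary", "cluster", None),
    ("clicks-topic", "datasource", "kafka"),
    ("orders-feed", "feed", None),
]

ordered_names = [name for name, _, _ in sort_for_load(entities)]
```

With separate top-level types, `LOAD_ORDER` would instead need one entry per kind of source.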

Please reconsider.

> Ability to ingest data from databases
> -------------------------------------
>
>                 Key: FALCON-36
>                 URL: https://issues.apache.org/jira/browse/FALCON-36
>             Project: Falcon
>          Issue Type: Improvement
>          Components: acquisition
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkat Ramachandran
>         Attachments: FALCON-36.patch, FALCON-36.patch.2, FALCON-36.rebase.patch, FALCON-36.review.patch,
>                      Falcon Data Ingestion - Proposal.docx, falcon-36.xsd.patch.1
>
>
> Attempt to address data import from RDBMS into Hadoop and export of data from Hadoop
> into RDBMS. The plan is to use Sqoop 1.x to materialize data motion from/to RDBMS to/from
> HDFS. Hive will not be integrated in the first pass until Falcon has a first class integration
> with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
