falcon-dev mailing list archives

From "Venkat Ramachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FALCON-36) Ability to ingest data from databases
Date Thu, 23 Jul 2015 21:37:05 GMT

    [ https://issues.apache.org/jira/browse/FALCON-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639551#comment-14639551
] 

Venkat Ramachandran commented on FALCON-36:
-------------------------------------------

[~ajayyadava] thanks for the comments. 

1. Re-using the types now.
2. fields is optional.
    * If fields is missing, all columns are projected.
    * If includes is specified, only the listed subset of columns will be projected.
    * If excludes is specified, all columns except those listed will be projected.
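To illustrate, the fields element could look roughly like the sketch below. The element names here are assumptions based on this discussion, not the final XSD:

```xml
<!-- Hypothetical sketch of the proposed fields element; names are illustrative, not the final XSD.
     Omitting <fields> entirely would project all columns. -->
<fields>
    <includes>
        <field>id</field>
        <field>name</field>
    </includes>
    <!-- Alternatively, an excludes list would project all columns except those listed:
         excludes with <field>ssn</field>, for example. -->
</fields>
```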
3. Feed import will be tied to a cluster.
    * A user can make this cluster the source and another the target to leverage replication for the copy.
    * Alternatively (though there is no reason to), a user can add an import policy to every cluster in a feed.
    * This would run one import job per feed cluster against the specified database.
    * This would overload the database and risk data inconsistency. It's better to import on the source cluster and replicate.
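The recommended pattern above — import on one cluster, replicate to the rest — could be sketched in a feed definition like this. The structure follows Falcon's feed-cluster conventions, but the import element names are assumptions from this proposal:

```xml
<!-- Hypothetical sketch: import runs only on the source cluster; replication copies to the target.
     The <import> element here is illustrative of the proposal, not a finalized schema. -->
<clusters>
    <cluster name="primary" type="source">
        <validity start="2015-01-01T00:00Z" end="2099-12-31T00:00Z"/>
        <import>
            <source name="mysql-db" tableName="orders"/>
        </import>
    </cluster>
    <cluster name="backup" type="target">
        <validity start="2015-01-01T00:00Z" end="2099-12-31T00:00Z"/>
    </cluster>
</clusters>
```

With this layout, only the primary cluster talks to the database; the backup cluster receives the data through Falcon's existing replication, avoiding duplicate load on the source database.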
4. We are going with database as the entity, since datasource is too generic for us to enforce
validations and support type-specific capabilities. We will add kafkabroker as another entity.
5. Just following the convention of cluster: description is an attribute. We also have tags
as an element.
6. Fixed the documentation in the tags column.
7. Will add the driver to the database.xml example in the next patch.
8. The driver specified will be passed on to the underlying implementation (in this case Sqoop)
to load the jars.
9. Fixed the database.xml example.
10. Type identifies the type of database (mysql, oracle, etc.) in order to take advantage of
database-specific features and driver support.
11. Version is carried over from the cluster interface definition, but it is not needed in
this entity since the driver will supply the correct version of the jar to be used, along with
the class name. Removing it.
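Putting points 7-11 together, a database.xml entity might look roughly like the following. This is only a sketch of the proposed XSD; the element names, JDBC URL, and paths are illustrative assumptions, and version is omitted per point 11:

```xml
<!-- Hypothetical database.xml sketch, not the final schema.
     type selects database-specific features (point 10); the driver class and jar
     are handed to Sqoop to load (point 8); no version element (point 11). -->
<database colo="west-coast" name="mysql-db" type="mysql">
    <tags>owner=etl-team,env=production</tags>
    <interfaces>
        <interface type="readonly" endpoint="jdbc:mysql://db-host:3306/sales"/>
    </interfaces>
    <driver>
        <clazz>com.mysql.jdbc.Driver</clazz>
        <jar>/apps/libs/mysql-connector-java.jar</jar>
    </driver>
</database>
```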

I'm redoing the patch to use this new XSD and need to add some test cases. I will upload the
patch in the next 2 days, with or without test cases, for initial review.

> Ability to ingest data from databases
> -------------------------------------
>
>                 Key: FALCON-36
>                 URL: https://issues.apache.org/jira/browse/FALCON-36
>             Project: Falcon
>          Issue Type: Improvement
>          Components: acquisition
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkat Ramachandran
>         Attachments: FALCON-36.patch, FALCON-36.rebase.patch, FALCON-36.review.patch,
Falcon Data Ingestion - Proposal.docx, falcon-36.xsd.patch.1
>
>
> Attempt to address data import from RDBMS into hadoop and export of data from Hadoop
into RDBMS. The plan is to use sqoop 1.x to materialize data motion from/to RDBMS to/from
HDFS. Hive will not be integrated in the first pass until Falcon has a first class integration
with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
