sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1811) IDF API changes
Date Mon, 01 Dec 2014 17:27:13 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230073#comment-14230073 ]

Jarek Jarcec Cecho commented on SQOOP-1811:

The idea behind {{IntermediateDataFormat}} is to let a connector use an arbitrary format
for its data. It is an abstract concept, and Sqoop doesn't impose any restrictions on how the
connector represents the data internally. The "internal" format is expected to be
whatever is native for the given connector. JDBC-based connectors will likely use CSV or an Object
array, and fast connectors will most likely end up with CSV. More advanced connectors might use
more advanced structures. We have abstract template methods {{getData()}} and {{setData()}}
so that we can do some basic work with this abstract structure.
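
As a rough illustration of the point above, here is a minimal sketch of a base class that is generic over the connector's native format. It is a simplified stand-in, not the actual Sqoop source: the class names {{IntermediateDataFormatSketch}}, {{CsvNativeFormat}} and {{ObjectArrayNativeFormat}} are hypothetical, and the value is stored directly in the base class only for brevity.

```java
import java.util.Arrays;

// Simplified stand-in for the IDF base class (hypothetical, not Sqoop code).
abstract class IntermediateDataFormatSketch<T> {
    // Native representation; the framework treats it as opaque.
    protected T data;

    public T getData() {
        return data;
    }

    public void setData(T data) {
        this.data = data;
    }
}

// A connector whose native record is one CSV line.
class CsvNativeFormat extends IntermediateDataFormatSketch<String> {
}

// A connector whose native record is an array of Java objects.
class ObjectArrayNativeFormat extends IntermediateDataFormatSketch<Object[]> {
}

class NativeFormatDemo {
    public static void main(String[] args) {
        CsvNativeFormat csv = new CsvNativeFormat();
        csv.setData("1,hello");
        System.out.println(csv.getData());

        ObjectArrayNativeFormat objects = new ObjectArrayNativeFormat();
        objects.setData(new Object[] {1L, "hello"});
        System.out.println(Arrays.toString(objects.getData()));
    }
}
```

With only these two methods the framework can move a record around, but it cannot tell where one column ends and the next begins.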

However, in order to interpret the data we need the IDF to expose methods that convert
the internal format to something agreed upon between the IDF and the rest of the Sqoop code
base, so that instead of one abstract object we get data that we can interpret in terms
of columns and their values. We are requesting the IDF to expose that via two different method
families. The first is an Object representation via the methods {{getObjectData()}} and {{setObjectData()}},
which are expected to return and accept Java objects corresponding to the column values. The second is
a CSV-ish representation that follows very strict formatting rules. I've tried to explain
why we are requesting both formats in my [earlier comment|https://issues.apache.org/jira/browse/SQOOP-1811?focusedCommentId=14226919&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14226919].
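
To make the two method families concrete, here is a small sketch of a CSV-native format that exposes both views. The names {{TwoViewFormatSketch}} and {{NaiveCsvFormat}} are hypothetical, and the text handling is a deliberately naive assumption (plain comma splitting, no quoting, escaping or schema-based typing); it does not reproduce the strict formatting rules mentioned above.

```java
import java.util.Arrays;

// Hypothetical base class: native data plus the two agreed-upon views.
abstract class TwoViewFormatSketch<T> {
    protected T data;

    public T getData() { return data; }
    public void setData(T data) { this.data = data; }

    // Object family: one Java object per column.
    public abstract Object[] getObjectData();
    public abstract void setObjectData(Object[] values);

    // Text family: one text line per record.
    public abstract String getTextData();
    public abstract void setTextData(String text);
}

// Native format is a text line, so the text view is free and the
// object view requires a conversion.
class NaiveCsvFormat extends TwoViewFormatSketch<String> {
    @Override
    public String getTextData() { return data; }

    @Override
    public void setTextData(String text) { this.data = text; }

    @Override
    public Object[] getObjectData() {
        if (data == null) {
            return null;
        }
        // Naive assumption: no quoting or escaping; -1 keeps trailing empty fields.
        String[] fields = data.split(",", -1);
        return Arrays.copyOf(fields, fields.length, Object[].class);
    }

    @Override
    public void setObjectData(Object[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values[i]);
        }
        this.data = sb.toString();
    }
}

class TwoViewDemo {
    public static void main(String[] args) {
        NaiveCsvFormat format = new NaiveCsvFormat();
        format.setObjectData(new Object[] {1, "abc"});
        System.out.println(format.getTextData());
        System.out.println(Arrays.toString(format.getObjectData()));
    }
}
```

Note that this sketch returns every column as a String from {{getObjectData()}}; a real implementation would presumably use the schema to produce properly typed objects.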

As a result, the text format in the IDF is intentionally required: we are requesting the IDF to expose
the text format to us, and the methods are defined as abstract so that the IDF can convert
its internal structure into text lazily (only if we need the text representation).
{{getObjectData()}} and {{setObjectData()}} work the same way. I'm wondering what different goals
we are trying to achieve here?
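
A minimal sketch of what "lazily" means here, assuming a format whose native structure is an object array. The class {{LazyObjectArrayFormat}} and its {{textConversions}} counter are hypothetical and exist only to show when the conversion runs; the comma joining is again a naive stand-in for the real text rules.

```java
// Hypothetical object-native format: text is produced only when requested.
class LazyObjectArrayFormat {
    private Object[] row;

    // Counts how often the text conversion actually ran (for illustration only).
    int textConversions = 0;

    public void setObjectData(Object[] values) {
        // No text is built here; the native structure is stored as-is.
        this.row = values;
    }

    public Object[] getObjectData() {
        return row;
    }

    public String getTextData() {
        if (row == null) {
            return null;
        }
        // The conversion cost is paid only by callers that need text.
        textConversions++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(row[i]);
        }
        return sb.toString();
    }
}

class LazyConversionDemo {
    public static void main(String[] args) {
        LazyObjectArrayFormat format = new LazyObjectArrayFormat();
        format.setObjectData(new Object[] {1, "abc"});
        System.out.println("conversions after set: " + format.textConversions);
        System.out.println(format.getTextData());
        System.out.println("conversions after get: " + format.textConversions);
    }
}
```

If the text were instead stored in a final field of the base class, every implementation would have to produce it eagerly, even when nobody asks for it.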

> IDF API changes
> ---------------
>                 Key: SQOOP-1811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1811
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 1.99.5
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and call it getCSV and setCSV, so it is obvious that we
> want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and be made final,
> so there is no way to override it and we can enforce all to return String instead of generic
> {code}
> // hold the string in IDF base class
> // (not final, because setCSVTextData() below reassigns it)
> private String text;
> public final String getCSVTextData() {
>   return text;
> }
> public final void setCSVTextData(String text) {
>   this.text = text;
> }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing; it can be
> pulled out into CSV Utils so that the connectors can use it.
> The T in CSV happens to be String, which is just a coincidence. If I write a new IDF implementation,
> T can be a custom object that could encapsulate the whole row.
> Third, getData and setData can have custom implementations, so they can be overridden to
> return the generic type T.

This message was sent by Atlassian JIRA
