flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6039) Row of TableFunction should support flexible number of fields
Date Tue, 14 Mar 2017 09:23:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923859#comment-15923859 ]

ASF GitHub Bot commented on FLINK-6039:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3529#discussion_r105856276
  
    --- Diff: flink-core/src/main/java/org/apache/flink/types/Row.java ---
    @@ -66,10 +66,11 @@ public int getArity() {
     	 * Gets the field at the specified position.
     	 * @param pos The position of the field, 0-based.
     	 * @return The field at the specified position.
    -	 * @throws IndexOutOfBoundsException Thrown, if the position is negative, or equal to, or larger than the number of fields.
    +	 * Return null if the position is equal to, or larger than the number of fields.
    +	 * @throws IndexOutOfBoundsException Thrown, if the position is negative.
     	 */
     	public Object getField(int pos) {
    -		return fields[pos];
    +		return pos >= fields.length ? null : fields[pos];
    --- End diff --
    
    This will cause overhead for basically every operation and should not be done to support a minor feature.
    If you expect that you might receive a `Row` that violates the expected schema and you want to avoid an `IndexOutOfBoundsException`, you should check `getArity()` instead.
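
The arity check suggested above could look roughly like the following sketch. It is not taken from the pull request: the `Row` class here is a minimal stand-in that mirrors only `getArity()`, `getField()` and `setField()` of `org.apache.flink.types.Row` so the example compiles on its own, and `getFieldOrNull` is a hypothetical helper name.

```java
final class ArityCheckExample {
    // Hypothetical helper (not part of Flink): returns null for positions at or
    // past the row's arity, so only callers that expect short rows pay for the check.
    // Negative positions still fail inside getField, as before.
    static Object getFieldOrNull(Row row, int pos) {
        return pos < row.getArity() ? row.getField(pos) : null;
    }

    public static void main(String[] args) {
        Row row = new Row(1);   // the schema expects three fields, the row carries one
        row.setField(0, "a");

        System.out.println(getFieldOrNull(row, 0)); // field is present
        System.out.println(getFieldOrNull(row, 2)); // position is past the arity
    }
}

// Minimal stand-in for org.apache.flink.types.Row (assumption: only the three
// methods used above are mirrored; the real class has more).
final class Row {
    private final Object[] fields;

    Row(int arity) {
        this.fields = new Object[arity];
    }

    int getArity() {
        return fields.length;
    }

    Object getField(int pos) {
        // Unchanged access path: no additional bounds check on every call.
        return fields[pos];
    }

    void setField(int pos, Object value) {
        fields[pos] = value;
    }
}
```

Keeping the check in a caller-side helper leaves `getField` itself untouched, so code that always receives rows matching the schema does not pay for the extra comparison.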


> Row of TableFunction should support flexible number of fields
> -------------------------------------------------------------
>
>                 Key: FLINK-6039
>                 URL: https://issues.apache.org/jira/browse/FLINK-6039
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Zhuoluo Yang
>            Assignee: Zhuoluo Yang
>
> In real-world use, especially when processing logs with a TableFunction, the log formats are flexible. Thus, the number of fields should not be fixed.
> For example, we should make the following three kinds of TableFunction work.
> {code}
> // Test for incomplete row
> class TableFunc4 extends TableFunction[Row] {
>   def eval(str: String): Unit = {
>     if (str.contains("#")) {
>       str.split("#").foreach({ s =>
>         val row = new Row(3)
>         row.setField(0, s)  // And we only set values for one column
>         collect(row)
>       })
>     }
>   }
>   override def getResultType: TypeInformation[Row] = {
>     new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
>                     BasicTypeInfo.INT_TYPE_INFO,
>                     BasicTypeInfo.INT_TYPE_INFO)
>   }
> }
> // Test for incomplete row
> class TableFunc5 extends TableFunction[Row] {
>   def eval(str: String): Unit = {
>     if (str.contains("#")) {
>       str.split("#").foreach({ s =>
>         val row = new Row(1)  // ResultType is three columns, we have only one here
>         row.setField(0, s)
>         collect(row)
>       })
>     }
>   }
>   override def getResultType: TypeInformation[Row] = {
>     new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
>       BasicTypeInfo.INT_TYPE_INFO,
>       BasicTypeInfo.INT_TYPE_INFO)
>   }
> }
> // Test for overflow row
> class TableFunc6 extends TableFunction[Row] {
>   def eval(str: String): Unit = {
>     if (str.contains("#")) {
>       str.split("#").foreach({ s =>
>         val row = new Row(5)  // ResultType is two columns, we have five columns here
>         row.setField(0, s)
>         row.setField(1, s.length)
>         row.setField(2, s.length)
>         row.setField(3, s.length)
>         row.setField(4, s.length)
>         collect(row)
>       })
>     }
>   }
>   override def getResultType: TypeInformation[Row] = {
>     new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO,
>                     BasicTypeInfo.INT_TYPE_INFO)
>   }
> }
> {code}
> Actually, TableFunc4 and TableFunc6 already work correctly with the current version. This issue will make TableFunc5 work.
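
For illustration only, one way a caller could cope with rows like those from TableFunc5 and TableFunc6 without changing `Row.getField` is to copy each row into one of the declared arity. This is a sketch under assumptions, not how the issue was resolved: `Row` is again a minimal stand-in for `org.apache.flink.types.Row`, and `toArity` is a hypothetical helper.

```java
final class RowPaddingExample {
    // Hypothetical helper (not part of Flink): copies `row` into a new Row with
    // exactly `expectedArity` fields. Missing fields stay null, extra fields are dropped.
    static Row toArity(Row row, int expectedArity) {
        Row result = new Row(expectedArity);
        int n = Math.min(row.getArity(), expectedArity);
        for (int i = 0; i < n; i++) {
            result.setField(i, row.getField(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Row shortRow = new Row(1);              // like TableFunc5: one field, three declared
        shortRow.setField(0, "a");
        Row padded = toArity(shortRow, 3);
        System.out.println(padded.getArity());  // arity now matches the declared type
    }
}

// Minimal stand-in for org.apache.flink.types.Row (assumption: only the
// methods used above are mirrored).
final class Row {
    private final Object[] fields;

    Row(int arity) {
        this.fields = new Object[arity];
    }

    int getArity() {
        return fields.length;
    }

    Object getField(int pos) {
        return fields[pos];
    }

    void setField(int pos, Object value) {
        fields[pos] = value;
    }
}
```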



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
