gearpump-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEARPUMP-24) refactor DataSource API
Date Wed, 27 Apr 2016 14:35:12 GMT

    [ https://issues.apache.org/jira/browse/GEARPUMP-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260248#comment-15260248
] 

ASF GitHub Bot commented on GEARPUMP-24:
----------------------------------------

Github user manuzhang commented on a diff in the pull request:

    https://github.com/apache/incubator-gearpump/pull/7#discussion_r61267757
  
    --- Diff: external/kafka/src/main/scala/io/gearpump/streaming/kafka/KafkaSource.scala
---
    @@ -168,21 +169,11 @@ class KafkaSource(
           tp -> new KafkaOffsetManager(storage)
         }.toMap
     
    -    setStartTime(startTime)
    +    setStartTime(Option(startTime))
       }
     
    -  override def read(batchSize: Int): List[Message] = {
    -    val messageBuffer = ArrayBuffer.empty[Message]
    -
    -    fetchThread.foreach {
    -      fetch =>
    -        var count = 0
    -        while (count < batchSize) {
    -          fetch.poll.flatMap(filterMessage).foreach(messageBuffer += _)
    -          count += 1
    -        }
    -    }
    -    messageBuffer.toList
    +  override def read(): Message = {
    +    fetchThread.flatMap(_.poll.flatMap(filterMessage)).orNull
    --- End diff --
    
    in case it's called uninitialized


> refactor DataSource API
> -----------------------
>
>                 Key: GEARPUMP-24
>                 URL: https://issues.apache.org/jira/browse/GEARPUMP-24
>             Project: Apache Gearpump
>          Issue Type: Improvement
>          Components: streaming
>    Affects Versions: 0.8.0
>            Reporter: Manu Zhang
>            Assignee: Manu Zhang
>             Fix For: 0.8.1
>
>
> From [https://github.com/gearpump/gearpump/issues/2013]:
> The current DataSource API
> {code}
> trait DataSource extends java.io.Serializable {
>   /**
>    * open connection to data source
>    * invoked in onStart() method of [[io.gearpump.streaming.source.DataSourceTask]]
>    * @param context is the task context at runtime
>    * @param startTime is the start time of system
>    */
>   def open(context: TaskContext, startTime: Option[TimeStamp]): Unit
>   /**
>    * read a number of messages from data source.
>    * invoked in each onNext() method of [[io.gearpump.streaming.source.DataSourceTask]]
>    * @param batchSize max number of messages to read
>    * @return a list of messages wrapped in [[io.gearpump.Message]]
>    */
>   def read(batchSize: Int): List[Message]
>   /**
>    * close connection to data source.
>    * invoked in onStop() method of [[io.gearpump.streaming.source.DataSourceTask]]
>    */
>   def close(): Unit
> }
> {code}
> has several issues
> 1. read returns a scala list of Message which is unfriendly to Java DataSources. Same
for Option parameter in open
> 2. the number of read messages may not be the same as the passed in batchSize which leaves
uncertainty to users (users may access out of boundary list positions)
> 3. to return a list an extra buffer could be needed in read (e.g. KafkaSource) which
is not best for performance
> Update:
> I'd like to add a "getWatermark" interface that helps to fix GEARPUMP-32



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message