gearpump-dev mailing list archives

From "Manu Zhang (JIRA)"
Subject [jira] [Created] (GEARPUMP-24) refactor DataSource API
Date Mon, 11 Apr 2016 00:57:25 GMT
Manu Zhang created GEARPUMP-24:

             Summary: refactor DataSource API
                 Key: GEARPUMP-24
             Project: Apache Gearpump
          Issue Type: Improvement
          Components: streaming
    Affects Versions: 0.8.0
            Reporter: Manu Zhang
            Assignee: Manu Zhang
             Fix For: 0.8.1

From []:

The current DataSource API

trait DataSource extends java.io.Serializable {

  /**
   * open connection to data source
   * invoked in onStart() method of [[io.gearpump.streaming.source.DataSourceTask]]
   * @param context is the task context at runtime
   * @param startTime is the start time of system
   */
  def open(context: TaskContext, startTime: Option[TimeStamp]): Unit

  /**
   * read a number of messages from data source.
   * invoked in each onNext() method of [[io.gearpump.streaming.source.DataSourceTask]]
   * @param batchSize max number of messages to read
   * @return a list of messages wrapped in [[io.gearpump.Message]]
   */
  def read(batchSize: Int): List[Message]

  /**
   * close connection to data source.
   * invoked in onStop() method of [[io.gearpump.streaming.source.DataSourceTask]]
   */
  def close(): Unit
}

has several issues:

1. read returns a Scala List of Message, which is unfriendly to Java DataSources. The same
goes for the Option parameter in open.
2. The number of messages read may differ from the batchSize passed in, which leaves users
uncertain about the result size (they may index past the end of the list).
3. Returning a list may require an extra buffer in read (e.g. KafkaSource), which is not
ideal for performance.
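
Issue 2 can be illustrated with a minimal, self-contained sketch. Message and ThreeMessageSource below are hypothetical stand-ins written for this example, not Gearpump's own classes; only the read(batchSize: Int): List[Message] signature is taken from the API quoted above.

```scala
// Stand-in for io.gearpump.Message so the sketch compiles on its own (assumption).
final case class Message(msg: Any, timestamp: Long = 0L)

// Hypothetical source holding only three messages. read() follows the
// signature quoted above and may return fewer than batchSize elements.
class ThreeMessageSource {
  private var remaining: List[Message] = List("a", "b", "c").map(s => Message(s))

  def read(batchSize: Int): List[Message] = {
    val (batch, rest) = remaining.splitAt(batchSize)
    remaining = rest
    batch
  }
}

object ShortReadDemo {
  // Unsafe: assumes the returned list has exactly batchSize elements.
  def unsafeConsume(batch: List[Message], batchSize: Int): Seq[Any] =
    (0 until batchSize).map(i => batch(i).msg)

  // Safe: iterates over whatever was actually returned.
  def safeConsume(batch: List[Message]): Seq[Any] = batch.map(_.msg)

  def main(args: Array[String]): Unit = {
    val batchSize = 5
    val batch = new ThreeMessageSource().read(batchSize)
    println(s"requested $batchSize, got ${batch.size}")
    println(safeConsume(batch))
    try unsafeConsume(batch, batchSize)
    catch { case _: IndexOutOfBoundsException => println("indexed past the end of the batch") }
  }
}
```

A caller that loops up to batchSize instead of the size of the returned list fails as soon as the source runs short, which is the uncertainty the issue describes.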
