samoa-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (SAMOA-40) Add Kafka stream reader modules to consume data from Kafka framework
Date Mon, 09 Nov 2015 12:23:10 GMT


ASF GitHub Bot commented on SAMOA-40:

GitHub user gdfm commented on the pull request:
    I think the main issue is separation of concerns: the source of the data is one thing,
    and the data format is another. That is, we could have Avro data coming from Kafka, or
    ARFF data coming from HDFS, and we should be able to support all such combinations.
    Ideally, the source->format interface is unique and simple (e.g., a byte stream), and
    it is the responsibility of the format to convert the byte stream into a sequence of
    instances.
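
The separation described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the pull request or from SAMOA: the names ByteSource, InstanceFormat, InMemorySource and CsvFormat are hypothetical, the in-memory source stands in for Kafka or HDFS, and a trivial comma-separated format stands in for ARFF or Avro.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: where the bytes come from (Kafka, HDFS, a local file, ...). */
interface ByteSource {
    InputStream open() throws IOException;
}

/** Hypothetical: how a byte stream is decoded into instances (ARFF, Avro, ...). */
interface InstanceFormat<T> {
    List<T> parse(InputStream in) throws IOException;
}

/** Stand-in source that serves bytes from memory instead of Kafka or HDFS. */
final class InMemorySource implements ByteSource {
    private final byte[] data;

    InMemorySource(String text) {
        this.data = text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public InputStream open() {
        return new ByteArrayInputStream(data);
    }
}

/** Stand-in format: one comma-separated numeric instance per line. */
final class CsvFormat implements InstanceFormat<double[]> {
    @Override
    public List<double[]> parse(InputStream in) throws IOException {
        List<double[]> instances = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                double[] values = new double[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    values[i] = Double.parseDouble(fields[i].trim());
                }
                instances.add(values);
            }
        }
        return instances;
    }
}

final class SourceFormatSketch {
    /** Any source can be paired with any format; neither knows about the other. */
    static <T> List<T> read(ByteSource source, InstanceFormat<T> format) {
        try (InputStream in = source.open()) {
            return format.parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<double[]> instances =
            read(new InMemorySource("1.0,2.0\n3.0,4.0\n"), new CsvFormat());
        System.out.println(instances.size() + " instances read");
    }
}
```

With this shape, supporting a new combination (say, Avro from Kafka) would mean adding one source or one format implementation, without touching the other side.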

> Add Kafka stream reader modules to consume data from Kafka framework
> --------------------------------------------------------------------
>                 Key: SAMOA-40
>                 URL:
>             Project: SAMOA
>          Issue Type: Task
>          Components: Infrastructure, SAMOA-API
>         Environment: OS X Version 10.10.3
>            Reporter: Vishal Karande
>            Priority: Minor
>              Labels: features
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Apache SAMOA is designed to process streaming data and to develop streaming machine
> learning algorithms. Currently, the SAMOA framework supports reading stream data from
> ARFF files only. Thus, when SAMOA is used as a streaming machine learning component in
> real-time use cases, writing data to files and reading it back is slow and inefficient.
> A single Kafka broker can handle hundreds of megabytes of reads and writes per second
> from thousands of clients. The ability to read data directly from Apache Kafka into SAMOA
> would not only improve performance but also make SAMOA pluggable into many real-time
> machine learning use cases, such as the Internet of Things (IoT).
> Add code that enables SAMOA to read data from Apache Kafka as a data stream.
> The Kafka stream reader supports the following streaming options:
> a) Topic selection - Kafka topic to read data from
> b) Partition selection - Kafka partition to read data from
> c) Batching - Number of data instances read from Kafka in one read request
> d) Configuration options - Kafka port number, seed information, time delay between two
> read requests
> Components:
> KafkaReader - Consists of APIs to read data from Kafka
> KafkaStream - Stream source for SAMOA providing data read from Kafka
> Dependencies for Kafka are added to pom.xml in the samoa-api component.
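
As a rough illustration of the options listed in the issue (topic, partition, batching, port, seed information, delay between read requests), the sketch below groups them in one holder and shows batched reading with a pause between requests. It is an assumption-laden sketch, not the KafkaReader or KafkaStream from the pull request: KafkaReaderOptions and BatchingSketch are hypothetical names, the sample values are made up, and an in-memory iterator stands in for the actual Kafka fetch so that no Kafka client call is invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical holder for the options listed in the issue; not SAMOA's actual API. */
final class KafkaReaderOptions {
    final String topic;             // a) Kafka topic to read from
    final int partition;            // b) Kafka partition to read from
    final int batchSize;            // c) instances fetched per read request
    final List<String> seedBrokers; // d) seed information
    final int port;                 // d) Kafka port number
    final long delayMillis;         // d) delay between two read requests

    KafkaReaderOptions(String topic, int partition, int batchSize,
                       List<String> seedBrokers, int port, long delayMillis) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delayMillis must be >= 0");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.seedBrokers = new ArrayList<>(seedBrokers);
        this.port = port;
        this.delayMillis = delayMillis;
    }
}

final class BatchingSketch {
    /** Pulls at most batchSize messages per call, mimicking one read request. */
    static <T> List<T> nextBatch(Iterator<T> messages, int batchSize) {
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && messages.hasNext()) {
            batch.add(messages.next());
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample values only; none of them come from the issue.
        KafkaReaderOptions options = new KafkaReaderOptions(
            "sensor-readings", 0, 2, Arrays.asList("localhost"), 9092, 100L);

        // In-memory stand-in for messages that would be fetched from Kafka.
        Iterator<String> messages =
            Arrays.asList("m1", "m2", "m3", "m4", "m5").iterator();

        List<String> batch;
        while (!(batch = nextBatch(messages, options.batchSize)).isEmpty()) {
            System.out.println(options.topic + "/" + options.partition + " -> " + batch);
            Thread.sleep(options.delayMillis); // pause between read requests
        }
    }
}
```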

This message was sent by Atlassian JIRA
