manifoldcf-dev mailing list archives

From "Tugba Dogan (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1162) Apache Kafka Output Connector
Date Sun, 22 Mar 2015 22:16:10 GMT


Tugba Dogan commented on CONNECTORS-1162:

I am Tugba Dogan, currently an undergraduate student at Bilkent University. I am really
interested in working on this project for GSoC 2015. I'll graduate on 1 June 2015, and
I will not have any other commitments during the summer besides the GSoC project, so I think I
can work 7-8 hours per day on weekdays. This will be my first GSoC experience.
I want to work in the Big Data industry after graduation, and I think this project will help me
get involved in that area. I would like to discuss the details of this project and get your feedback
on my proposal.

I have installed a ManifoldCF instance on my server and started using it. I can also set up
single-node and distributed Kafka clusters to test the integration during development.
I have some knowledge of Kafka as well.
We might also implement a repository connector for Kafka, because it could be very useful
for transferring data from Kafka to other output connectors such as Solr, Elasticsearch, and HDFS.
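A repository connector for Kafka could rely on two properties of Kafka: a (topic, partition, offset) triple uniquely identifies a message, and offsets only increase within a partition. The Java sketch below assumes an in-memory list of messages in place of a real Kafka consumer and shows how such a connector might derive document identifiers and keep a per-partition checkpoint for incremental crawls. `KafkaRepositorySketch`, `Entry`, `Batch`, and `readFrom` are hypothetical names for illustration, not ManifoldCF or Kafka APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these names are ManifoldCF or Kafka APIs.
class KafkaRepositorySketch {

    // One message as read from a partition: its offset and its payload
    // (the payload would become the document content).
    static final class Entry {
        final long offset;
        final String payload;

        Entry(long offset, String payload) {
            this.offset = offset;
            this.payload = payload;
        }
    }

    // Result of one incremental pass: the documents to index plus the
    // checkpoint (next offset to read) to persist for the following pass.
    static final class Batch {
        final List<String> documentIds = new ArrayList<>();
        long nextOffset;
    }

    // A (topic, partition, offset) triple uniquely names a Kafka message,
    // so it can serve as a stable document identifier.
    static String documentId(String topic, int partition, long offset) {
        return topic + "/" + partition + "/" + offset;
    }

    // Collects every entry at or after the checkpoint. Entries are assumed to be
    // in ascending offset order, as within a Kafka partition. Offsets may have
    // gaps (for example after log compaction), so entries are filtered by their
    // offset rather than by list position.
    static Batch readFrom(String topic, int partition, List<Entry> partitionLog, long checkpoint) {
        Batch batch = new Batch();
        batch.nextOffset = checkpoint;
        for (Entry entry : partitionLog) {
            if (entry.offset >= checkpoint) {
                batch.documentIds.add(documentId(topic, partition, entry.offset));
                batch.nextOffset = entry.offset + 1;
            }
        }
        return batch;
    }
}
```

Persisting `nextOffset` after each pass is what would let a later crawl pick up only messages that arrived since the previous one.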

Because Kafka does not currently provide any ACL features, we won't need an authority
connector for Kafka at this time. The Kafka developers are planning to implement these features in future
releases, so we might add that support to ManifoldCF later.

Here are my planned deliverables for this project:
Output connectors for Kafka 0.8.x and 0.1-0.7.x
Unit and integration tests for the output connector
Repository connectors for Kafka 0.8.x and 0.1-0.7.x
Unit and integration tests for the repository connector

I believe Kafka 0.8.x is not backward compatible with older versions. Do you think we should
also implement connectors for the older versions?


Proposal Draft:

> Apache Kafka Output Connector
> -----------------------------
>                 Key: CONNECTORS-1162
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Wish
>    Affects Versions: ManifoldCF 1.8.1, ManifoldCF 2.0.1
>            Reporter: Rafa Haro
>            Assignee: Rafa Haro
>              Labels: gsoc, gsoc2015
>             Fix For: ManifoldCF 1.9, ManifoldCF 2.1
> Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality
> of a messaging system, but with a unique design. A single Kafka broker can handle hundreds
> of megabytes of reads and writes per second from thousands of clients.
> Apache Kafka is being used for a number of use cases. One of them is to use Kafka as
> a feeding system for streaming BigData processes, in either Apache Spark or Hadoop environments.
> A Kafka output connector could be used for streaming or dispatching crawled documents or metadata
> and putting them into a BigData processing pipeline.
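To make the output-connector idea concrete, here is a minimal Java sketch, assuming a design in which each crawled document becomes one keyed message on a single configured topic. The class `KafkaOutputSketch`, the `MessageSink` interface, the topic name, and the "metadata lines, blank line, body" message format are all hypothetical illustrations, not ManifoldCF or Kafka APIs; a real connector would implement ManifoldCF's output connector interface and hand messages to a Kafka producer instead of the in-memory sink used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these are not ManifoldCF or Kafka APIs.
class KafkaOutputSketch {

    // Stand-in for a Kafka message: destination topic, key, and value.
    static final class Message {
        final String topic;
        final String key;
        final String value;

        Message(String topic, String key, String value) {
            this.topic = topic;
            this.key = key;
            this.value = value;
        }
    }

    // Anything that can publish a message. A real connector would back this
    // with a Kafka producer; the sketch uses an in-memory list instead.
    interface MessageSink {
        void send(Message message);
    }

    // Test double that records every message it is given.
    static final class InMemorySink implements MessageSink {
        final List<Message> sent = new ArrayList<>();

        @Override
        public void send(Message message) {
            sent.add(message);
        }
    }

    private final String topic;
    private final MessageSink sink;

    KafkaOutputSketch(String topic, MessageSink sink) {
        this.topic = topic;
        this.sink = sink;
    }

    // Publishes one crawled document as one message keyed by its URI.
    // Assumed value format: "name: value" metadata lines, a blank line,
    // then the document body decoded as UTF-8.
    void addOrReplaceDocument(String documentUri, Map<String, String> metadata, byte[] content) {
        StringBuilder value = new StringBuilder();
        for (Map.Entry<String, String> field : metadata.entrySet()) {
            value.append(field.getKey()).append(": ").append(field.getValue()).append('\n');
        }
        value.append('\n').append(new String(content, StandardCharsets.UTF_8));
        sink.send(new Message(topic, documentUri, value.toString()));
    }

    // Signals deletion with a null value, which log-compacted Kafka topics
    // treat as a delete marker (tombstone) for that key.
    void removeDocument(String documentUri) {
        sink.send(new Message(topic, documentUri, null));
    }
}
```

Keying by document URI means that, with Kafka's default partitioner, every update to the same document lands in the same partition, so downstream Spark or Hadoop consumers see that document's changes in order.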

This message was sent by Atlassian JIRA
