manifoldcf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1162) Apache Kafka Output Connector
Date Thu, 13 Aug 2015 17:51:45 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695649#comment-14695649
] 

Karl Wright commented on CONNECTORS-1162:
-----------------------------------------

[~rafaharo]: First, we are talking here about a repository connector, not an output connector.
 Second, MCF's model treats documents atomically, so it does not get along well with pieces
and parts of documents coming in at different times.

This is all very late because development for GSoC ends on the 15th, so if you have specific
requirements we're probably not going to be able to meet them now.


> Apache Kafka Output Connector
> -----------------------------
>
>                 Key: CONNECTORS-1162
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1162
>             Project: ManifoldCF
>          Issue Type: Wish
>    Affects Versions: ManifoldCF 1.8.1, ManifoldCF 2.0.1
>            Reporter: Rafa Haro
>            Assignee: Karl Wright
>              Labels: gsoc, gsoc2015
>             Fix For: ManifoldCF 2.3
>
>         Attachments: 1.JPG, 2.JPG
>
>
> Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality
of a messaging system, but with a unique design. A single Kafka broker can handle hundreds
of megabytes of reads and writes per second from thousands of clients.
> Apache Kafka is being used for a number of uses cases. One of them is to use Kafka as
a feeding system for streaming BigData processes, both in Apache Spark or Hadoop environment.
A Kafka output connector could be used for streaming or dispatching crawled documents or metadata
and put them in a BigData processing pipeline



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message