manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1162) Apache Kafka Output Connector
Date Sat, 15 Aug 2015 09:50:45 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698205#comment-14698205 ]

Karl Wright edited comment on CONNECTORS-1162 at 8/15/15 9:50 AM:
------------------------------------------------------------------

Hi [~tugbadogan],

I've reviewed the code thoroughly.  While I can't read Rafa's mind, I do have a couple of
things we should work towards in the last week.

- You have an e.printStackTrace() call in your exception handling.  That can't be in the final version.
- The exception handling in general looks weak.  Your code should not just reject documents
when there is an exception.  It should try to determine roughly what happened.  Specifically,
there are three possible responses:
(a) REJECT documents that Kafka cannot ever accept, due to characteristics of the document
itself
(b) throw ServiceInterruption exceptions when there is some temporary issue with connectivity,
and there is a chance that the operation will succeed if retried later
(c) throw ManifoldCFException when there is a persistent issue, e.g. configuration, that prevents
the connection from working properly
- Remove the repository connection entirely from the tree, since it is not going to be of
any use going forward
- Ideally, we should have an integration test for the output connector.  In this case this
would involve setting up a temporary local instance of Kafka, and running a test file system
crawl against it.  I don't know whether this is feasible but it is something that should be
considered.
- Documentation: I will need a set of usable screen shots for the documentation, one for each
connector tab.  These must be in .PNG format and should be full-screen.  I can crop them but
try to keep other windows out of them.  I will also need a short description of any Kafka
configuration specifics that are necessary, especially if there isn't an integration test
to look at.
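
As a rough illustration of responses (a), (b) and (c) above, one possible shape for the
triage is sketched below.  It is not taken from the connector.  ServiceInterruption and
ManifoldCFException here are simplified stand-ins for the ManifoldCF classes of the same
names, and DocumentRejectedException, Outcome, triage() and the five-minute retry delay are
hypothetical, chosen only for the example.  Treating every IOException as a temporary
problem is also an assumption; the real mapping depends on what the Kafka client reports.

```java
import java.io.IOException;

class KafkaErrorTriage {

    /** Simplified stand-in for ManifoldCF's ServiceInterruption (assumption, not the real class). */
    static class ServiceInterruption extends Exception {
        final long retryTime; // earliest time (ms since epoch) at which a retry makes sense

        ServiceInterruption(String message, Throwable cause, long retryTime) {
            super(message, cause);
            this.retryTime = retryTime;
        }
    }

    /** Simplified stand-in for ManifoldCF's ManifoldCFException (assumption, not the real class). */
    static class ManifoldCFException extends Exception {
        ManifoldCFException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical marker for failures caused by the document itself. */
    static class DocumentRejectedException extends Exception {
        DocumentRejectedException(String message) {
            super(message);
        }
    }

    /** Hypothetical result reported back for a single document. */
    enum Outcome { ACCEPTED, REJECTED }

    /** Hypothetical delay before a retry is attempted. */
    static final long RETRY_DELAY_MS = 5L * 60L * 1000L;

    /**
     * Maps the failure seen while sending one document (null means success)
     * onto the three responses from the review.
     */
    static Outcome triage(Exception failure) throws ServiceInterruption, ManifoldCFException {
        if (failure == null) {
            return Outcome.ACCEPTED;
        }
        // (a) The document itself is the problem: reject it and move on.
        if (failure instanceof DocumentRejectedException) {
            return Outcome.REJECTED;
        }
        // (b) Connectivity trouble: ask the framework to retry later.
        if (failure instanceof IOException) {
            throw new ServiceInterruption(
                "Temporary Kafka problem: " + failure.getMessage(),
                failure,
                System.currentTimeMillis() + RETRY_DELAY_MS);
        }
        // (c) Anything else is treated as persistent, e.g. bad configuration.
        throw new ManifoldCFException("Persistent Kafka problem: " + failure.getMessage(), failure);
    }
}
```

With something along these lines, the catch block that currently calls e.printStackTrace()
would hand the exception to triage() and either return the resulting status or let the
thrown exception propagate to the framework.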

Thanks, and I hope you have a good remainder of your summer!



was (Author: kwright@metacarta.com):
Hi [~tugbadogan],

I've reviewed the code thoroughly.  While I can't read Rafa's mind, I do have a couple of
things we should work towards in the last week.

(1) You have an e.printStackTrace() call in your exception handling.  That can't be in the final
version.
(2) The exception handling in general looks weak.  Your code should not just reject documents
when there is an exception.  It should try to determine roughly what happened.  Specifically,
there are three possible responses:
- REJECT documents that Kafka cannot ever accept, due to characteristics of the document itself
- throw ServiceInterruption exceptions when there is some temporary issue with connectivity,
and there is a chance that the operation will succeed if retried later
- throw ManifoldCFException when there is a persistent issue, e.g. configuration, that prevents
the connection from working properly
(3) Remove the repository connection entirely from the tree, since it is not going to be of
any use going forward
(4) Ideally, we should have an integration test for the output connector.  In this case this
would involve setting up a temporary local instance of Kafka, and running a test file system
crawl against it.  I don't know whether this is feasible but it is something that should be
considered.
(5) Documentation: I will need a set of usable screen shots for the documentation, one for
each connector tab.  These must be in .PNG format and should be full-screen.  I can crop them
but try to keep other windows out of them.  I will also need a short description of any Kafka
configuration specifics that are necessary, especially if there isn't an integration test
to look at.

Thanks, and I hope you have a good remainder of your summer!


> Apache Kafka Output Connector
> -----------------------------
>
>                 Key: CONNECTORS-1162
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1162
>             Project: ManifoldCF
>          Issue Type: Wish
>    Affects Versions: ManifoldCF 1.8.1, ManifoldCF 2.0.1
>            Reporter: Rafa Haro
>            Assignee: Karl Wright
>              Labels: gsoc, gsoc2015
>             Fix For: ManifoldCF 2.3
>
>         Attachments: 1.JPG, 2.JPG
>
>
> Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality
> of a messaging system, but with a unique design. A single Kafka broker can handle hundreds
> of megabytes of reads and writes per second from thousands of clients.
> Apache Kafka is used for a number of use cases. One of them is as
> a feeding system for streaming BigData processes, in both Apache Spark and Hadoop environments.
> A Kafka output connector could be used to stream or dispatch crawled documents or metadata
> and put them into a BigData processing pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
