manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1233) AmazonS3 Repository Connector
Date Tue, 08 Sep 2015 17:20:46 GMT


Karl Wright commented on CONNECTORS-1233:

Ok, [~kbird], I was able to apply that patch.  I did some further rearrangement and reformatting,
and committed the result to the branch.  This includes a rearrangement of the exception handling -- see below.

(1) After you sync up, the exception handling needs to be fleshed out.  Specifically, you
need to complete all the "handleXXX()" methods so that they distinguish between ServiceInterruption
exceptions (retryable) and ManifoldCFException exceptions (hard failures).  This is pretty important
because otherwise Amazon jobs will likely abort due to transient connectivity issues.
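
A minimal sketch of that distinction follows. It is not the connector's actual code. The two exception classes are local stand-ins for the ManifoldCF types (the real ones take more constructor arguments). The handler names, the retry delay, and the rule that HTTP 5xx responses are retryable are all assumptions made for illustration.

```java
import java.io.IOException;

// Local stand-ins for the ManifoldCF exception types, so the sketch compiles
// without the ManifoldCF jars. The real classes take more constructor arguments.
class ManifoldCFException extends Exception {
  ManifoldCFException(String message, Throwable cause) {
    super(message, cause);
  }
}

class ServiceInterruption extends Exception {
  final long retryTime; // earliest time (ms since epoch) at which to retry

  ServiceInterruption(String message, Throwable cause, long retryTime) {
    super(message, cause);
    this.retryTime = retryTime;
  }
}

class ExceptionHandlingSketch {
  // Assumed retry delay; a real connector would choose its own value.
  static final long RETRY_DELAY_MS = 5L * 60L * 1000L;

  // Network-level failures are treated as transient: ask for a retry.
  static void handleIOException(IOException e) throws ServiceInterruption {
    throw new ServiceInterruption("Transient I/O problem: " + e.getMessage(), e,
        System.currentTimeMillis() + RETRY_DELAY_MS);
  }

  // Hypothetical handler keyed on an HTTP status code: 5xx is assumed to be
  // retryable, anything else is reported as a hard error.
  static void handleServiceError(int httpStatus, Exception e)
      throws ManifoldCFException, ServiceInterruption {
    if (httpStatus >= 500) {
      throw new ServiceInterruption("Service unavailable (" + httpStatus + ")", e,
          System.currentTimeMillis() + RETRY_DELAY_MS);
    }
    throw new ManifoldCFException("Request failed (" + httpStatus + ")", e);
  }

  public static void main(String[] args) {
    try {
      handleServiceError(503, new RuntimeException("slow down"));
    } catch (ServiceInterruption e) {
      System.out.println("retry later: " + e.getMessage());
    } catch (ManifoldCFException e) {
      System.out.println("abort: " + e.getMessage());
    }
  }
}
```

With handlers shaped like this, a dropped connection surfaces as a ServiceInterruption rather than aborting the job, while a hard failure such as a denied request still stops it.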

(2) There are other places in the code that also need some attention with regard to exceptions.
For example:

    AmazonS3 amazons3Client = getClient();
    if (amazons3Client == null)
      throw new ManifoldCFException(
          "Amazon client can not connect at the moment");

I would think it more appropriate for getClient() to *never* return null, and instead to
throw the appropriate exception (ServiceInterruption or ManifoldCFException) itself when it
cannot establish a session.
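
One way this could look is sketched below, under stated assumptions. S3Session stands in for the SDK's AmazonS3 type, SessionFactory is a hypothetical hook for opening the connection, and the exception classes are local stand-ins for the ManifoldCF ones. Which failures count as configuration errors and which as transient is also an assumption.

```java
import java.io.IOException;

// Local stand-ins for the ManifoldCF exception types (simplified constructors).
class ManifoldCFException extends Exception {
  ManifoldCFException(String message) {
    super(message);
  }
}

class ServiceInterruption extends Exception {
  final long retryTime; // earliest time (ms since epoch) at which to retry

  ServiceInterruption(String message, Throwable cause, long retryTime) {
    super(message, cause);
    this.retryTime = retryTime;
  }
}

class ClientHolderSketch {
  // Stand-in for the AWS SDK's AmazonS3 client type.
  interface S3Session {}

  // Hypothetical factory that opens a session and may fail with a network error.
  interface SessionFactory {
    S3Session open() throws IOException;
  }

  // Assumed retry delay; a real connector would choose its own value.
  static final long RETRY_DELAY_MS = 5L * 60L * 1000L;

  private final String accessKey;
  private final SessionFactory factory;
  private S3Session session;

  ClientHolderSketch(String accessKey, SessionFactory factory) {
    this.accessKey = accessKey;
    this.factory = factory;
  }

  // Never returns null: either a usable session comes back or an exception is thrown.
  S3Session getClient() throws ManifoldCFException, ServiceInterruption {
    if (session != null) {
      return session;
    }
    if (accessKey == null || accessKey.isEmpty()) {
      // Configuration problem: retrying will not help.
      throw new ManifoldCFException("Missing access key");
    }
    S3Session opened;
    try {
      opened = factory.open();
    } catch (IOException e) {
      // Connectivity problem: assumed transient, so request a retry.
      throw new ServiceInterruption("Cannot reach S3: " + e.getMessage(), e,
          System.currentTimeMillis() + RETRY_DELAY_MS);
    }
    if (opened == null) {
      throw new ManifoldCFException("Session factory returned no session");
    }
    session = opened;
    return session;
  }
}
```

Callers can then use the result of getClient() directly, with no null check and no exception of their own.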

> AmazonS3 Repository Connector
> -----------------------------
>                 Key: CONNECTORS-1233
>                 URL:
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Gunaratnam Kuhajeyan
>            Assignee: Karl Wright
>              Labels: features
>             Fix For: ManifoldCF 2.3
>         Attachments: amazons3patch-fixunboundedsize.diff, amazons3patch.diff, amazons3patchnew1.diff,
>                      dependencies.docx, patch-removed-unwanted-dependencies-connector-1233.diff, patch-tikaremoved.diff,
>   Original Estimate: 240h
>  Remaining Estimate: 240h
> Feature Patch 
> AmazonS3 Repository Connector
> A. Overview
> 1. Connects to Amazon S3 buckets and indexes the artifacts. Any buckets that should be avoided
> can be skipped (this can be configured in the job).
> 2. Internally, documents are parsed and metadata is extracted using Tika.
> 3. Supported locale: English (US). This is the only one currently available; looking
> for help from someone to translate the keys.
> B. Documentation - Work in progress; it will be attached to the issue in the coming days.
> C. Dependencies - (common-lib)
> 1. aws-java-sdk-{version}.jar
> 2. aws-java-sdk-core-{version}.jar
> 3. aws-java-sdk-s3-{version}.jar
> 4. joda-time-2.2.jar
> D. Connectors.xml
>  <!-- Add your authority connectors here -->
> <authorityconnector name="Amazons3" class="org.apache.manifoldcf.authorities.authorities.amazons3.AmazonS3Authority"/>
> <!-- Add your repository connectors here -->
> <repositoryconnector name="AmazonS3" class="org.apache.manifoldcf.crawler.connectors.amazons3.AmazonS3Connector"/>

This message was sent by Atlassian JIRA
