manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1233) AmazonS3 Repository Connector
Date Fri, 28 Aug 2015 16:32:46 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720171#comment-14720171 ]

Karl Wright commented on CONNECTORS-1233:
-----------------------------------------

Ok, I actually did have half an hour to put towards this. The ant build now works, at least, although
I don't know whether the dependencies are complete. More about that later.

First problem: I note that the authority never returns GLOBAL_DENY_TOKEN, even when there
are connectivity problems with Amazon.  This is obviously incorrect and is a security problem.
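To illustrate the fail-closed behaviour the authority needs, here is a minimal Java sketch. AuthResult, Status, AclLookup
and the token value are hypothetical stand-ins, not the ManifoldCF authority API; the only point is that any failure
while talking to Amazon must produce the deny token rather than an empty or partial token list.

```java
import java.util.List;

/**
 * Fail-closed authority sketch. AuthResult, Status and AclLookup are hypothetical
 * stand-ins for the ManifoldCF authority interfaces, not the real API.
 */
public class FailClosedAuthority {

  /** Placeholder value; a real connector should use the framework's GLOBAL_DENY_TOKEN constant. */
  static final String GLOBAL_DENY_TOKEN = "GLOBAL_DENY";

  enum Status { OK, UNREACHABLE }

  static final class AuthResult {
    final List<String> tokens;
    final Status status;

    AuthResult(List<String> tokens, Status status) {
      this.tokens = tokens;
      this.status = status;
    }
  }

  /** Hypothetical call that resolves a user's access tokens by talking to Amazon. */
  interface AclLookup {
    List<String> tokensFor(String user) throws Exception;
  }

  static AuthResult authorize(AclLookup lookup, String user) {
    try {
      return new AuthResult(lookup.tokensFor(user), Status.OK);
    } catch (Exception e) {
      // Fail closed: any error reaching Amazon (checked or unchecked) yields the
      // global deny token, so a failed lookup denies access instead of returning
      // an incomplete token list.
      return new AuthResult(List.of(GLOBAL_DENY_TOKEN), Status.UNREACHABLE);
    }
  }

  public static void main(String[] args) {
    AuthResult ok = authorize(user -> List.of("s3owner:" + user), "alice");
    AuthResult down = authorize(user -> {
      throw new java.io.IOException("connect timed out");
    }, "alice");
    System.out.println(ok.status + " " + ok.tokens);     // OK [s3owner:alice]
    System.out.println(down.status + " " + down.tokens); // UNREACHABLE [GLOBAL_DENY]
  }
}
```

Catching Exception rather than only IOException is deliberate in this sketch: SDK client errors are often unchecked,
and an authority should deny on those too.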

Second, the Amazon SDK is about 13MB of jars, which will bloat our binary and lib images
substantially. Is there any way to determine which subset of the SDK jars and their
dependencies is actually used? To do this you would need to check out the branch, modify
connectors/amazons3/build.xml to include only the suspected subset of required jars, run the
ant build, try the connector, and repeat until everything works.
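One way to shorten that trial-and-error loop, assuming the connector can be exercised in a JVM started with
-verbose:class, is to collect the jars that classes were actually loaded from. The sketch below parses such a log;
UsedJars is a hypothetical helper, and classes loaded only on rarely used code paths will be missed, so the
build-and-test cycle is still needed to confirm the subset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsedJars {
  // Matches the jar path in -verbose:class lines for both the JDK 8 format
  // ("[Loaded x.Y from file:/p/a.jar]") and the JDK 9+ format ("... source: file:/p/a.jar").
  // JDK-internal classes (rt.jar without "file:", or jrt:/ modules) are not matched.
  private static final Pattern JAR = Pattern.compile("file:(\\S*?\\.jar)");

  /** Returns the sorted set of jar file names that classes were loaded from. */
  static Set<String> usedJars(Iterable<String> logLines) {
    Set<String> jars = new TreeSet<>();
    for (String line : logLines) {
      Matcher m = JAR.matcher(line);
      if (m.find()) {
        String path = m.group(1);
        jars.add(path.substring(path.lastIndexOf('/') + 1)); // keep only the file name
      }
    }
    return jars;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java UsedJars verbose-class.log ; with no argument, parse a small illustrative sample.
    List<String> lines = args.length > 0
        ? Files.readAllLines(Paths.get(args[0]))
        : List.of(
            "[Loaded com.amazonaws.services.s3.AmazonS3Client from file:/opt/mcf/lib/aws-java-sdk-s3-x.y.z.jar]",
            "[0.412s][info][class,load] org.joda.time.DateTime source: file:/opt/mcf/lib/joda-time-2.2.jar",
            "[Loaded java.lang.String from /usr/lib/jvm/jre/lib/rt.jar]");
    System.out.println(usedJars(lines)); // sample prints [aws-java-sdk-s3-x.y.z.jar, joda-time-2.2.jar]
  }
}
```

Any jar listed in build.xml that never shows up in this output is a candidate for removal.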



> AmazonS3 Repository Connector
> -----------------------------
>
>                 Key: CONNECTORS-1233
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1233
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Gunaratnam Kuhajeyan
>            Assignee: Karl Wright
>              Labels: features
>             Fix For: ManifoldCF 2.3
>
>         Attachments: amazons3patch.diff, amazons3patchnew1.diff, dependencies.docx, patch-tikaremoved.diff
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Feature Patch 
> AmazonS3 Repository Connector
> A. Overview
> 1. Connects to Amazon S3 buckets and indexes the artifacts they contain. Buckets that should not be crawled can be skipped; this is configured in the job.
> 2. Internally, documents are parsed and metadata is extracted using Tika.
> 3. Supported locale: English (US). Currently only common_en_US.properties is available; help is wanted to translate the keys for other locales.
> B. Documentation - Work in progress; it will be attached to this issue in the following days.
> C. Dependencies - (common-lib)
> 1. aws-java-sdk-{version}.jar
> 2. aws-java-sdk-core-{version}.jar
> 3. aws-java-sdk-s3-{version}.jar
> 4. joda-time-2.2.jar
> D. connectors.xml
>  <!-- Add your authority connectors here -->
> <authorityconnector name="Amazons3" class="org.apache.manifoldcf.authorities.authorities.amazons3.AmazonS3Authority"/>
>
> <!-- Add your repository connectors here -->
> <repositoryconnector name="AmazonS3" class="org.apache.manifoldcf.crawler.connectors.amazons3.AmazonS3Connector"/>
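
Item 1 of the overview says buckets can be skipped through the job configuration. A minimal sketch of that filtering
step follows, assuming the job stores excluded bucket names as a plain list and that matching is exact and
case-sensitive; BucketFilter and bucketsToCrawl are hypothetical names, not code from the patch. In the connector
itself the candidate names would presumably come from the AWS SDK (for example AmazonS3.listBuckets()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class BucketFilter {
  /** Returns the buckets to crawl: every listed bucket whose name is not in the excluded set. */
  static List<String> bucketsToCrawl(List<String> allBuckets, Set<String> excluded) {
    List<String> result = new ArrayList<>();
    for (String name : allBuckets) {
      // Assumption: exclusion is an exact, case-sensitive name match; listing order is preserved.
      if (!excluded.contains(name)) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> listed = List.of("logs", "docs", "archive"); // illustrative bucket names
    System.out.println(bucketsToCrawl(listed, Set.of("logs"))); // [docs, archive]
  }
}
```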



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
