manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-916) Amazon CloudSearch output connector
Date Wed, 21 May 2014 07:46:38 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004453#comment-14004453 ]

Karl Wright commented on CONNECTORS-916:
----------------------------------------

Hi Takumi,

Yes, you have the basic idea.  My detailed writeup:

- We keep a local repository on disk.  Each document in the repository consists of: (key, deleted/not_deleted,
amazon_data).  The key is the Amazon key: the document URI (or its hash).  deleted/not_deleted is
a flag which is set if this record represents a deletion.  amazon_data is all the data needed
for transmission to Amazon (JSON?).
- addOrReplaceDocument() adds or replaces the document in the local repository only, and
clears the deleted/not_deleted flag.
- deleteDocument() adds or replaces the document in the local repository only, and sets the
deleted/not_deleted flag.
- There is a method called transmit().  transmit() sends a chunk of documents from the local
repository to Amazon, say 1000 at a time.  If transmit() is successful, all documents
it sent are removed from the local repository.  Otherwise they are left in place.
- transmit() returns false if there are no more documents to be transmitted.  notifyOfCompletion()
calls transmit() until either there is an error (an exception) or it returns false.
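
The scheme above could be sketched roughly as follows. This is only an illustration, not connector code: the class name LocalRepositorySketch, the QueuedDoc type, the in-memory map standing in for the on-disk repository, and the "sender" callback standing in for the actual upload to Amazon are all hypothetical. It also assumes that transmit() returns false when it is called with nothing left to send, and that a failed send is reported as an exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for the on-disk local repository. */
class LocalRepositorySketch {

  /** One queued record: (key, deleted/not_deleted, amazon_data). */
  static final class QueuedDoc {
    final String key;         // document URI (or its hash)
    final boolean deleted;    // true if this record represents a deletion
    final String amazonData;  // payload for Amazon, e.g. JSON; null for deletions

    QueuedDoc(String key, boolean deleted, String amazonData) {
      this.key = key;
      this.deleted = deleted;
      this.amazonData = amazonData;
    }
  }

  private final Map<String, QueuedDoc> records = new LinkedHashMap<>();
  private final int chunkSize;
  // Assumption: returns true if the whole chunk was accepted by the service.
  private final Predicate<List<QueuedDoc>> sender;

  LocalRepositorySketch(int chunkSize, Predicate<List<QueuedDoc>> sender) {
    this.chunkSize = chunkSize;
    this.sender = sender;
  }

  /** Adds or replaces the record locally and clears the deleted flag. */
  void addOrReplaceDocument(String key, String amazonData) {
    records.put(key, new QueuedDoc(key, false, amazonData));
  }

  /** Adds or replaces the record locally and sets the deleted flag. */
  void deleteDocument(String key) {
    records.put(key, new QueuedDoc(key, true, null));
  }

  /**
   * Sends at most one chunk. Returns false if there was nothing to send.
   * On failure the chunk stays in the repository and an exception is thrown.
   */
  boolean transmit() {
    if (records.isEmpty()) {
      return false;
    }
    List<QueuedDoc> chunk = new ArrayList<>();
    for (QueuedDoc doc : records.values()) {
      if (chunk.size() == chunkSize) {
        break;
      }
      chunk.add(doc);
    }
    if (!sender.test(chunk)) {
      throw new IllegalStateException("transmission failed; chunk kept for retry");
    }
    // Only a successful send removes the documents from the repository.
    for (QueuedDoc doc : chunk) {
      records.remove(doc.key);
    }
    return true;
  }

  /** Drains the repository, stopping on the first exception. */
  void notifyOfCompletion() {
    while (transmit()) {
      // keep sending chunks until transmit() reports nothing is left
    }
  }

  int size() {
    return records.size();
  }
}
```

Keying the map by document key means a later add or delete for the same document simply overwrites the earlier record, so only the latest state of each document is ever transmitted.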



> Amazon CloudSearch output connector
> -----------------------------------
>
>                 Key: CONNECTORS-916
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-916
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Takumi Yoshida
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: 0507.diff, 0520.diff, 0520_2.diff, 1.patch, 2.diff, 3.diff, AmazonCloudSearchParam.java, AmazonCloudSearchSpecs.java, exception_handling.diff, exception_handling_2.diff, licenselist.txt
>
>
> I wrote some code snippets for an output connector for Amazon CloudSearch.
> I would like you to review my code. With it you can crawl a web site and feed HTML pages to Amazon CloudSearch.
> However, it is not yet complete, for the following reasons:
> - No code has been written for the configuration page.
> - The only supported file type is HTML.
> Thank you for your time,
>  Takumi Yoshida



--
This message was sent by Atlassian JIRA
(v6.2#6252)
