manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-916) Amazon CloudSearch output connector
Date Wed, 21 May 2014 07:46:38 GMT


Karl Wright commented on CONNECTORS-916:

Hi Takumi,

Yes, you have the basic idea. Here is my detailed write-up:

- We keep a local repository on disk. Each document in the repository consists of (key, deleted/not_deleted,
amazon_data). The key is the Amazon document key: the document URI (or its hash). deleted/not_deleted is
a flag that is set if the record represents a deletion. amazon_data is all the data needed
for transmission to Amazon (JSON?).
- addOrReplaceDocument() adds or replaces the document in the local repository only, and
clears the deleted/not_deleted flag.
- deleteDocument() adds or replaces the document in the local repository only, and sets the deleted/not_deleted flag.
- There is a method called transmit(), which sends a chunk of documents from the local repository
to Amazon, say 1000 at a time. If transmit() succeeds, all the documents it sent are removed from the
local repository; otherwise they are left in place.
- transmit() returns false if there are no more documents to be transmitted. notifyOfCompletion()
calls transmit() repeatedly until either an exception is thrown or transmit() returns false.
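The buffering scheme above can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions, not the connector's actual code: the class and interface names (BufferedRecord, LocalRepository, BatchSender) and the chunkSize parameter are hypothetical, a LinkedHashMap stands in for the on-disk repository, and BatchSender stands in for whatever call actually uploads a batch to CloudSearch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real Amazon CloudSearch upload call.
interface BatchSender {
  // Sends one batch; throwing signals failure, so the records stay buffered.
  void send(List<BufferedRecord> batch) throws Exception;
}

// One entry in the local repository: (key, deleted flag, amazon_data).
final class BufferedRecord {
  final String key;        // Amazon document key: the URI or its hash
  final boolean deleted;   // true if this record represents a deletion
  final String amazonData; // payload for Amazon (e.g. JSON); null for deletions

  BufferedRecord(String key, boolean deleted, String amazonData) {
    this.key = key;
    this.deleted = deleted;
    this.amazonData = amazonData;
  }
}

final class LocalRepository {
  // In-memory stand-in for the on-disk store (assumption); keeps insertion order.
  private final Map<String, BufferedRecord> records = new LinkedHashMap<>();
  private final BatchSender sender;
  private final int chunkSize; // hypothetical; the comment suggests ~1000

  LocalRepository(BatchSender sender, int chunkSize) {
    this.sender = sender;
    this.chunkSize = chunkSize;
  }

  // Adds or replaces the record locally only, with the deleted flag cleared.
  void addOrReplaceDocument(String key, String amazonData) {
    records.put(key, new BufferedRecord(key, false, amazonData));
  }

  // Adds or replaces the record locally only, with the deleted flag set.
  void deleteDocument(String key) {
    records.put(key, new BufferedRecord(key, true, null));
  }

  // Sends up to chunkSize records; returns false once nothing remains to send.
  boolean transmit() throws Exception {
    if (records.isEmpty()) {
      return false;
    }
    List<BufferedRecord> batch = new ArrayList<>();
    for (BufferedRecord r : records.values()) {
      if (batch.size() == chunkSize) {
        break;
      }
      batch.add(r);
    }
    sender.send(batch); // on exception, nothing below runs and the batch stays buffered
    for (BufferedRecord r : batch) {
      // Remove only if the key still maps to the exact record that was sent.
      records.remove(r.key, r);
    }
    return !records.isEmpty();
  }

  // Drains the repository; an exception from transmit() ends the loop.
  void notifyOfCompletion() throws Exception {
    while (transmit()) {
      // keep sending chunks until transmit() reports nothing left
    }
  }

  int pendingCount() {
    return records.size();
  }
}
```

Because a deletion is stored as just another record under the same key, a later addOrReplaceDocument() for that key simply overwrites the pending deletion (and vice versa), so only the latest state of each document is ever sent.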

> Amazon CloudSearch output connector
> -----------------------------------
>                 Key: CONNECTORS-916
>                 URL:
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Takumi Yoshida
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>         Attachments: 0507.diff, 0520.diff, 0520_2.diff, 1.patch, 2.diff, 3.diff,,, exception_handling.diff, exception_handling_2.diff, licenselist.txt
> I wrote some code snippets for an Amazon CloudSearch output connector.
> I would like you to review my code. You can crawl a web site and feed HTML pages to Amazon,
> but it is not complete yet, for the following reasons:
> - I have not written any code for the configuration page.
> - The only supported file type is HTML.
> Thank you for your time,
>  Takumi Yoshida

This message was sent by Atlassian JIRA
