manifoldcf-dev mailing list archives

From David Arroyo <arroyoescobarda...@gmail.com>
Subject Re: CONNECTORS-1290 [GSOC 2016] Nuxeo repository and Authority connector for Apache ManifoldCF
Date Tue, 09 Aug 2016 08:46:50 GMT
Hi Karl,


Thanks for your answer.


In the end, I have decided to take the first approach, using the document
component identifier, because it is the best fit for the way the connector
is being developed.


Regards.

On 8 August 2016 at 15:06, Karl Wright <daddywri@gmail.com> wrote:

> Hi David,
>
> There are two possible approaches:
>
> (1) Use the "document component" identifier when you index, or
> (2) Have a means of representing document attachments in the connector's
> document identifier.
>
> The pertinent parts of IProcessActivity for components are as follows:
>
> >>>>>>
>   /** Check if a document needs to be reindexed, based on a computed version string.
>   * Call this method to determine whether reindexing is necessary.  Pass in a newly-computed version
>   * string.  This method will return "true" if the document needs to be re-indexed.
>   *@param documentIdentifier is the document identifier.
>   *@param componentIdentifier is the component document identifier, if any.
>   *@param newVersionString is the newly-computed version string.
>   *@return true if the document needs to be reindexed.
>   */
>   public boolean checkDocumentNeedsReindexing(String documentIdentifier,
>     String componentIdentifier,
>     String newVersionString)
>     throws ManifoldCFException;
>
>   /** Ingest the current document.
>   *@param documentIdentifier is the document's identifier.
>   *@param componentIdentifier is the component document identifier, if any.
>   *@param version is the version of the document, as reported by the getDocumentVersions() method of the
>   *       corresponding repository connector.
>   *@param documentURI is the URI to use to retrieve this document from the search interface (and is
>   *       also the unique key in the index).
>   *@param data is the document data.  The data is closed after ingestion is complete.
>   *@throws IOException only when data stream reading fails.
>   */
>   public void ingestDocumentWithException(String documentIdentifier,
>     String componentIdentifier,
>     String version, String documentURI, RepositoryDocument data)
>     throws ManifoldCFException, ServiceInterruption, IOException;
> <<<<<<
>
> The assumption is that the document and all of its components are
> considered and optionally (re)indexed at the same time.  This model was
> developed for a connector where the primary document was an XML document
> that contained all the actual content.
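A minimal sketch of approach (1), assuming a hypothetical, simplified stand-in (`ComponentActivity`) for the two `IProcessActivity` methods quoted above: content is a plain `String` rather than a `RepositoryDocument`, checked exceptions are dropped, and the `nuxeo://` URIs and version strings are invented for illustration. The parent document and each attachment share one document identifier; each attachment gets its own component identifier and its own document URI (the Javadoc notes the URI is also the unique key in the index), and only components whose version string changed are re-ingested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of approach (1): one document identifier, one component per attachment.
final class ComponentIndexingSketch {

  // HYPOTHETICAL simplified stand-in for the two IProcessActivity methods quoted above:
  // content is a plain String instead of a RepositoryDocument, and there are no checked exceptions.
  interface ComponentActivity {
    boolean checkDocumentNeedsReindexing(String documentIdentifier,
        String componentIdentifier, String newVersionString);

    void ingestDocument(String documentIdentifier, String componentIdentifier,
        String version, String documentURI, String content);
  }

  // In-memory fake that remembers the last indexed version of each (document, component) pair.
  static final class InMemoryActivity implements ComponentActivity {
    final Map<String, String> indexedVersions = new HashMap<>();
    final List<String> ingestedUris = new ArrayList<>();

    private static String key(String doc, String component) {
      return doc + "#" + (component == null ? "" : component);
    }

    @Override
    public boolean checkDocumentNeedsReindexing(String doc, String component, String version) {
      return !version.equals(indexedVersions.get(key(doc, component)));
    }

    @Override
    public void ingestDocument(String doc, String component, String version,
        String uri, String content) {
      indexedVersions.put(key(doc, component), version);
      ingestedUris.add(uri);
    }
  }

  // Considers the parent document and all of its attachments in a single pass.
  static void processDocument(ComponentActivity activity, String docId, String docVersion,
      Map<String, String> attachmentVersions) {
    // Main content: no component identifier.
    if (activity.checkDocumentNeedsReindexing(docId, null, docVersion)) {
      activity.ingestDocument(docId, null, docVersion, "nuxeo://" + docId, "main body");
    }
    // Each attachment is a component; its URI must be distinct because the URI is the index key.
    for (Map.Entry<String, String> attachment : attachmentVersions.entrySet()) {
      String component = attachment.getKey();
      String version = attachment.getValue();
      if (activity.checkDocumentNeedsReindexing(docId, component, version)) {
        activity.ingestDocument(docId, component, version,
            "nuxeo://" + docId + "/" + component, "attachment body");
      }
    }
  }

  public static void main(String[] args) {
    InMemoryActivity activity = new InMemoryActivity();
    Map<String, String> attachments = new LinkedHashMap<>();
    attachments.put("report.pdf", "v1");
    attachments.put("image.png", "v1");

    processDocument(activity, "doc-42", "v1", attachments); // first crawl: 3 ingestions
    attachments.put("report.pdf", "v2");                    // one attachment changes
    processDocument(activity, "doc-42", "v1", attachments); // second crawl: 1 ingestion

    System.out.println(activity.ingestedUris);
  }
}
```

Running `main` prints `[nuxeo://doc-42, nuxeo://doc-42/report.pdf, nuxeo://doc-42/image.png, nuxeo://doc-42/report.pdf]`: all three parts are indexed on the first pass, and only the changed attachment on the second.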
>
> The alternate model, which involves making each attachment a completely
> independent document, is the preferred one, if it is possible to implement
> it that way.  The SharePoint connector takes this approach, and uses MCF
> carry-down information to preserve what it needs from the attachment's
> parent.
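A minimal sketch of approach (2), assuming a hypothetical identifier scheme: `D:<nuxeoId>` for a Nuxeo document and `A:<nuxeoId>/<attachmentName>` for an attachment. Because each attachment identifier names its parent, the connector can always tell which Nuxeo document an attachment belongs to and where to fetch it from. Any parent metadata the attachment needs would travel through MCF carry-down data, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of approach (2): each attachment is an independent document whose identifier names its parent.
final class AttachmentIdSketch {

  // HYPOTHETICAL identifier prefixes; any unambiguous encoding would work.
  static final String DOCUMENT_PREFIX = "D:";
  static final String ATTACHMENT_PREFIX = "A:";

  // Parsed form of a connector document identifier.
  static final class ParsedId {
    final String parentId;
    final String attachmentName; // null for a top-level Nuxeo document

    ParsedId(String parentId, String attachmentName) {
      this.parentId = parentId;
      this.attachmentName = attachmentName;
    }

    boolean isAttachment() {
      return attachmentName != null;
    }
  }

  static String documentId(String nuxeoId) {
    return DOCUMENT_PREFIX + nuxeoId;
  }

  // ASSUMPTION: Nuxeo ids never contain '/', so the first '/' separates parent id from attachment name.
  static String attachmentId(String nuxeoId, String attachmentName) {
    return ATTACHMENT_PREFIX + nuxeoId + "/" + attachmentName;
  }

  static ParsedId parse(String identifier) {
    if (identifier.startsWith(DOCUMENT_PREFIX)) {
      return new ParsedId(identifier.substring(DOCUMENT_PREFIX.length()), null);
    }
    if (identifier.startsWith(ATTACHMENT_PREFIX)) {
      String rest = identifier.substring(ATTACHMENT_PREFIX.length());
      int slash = rest.indexOf('/');
      if (slash < 0) {
        throw new IllegalArgumentException("Malformed attachment identifier: " + identifier);
      }
      return new ParsedId(rest.substring(0, slash), rest.substring(slash + 1));
    }
    throw new IllegalArgumentException("Unknown identifier: " + identifier);
  }

  // While processing a parent, compute the attachment identifiers it should reference.
  static List<String> childReferences(String nuxeoId, List<String> attachmentNames) {
    List<String> refs = new ArrayList<>();
    for (String name : attachmentNames) {
      refs.add(attachmentId(nuxeoId, name));
    }
    return refs;
  }

  public static void main(String[] args) {
    List<String> refs = childReferences("1234-abcd", Arrays.asList("report.pdf", "scans/page1.png"));
    for (String ref : refs) {
      ParsedId p = parse(ref);
      System.out.println(ref + " -> parent=" + p.parentId + ", attachment=" + p.attachmentName);
    }
  }
}
```

Splitting on the first `/` keeps any slashes inside attachment names intact, under the stated assumption that Nuxeo ids contain none.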
>
> Karl
>
>
> On Mon, Aug 8, 2016 at 8:29 AM, David Arroyo <arroyoescobardavid@gmail.com>
> wrote:
>
> > Hi.
> >
> >
> > I am currently dealing with documents’ attachments for the Nuxeo connector.
> > When consuming documents’ information using the Nuxeo REST API, I can
> > access documents’ attachments, but I’m not sure how to manage them
> > within the connector. If I index them as separate documents, I would
> > need to find a way to relate them to the main (parent) document, because
> > if I need to delete or reindex the parent, I would also need to find its
> > attachments in order to delete and/or reindex them.
> >
> >
> > Any ideas on how to manage attachments in ManifoldCF?
> >
> >
> > Thank you very much.
> >
> > Regards.
> > --
> > David Arroyo Escobar
> >
>



-- 
David Arroyo Escobar
