manifoldcf-dev mailing list archives

From Karl Wright <>
Subject Re: CONNECTORS-1290 [GSOC 2016] Nuxeo repository and Authority connector for Apache ManifoldCF
Date Mon, 08 Aug 2016 13:06:34 GMT
Hi David,

There are two possible approaches:

(1) Use the "document component" identifier when you index, or
(2) Have a means of representing document attachments in the connector's
document identifier.
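A minimal sketch of approach (2). It assumes a hypothetical identifier scheme that is not part of MCF or the Nuxeo connector: a plain document is "D:<uid>" and an attachment is "A:<parentUid>:<attachmentKey>". This way the parent can always be recovered from an attachment's own identifier. It also assumes Nuxeo UIDs never contain ':', although the attachment key may.

```java
/**
 * Hypothetical identifier scheme for approach (2): documents and attachments are
 * both independent MCF documents, and an attachment's identifier embeds its
 * parent's UID. Assumes parent UIDs never contain ':'.
 */
final class NuxeoIds {
  private static final String DOC_PREFIX = "D";
  private static final String ATTACHMENT_PREFIX = "A";

  /** Parsed form of an identifier; attachmentKey is null for a plain document. */
  static final class ParsedId {
    final String parentUid;
    final String attachmentKey;

    ParsedId(String parentUid, String attachmentKey) {
      this.parentUid = parentUid;
      this.attachmentKey = attachmentKey;
    }

    boolean isAttachment() {
      return attachmentKey != null;
    }
  }

  static String documentId(String uid) {
    return DOC_PREFIX + ":" + uid;
  }

  static String attachmentId(String parentUid, String attachmentKey) {
    return ATTACHMENT_PREFIX + ":" + parentUid + ":" + attachmentKey;
  }

  static ParsedId parse(String id) {
    // A limit of 3 keeps an attachment key that itself contains ':' in one piece.
    String[] parts = id.split(":", 3);
    if (parts.length == 2 && parts[0].equals(DOC_PREFIX)) {
      return new ParsedId(parts[1], null);
    }
    if (parts.length == 3 && parts[0].equals(ATTACHMENT_PREFIX)) {
      return new ParsedId(parts[1], parts[2]);
    }
    throw new IllegalArgumentException("Unrecognized identifier: " + id);
  }
}
```

Identifiers like these let a connector that is processing an attachment parse out the parent UID. It can then, for example, check whether the parent still exists before indexing. That addresses the delete/reindex concern raised in the question.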

The pertinent parts of IProcessActivity for components are as follows:

  /** Check if a document needs to be reindexed, based on a computed version string.
  * Call this method to determine whether reindexing is necessary.  Pass in a newly-computed version
  * string.  This method will return "true" if the document needs to be reindexed.
  *@param documentIdentifier is the document identifier.
  *@param componentIdentifier is the component document identifier, if any.
  *@param newVersionString is the newly-computed version string.
  *@return true if the document needs to be reindexed.
  */
  public boolean checkDocumentNeedsReindexing(String documentIdentifier,
    String componentIdentifier,
    String newVersionString)
    throws ManifoldCFException;

  /** Ingest the current document.
  *@param documentIdentifier is the document's identifier.
  *@param componentIdentifier is the component document identifier, if any.
  *@param version is the version of the document, as reported by the getDocumentVersions() method of the
  *       corresponding repository connector.
  *@param documentURI is the URI to use to retrieve this document from the search interface (and is
  *       also the unique key in the index).
  *@param data is the document data.  The data is closed after ingestion is complete.
  *@throws IOException only when data stream reading fails.
  */
  public void ingestDocumentWithException(String documentIdentifier,
    String componentIdentifier,
    String version, String documentURI, RepositoryDocument data)
    throws ManifoldCFException, ServiceInterruption, IOException;

The assumption is that a document and all of its components are
considered, and optionally (re)indexed, at the same time.  This model was
developed for a connector where the primary document was an XML document
that contained all the actual content.
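The following sketch illustrates the component model. It uses a hypothetical, simplified stand-in (ComponentActivity) for the two methods quoted above. The real methods also take a document URI and a RepositoryDocument, and they throw ManifoldCF exceptions. Here a null component identifier stands for the parent's own body, and each attachment uses its own key. The per-attachment version string (a content hash) is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for the two IProcessActivity methods quoted
 * above (the real ones also take a document URI and a RepositoryDocument, and
 * throw ManifoldCF exceptions).
 */
interface ComponentActivity {
  boolean checkDocumentNeedsReindexing(String documentIdentifier,
      String componentIdentifier, String newVersionString);

  void ingest(String documentIdentifier, String componentIdentifier,
      String version, String content);
}

/** In-memory fake that remembers the last version indexed per (document, component). */
final class FakeActivity implements ComponentActivity {
  final Map<String, String> indexedVersions = new HashMap<>();
  final List<String> ingested = new ArrayList<>();

  private static String key(String doc, String component) {
    return doc + "#" + Objects.toString(component, "");
  }

  @Override
  public boolean checkDocumentNeedsReindexing(String doc, String component, String version) {
    // Reindex when nothing was indexed yet or the stored version differs.
    return !version.equals(indexedVersions.get(key(doc, component)));
  }

  @Override
  public void ingest(String doc, String component, String version, String content) {
    indexedVersions.put(key(doc, component), version);
    ingested.add(key(doc, component));
  }
}

final class ComponentModelSketch {
  /**
   * Approach (1): the parent and all of its attachments are handled together,
   * as components of one MCF document. A null component identifier stands for
   * the parent's own body; each attachment uses its (hypothetical) key.
   */
  static void processDocument(ComponentActivity activity, String docId,
      String docVersion, String body, Map<String, String> attachments) {
    if (activity.checkDocumentNeedsReindexing(docId, null, docVersion)) {
      activity.ingest(docId, null, docVersion, body);
    }
    for (Map.Entry<String, String> attachment : attachments.entrySet()) {
      // Hypothetical per-attachment version string: a hash of its content.
      String version = Integer.toHexString(attachment.getValue().hashCode());
      if (activity.checkDocumentNeedsReindexing(docId, attachment.getKey(), version)) {
        activity.ingest(docId, attachment.getKey(), version, attachment.getValue());
      }
    }
  }
}
```

Because every component is re-checked whenever the parent is processed, an unchanged attachment is skipped and only a changed one is re-ingested.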

The alternate model, in which each attachment is a completely independent
document, is the preferred one if it can be implemented that way.  The
SharePoint connector takes this approach, and uses MCF carry-down
information to preserve what it needs from the attachment's
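Below is a toy, in-memory model of the carry-down idea. The names CarryDownQueue, addReference and parentData are hypothetical. They are loosely modeled on MCF's carry-down facilities, but they are not its API. When the parent is processed, it queues each attachment as an independent document and attaches data to it. When the attachment is processed later, it reads that data back instead of re-fetching the parent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy, in-memory model of carry-down: a parent queues its attachments as
 * independent documents and attaches data to them for later processing.
 * All names here are hypothetical; this is not the MCF API.
 */
final class CarryDownQueue {
  // child identifier -> data name -> values carried down from its parent(s)
  private final Map<String, Map<String, List<String>>> carryDown = new HashMap<>();
  // child identifier -> identifiers of the parents that referenced it
  private final Map<String, Set<String>> parents = new HashMap<>();
  private final Deque<String> queue = new ArrayDeque<>();

  /** Called while processing a parent: queue the child and carry data down to it. */
  void addReference(String childId, String parentId, Map<String, String> data) {
    if (!parents.containsKey(childId)) {
      queue.add(childId); // queue each child only once
    }
    parents.computeIfAbsent(childId, k -> new HashSet<>()).add(parentId);
    Map<String, List<String>> byName = carryDown.computeIfAbsent(childId, k -> new HashMap<>());
    data.forEach((name, value) ->
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(value));
  }

  /** Called while processing a child: read back what its parent(s) carried down. */
  List<String> parentData(String childId, String name) {
    return carryDown.getOrDefault(childId, Map.of()).getOrDefault(name, List.of());
  }

  Set<String> parentsOf(String childId) {
    return parents.getOrDefault(childId, Set.of());
  }

  /** Next queued child identifier, or null when the queue is empty. */
  String next() {
    return queue.poll();
  }
}
```

Recording which parents referenced each child also gives the connector a way to tell when an attachment has lost its parent. That is the relationship the original question needs for deletion.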


On Mon, Aug 8, 2016 at 8:29 AM, David Arroyo <>

> Hi.
> I am currently dealing with documents’ attachments for the Nuxeo connector.
> When consuming documents’ information using the Nuxeo REST API, I can
> access documents’ attachments, but I’m not sure how to manage them
> within the connector. If I index them as separate documents, I would
> need to find a way to relate them to the main or parent document, because
> if I need to delete or reindex it, I would also need to find the attachments
> in order to delete them and/or reindex them if needed.
> Any idea about how to manage the attachments in ManifoldCF?
> Thank you very much.
> Regards.
> --
> David Arroyo Escobar
