manifoldcf-dev mailing list archives

From douglascrp@gmail.com <douglas...@gmail.com>
Subject Re: [jira] [Created] (CONNECTORS-1541) Documents updated in Google Drive are send with 0 byte to CMIS Output Connector
Date Sat, 06 Oct 2018 22:33:27 GMT
Hello.

I have found what the real problem is.

If you look at the implementation, when the CMIS Output Connector tries to create the document,
it reads from the contentStream at https://github.com/douglascrp/manifoldcf/blob/release-2.10-changed/connectors/cmis/connector/src/main/java/org/apache/manifoldcf/agents/output/cmisoutput/CmisOutputConnector.java#L964

When the document already exists, the CMIS library throws CmisContentAlreadyExistsException,
and then, in the catch block, the code tries to reuse the same contentStream to create
the new version, as you can see at https://github.com/douglascrp/manifoldcf/blob/release-2.10-changed/connectors/cmis/connector/src/main/java/org/apache/manifoldcf/agents/output/cmisoutput/CmisOutputConnector.java#L982
This is why the new version ends up as a 0 byte file: the input stream has already been
consumed at this point.
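
To illustrate the effect in isolation, here is a minimal sketch. It is not the connector code: the class name ConsumedStreamDemo and the drain helper are made up for this example, and a ByteArrayInputStream stands in for the real content stream. The first drain plays the role of the create call, the second plays the role of the new-version call in the catch block.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of ManifoldCF.
final class ConsumedStreamDemo {

  /** Reads the stream to its end and returns the number of bytes seen. */
  static long drain(InputStream in) {
    try {
      byte[] buffer = new byte[8192];
      long total = 0;
      int n;
      while ((n = in.read(buffer)) != -1) {
        total += n;
      }
      return total;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    InputStream content =
        new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));

    // Stands in for the create-document call, which reads the whole stream.
    long firstAttempt = drain(content);

    // Stands in for the new-version call in the catch block: same stream object,
    // already at end of input, so nothing is left to read.
    long secondAttempt = drain(content);

    // prints "5 bytes, then 0 bytes"
    System.out.println(firstAttempt + " bytes, then " + secondAttempt + " bytes");
  }
}
```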

As the XThreadInputStream does not support mark and reset, I could not find a
way to "reset" it in the catch block.
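
For reference, this is roughly what the mark/reset limitation looks like with plain java.io. The sketch below is only an illustration: OneShotStream is a made-up stand-in for a stream that reports markSupported() as false (it is not the XThreadInputStream class itself), and tryRewind is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of ManifoldCF.
final class MarkResetDemo {

  /** Made-up stand-in for a stream that cannot be rewound. */
  static final class OneShotStream extends InputStream {
    private final InputStream delegate;

    OneShotStream(InputStream delegate) {
      this.delegate = delegate;
    }

    @Override
    public int read() throws IOException {
      return delegate.read();
    }

    @Override
    public boolean markSupported() {
      return false; // same as the java.io.InputStream default
    }
  }

  /** Returns true only if the stream could be rewound to its last mark. */
  static boolean tryRewind(InputStream in) {
    if (!in.markSupported()) {
      return false; // nothing we can do in the catch block
    }
    try {
      in.reset();
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    InputStream rewindable = new ByteArrayInputStream(new byte[] {1, 2, 3});
    InputStream oneShot = new OneShotStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));

    System.out.println("ByteArrayInputStream rewound: " + tryRewind(rewindable));
    System.out.println("OneShotStream rewound: " + tryRewind(oneShot));
  }
}
```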

The solution I found to avoid this was to check whether the file already exists at the destination
before trying to create it, but this is not good for performance: most of the time
the document is a new one, and the extra check makes the process way slower.
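
The shape of that workaround is roughly the following. This is a simplified sketch, not the code from the commit: the Target interface, its method names, and InMemoryTarget are all invented here so the example is self-contained, and none of them are OpenCMIS or ManifoldCF API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of ManifoldCF.
final class CheckBeforeCreateDemo {

  /** Invented stand-in for the destination repository. */
  interface Target {
    boolean exists(String path);
    void create(String path, InputStream content);
    void addVersion(String path, InputStream content);
  }

  /** Decides up front which call to make, so the stream is read exactly once. */
  static String ingest(Target target, String path, InputStream content) {
    if (target.exists(path)) { // extra lookup, paid for every document
      target.addVersion(path, content);
      return "versioned";
    }
    target.create(path, content);
    return "created";
  }

  /** In-memory Target that records the size of the latest version per path. */
  static final class InMemoryTarget implements Target {
    final Map<String, Integer> latestSize = new HashMap<>();

    @Override
    public boolean exists(String path) {
      return latestSize.containsKey(path);
    }

    @Override
    public void create(String path, InputStream content) {
      latestSize.put(path, size(content));
    }

    @Override
    public void addVersion(String path, InputStream content) {
      latestSize.put(path, size(content));
    }

    private static int size(InputStream in) {
      try {
        int total = 0;
        while (in.read() != -1) {
          total++;
        }
        return total;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }
}
```

The cost is visible in ingest: exists runs on every document, including the common case where the document is new.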

The fix is available here https://github.com/douglascrp/manifoldcf/commit/03684f97688f21963b7a06e3c8dd71c120d50c91

I am not merging it yet because I want to wait for your opinion on this, as maybe there could
be a better way to deal with this input stream issue.

Please let me know if you have any ideas about this, as the original approach
was much faster, and I would rather avoid my fix for that reason.

Thank you in advance.

On 2018/10/04 17:09:00, "Douglas C. R. Paes (JIRA)" <jira@apache.org> wrote: 
> Douglas C. R. Paes created CONNECTORS-1541:
> ----------------------------------------------
> 
>              Summary: Documents updated in Google Drive are send with 0 byte to CMIS Output Connector
>                  Key: CONNECTORS-1541
>                  URL: https://issues.apache.org/jira/browse/CONNECTORS-1541
>              Project: ManifoldCF
>           Issue Type: Bug
>           Components: Framework core
>     Affects Versions: ManifoldCF 2.10
>             Reporter: Douglas C. R. Paes
> 
> 
> When dealing with migration process, like when using the CMIS Output Connector to ingest
> content into an ECM (Alfresco in my case), I noticed that when a document is updated inside
> Google Drive, the engine is able to detect the change and put it into the queue to be updated
> into the output.
> 
> By using the CMIS Output Connector, the document is versioned into Alfresco, but this
> new version is always created as a 0 byte file.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
> 
