manifoldcf-dev mailing list archives

From douglascrp@gmail.com <douglas...@gmail.com>
Subject Re: [jira] [Created] (CONNECTORS-1541) Documents updated in Google Drive are send with 0 byte to CMIS Output Connector
Date Sat, 06 Oct 2018 22:53:01 GMT
Never mind. I have solved the problem again, but this time by reusing the original approach.
I did some more research and came across the ReplayableInputStream class, which does
exactly what I need.
So, I am using it with the restart method in the second attempt, where the new version is
created, and it is working like a charm now.
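
A minimal sketch of the idea, not the actual ManifoldCF code: the first half shows why a second read of an already consumed stream yields 0 bytes, and the second half shows how a stream that remembers what it has delivered can be restarted for the retry. RestartableStream and the helper methods are hypothetical stand-ins written for illustration; they are not the ReplayableInputStream class mentioned above and make no claim about its API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReplaySketch {

    /** Hypothetical stand-in: remembers every byte it delivers so it can be replayed. */
    static final class RestartableStream extends InputStream {
        private final InputStream source;
        private final ByteArrayOutputStream seen = new ByteArrayOutputStream();
        private InputStream replay; // non-null while replaying remembered bytes

        RestartableStream(InputStream source) {
            this.source = source;
        }

        @Override
        public int read() throws IOException {
            if (replay != null) {
                int b = replay.read();
                if (b >= 0) {
                    return b;
                }
                replay = null; // replay exhausted, continue with the source
            }
            int b = source.read();
            if (b >= 0) {
                seen.write(b); // remember the byte for a later restart
            }
            return b;
        }

        /** Rewinds to the first byte that was ever read from this stream. */
        void restart() {
            replay = new ByteArrayInputStream(seen.toByteArray());
        }
    }

    /** Reads the stream to its end and returns the number of bytes read. */
    static int drain(InputStream in) {
        try {
            int count = 0;
            while (in.read() >= 0) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** First attempt consumes the stream; the retry then finds nothing left. */
    static int plainSecondRead(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        drain(in);        // stands in for the failed create attempt
        return drain(in); // stands in for creating the new version
    }

    /** Same two attempts, but the stream is restarted before the retry. */
    static int restartedSecondRead(byte[] data) {
        RestartableStream in = new RestartableStream(new ByteArrayInputStream(data));
        drain(in);
        in.restart();
        return drain(in);
    }

    public static void main(String[] args) {
        byte[] data = "new version content".getBytes();
        System.out.println("plain retry bytes: " + plainSecondRead(data));
        System.out.println("restarted retry bytes: " + restartedSecondRead(data));
    }
}
```

This sketch keeps the whole content in memory, which is an assumption made for brevity; large documents would probably need a buffer backed by a temporary file instead.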

Thank you anyway

On 2018/10/06 22:33:27, douglascrp@gmail.com <douglascrp@gmail.com> wrote: 
> Hello.
> 
> I have found what the real problem is.
> 
> If you look at the implementation, when the CMIS Output Connector tries to create the document, it reads from the contentStream at https://github.com/douglascrp/manifoldcf/blob/release-2.10-changed/connectors/cmis/connector/src/main/java/org/apache/manifoldcf/agents/output/cmisoutput/CmisOutputConnector.java#L964
> 
> When the document already exists, the CMIS library throws the CmisContentAlreadyExistsException, and then, in the catch block, the code tries to reuse the contentStream in order to create the new version, as you can see at https://github.com/douglascrp/manifoldcf/blob/release-2.10-changed/connectors/cmis/connector/src/main/java/org/apache/manifoldcf/agents/output/cmisoutput/CmisOutputConnector.java#L982
> This is why the new version ended up as 0 bytes: the input stream has already been consumed at this point.
> 
> As the XThreadInputStream does not support mark and reset, I could not find a way to "reset" the stream in the catch block.
> 
> The solution I found to avoid this was to check whether the file already exists at the destination before trying to create it, but this is bad for performance: most of the time the document is a new one, and checking for it makes the process way slower.
> 
> The fix is available here https://github.com/douglascrp/manifoldcf/commit/03684f97688f21963b7a06e3c8dd71c120d50c91
> 
> I am not merging it yet because I want to wait for your opinion on this, as there may be a better way to deal with this input stream issue.
> 
> Please let me know if you have any ideas about this, as the original approach was way faster, and I would prefer to avoid my fix because of that.
> 
> Thank you in advance.
> 
> On 2018/10/04 17:09:00, "Douglas C. R. Paes (JIRA)" <jira@apache.org> wrote: 
> > Douglas C. R. Paes created CONNECTORS-1541:
> > ----------------------------------------------
> > 
> >              Summary: Documents updated in Google Drive are send with 0 byte to CMIS Output Connector
> >                  Key: CONNECTORS-1541
> >                  URL: https://issues.apache.org/jira/browse/CONNECTORS-1541
> >              Project: ManifoldCF
> >           Issue Type: Bug
> >           Components: Framework core
> >     Affects Versions: ManifoldCF 2.10
> >             Reporter: Douglas C. R. Paes
> > 
> > 
> > When dealing with a migration process, such as using the CMIS Output Connector to ingest content into an ECM (Alfresco in my case), I noticed that when a document is updated inside Google Drive, the engine is able to detect the change and put it into the queue to be updated in the output.
> > 
> > When using the CMIS Output Connector, the document is versioned in Alfresco, but this new version is always created as a 0 byte file.
> > 
> > 
> > 
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
> > 
> 
