manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Google Drive processing
Date Mon, 27 Oct 2014 18:27:35 GMT
Hi Ethan,

This does not sound related in any way to the Google Drive connection,
unless for some reason the Google API considers some of the fetched
documents to have only metadata and no content.  In that case, you'd see a
size of zero in the Simple History record for the indexing activity.
Is that what you see?

As for the filename issues -- the File System output connector is supposed
to emulate WGET.  However, there are a number of known issues with this
connector, for example CONNECTORS-814, and I believe the handling of "&" is
one such issue.  I don't think these characters are allowed in file names
on several operating systems.

Please open a ticket and describe how you think it should behave (e.g. how
it should map "&" characters in URLs to legal file name characters), and
I'll try to come up with a quick patch.
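
As a starting point for such a ticket, one possible mapping can be sketched
as below. This is only an illustration and not the connector's actual
behavior: the class name FileNameSanitizer and the method sanitizeSegment
are hypothetical, and percent-encoding is an assumed choice. The character
set is the one Windows reserves in file names (< > : " / \ | ? *), so the
"?" in the failing path above would be rejected by Windows. "&" is included
only because this thread mentions it, and "%" so the mapping stays
reversible.

```java
class FileNameSanitizer {
    // Characters Windows reserves in file names, plus '&' and '%'.
    // Including '&' and '%' is an assumption made for this sketch.
    private static final String UNSAFE = "<>:\"/\\|?*&%";

    /**
     * Percent-encodes unsafe and control characters in ONE path segment.
     * All other characters, including non-ASCII ones, are left unchanged.
     */
    static String sanitizeSegment(String segment) {
        StringBuilder out = new StringBuilder(segment.length());
        for (int i = 0; i < segment.length(); i++) {
            char ch = segment.charAt(i);
            if (ch < 0x20 || UNSAFE.indexOf(ch) >= 0) {
                // Every character matched here is ASCII, so two hex digits suffice.
                out.append(String.format("%%%02X", (int) ch));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Last segment of the URL from the error message in this thread.
        String last =
            "0B4rsPDZwaBMUZjI3VGpzZi10dUU?h=00194472260389282923&e=download&gd=true";
        System.out.println(sanitizeSegment(last));
    }
}
```

With this sketch, the last segment of the failing path would become
0B4rsPDZwaBMUZjI3VGpzZi10dUU%3Fh=00194472260389282923%26e=download%26gd=true.
It does not handle other Windows restrictions such as reserved device names
(CON, NUL, and so on), trailing dots or spaces, or path length limits.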

Karl


On Mon, Oct 27, 2014 at 12:15 PM, Ethan Wilansky <ethanwilansky@gmail.com>
wrote:

> I’ve run a job that uses a Google Drive Repository Connection and File
> System Output Connection. My output is pointing to d:\temp\mf on the
> machine running ManifoldCF.
>
> Upon running the job, job status shows:
> Error: Could not create file
> 'd:\temp\mf\https\doc-0g-1c-docs.googleusercontent.com\docs\securesc\288dijb8lhptipmnpc6n3dap4bdki35j\ek70aeovi25lp7aibkar61h90pi1i2c3\1414418400000\14058876669334088852\07105634325979498590\0B4rsPDZwaBMUZjI3VGpzZi10dUU?h=00194472260389282923&e=download&gd=true'
> *(The filename, directory name, or volume label syntax is incorrect)*
>
> The same error (the filename, directory name, or volume label syntax is
> incorrect) is reported for one other file. So, out of 12 files total, 10
> are reported as processed. However, none of the files reported as
> successfully processed appear in the file system.
>
> The directory structure created under the path I specified for the job
> (d:\temp\mf) also looks unusual to me. I’m seeing something like the
> following:
> D:\temp\mf\https\doc-0g-1c-docs.googleusercontent.com\docs\securesc\288dijb8lhptipmnpc6n3dap4bdki35j\ek3m4mhv978b7a2elgov6cm9nipbv36e\1414418400000\13058876669334088852\07105634445979498592
>
> Document Status and Queue Status show nothing unusual. I’m running
> ManifoldCF release 1.7.1.
>
> Could this be an issue with the way I’m configuring the File System Output
> Connection, or is there something else I need to configure? I properly
> configured the refresh token, client id and client secret in the Repository
> Connection.
>
> I’ve attached the JSON for the Repository Connection (with client id,
> client secret and refresh token values removed), my Output Connection and
> Job Definition.
>
> Thanks in advance for your feedback
> Ethan
>
