From: Shigeki Kobayashi
Date: Mon, 11 Aug 2014 19:06:42 +0900
Subject: Re: Google native documents are not crawled
To: "user@manifoldcf.apache.org"

Hi Karl,

The documents are saved as Google Spreadsheets in Google Docs, which is also managed in Google Drive.

The MCF documentation says "native Google documents such as spreadsheets and word documents are exported to PDF and then ingested", so those Google Spreadsheets should be crawled and indexed.

Shigeki

2014-08-07 21:05 GMT+09:00 Karl Wright <daddywri@gmail.com>:

> Hi Shigeki,
>
> The javadoc says the following about this method:
>
> "The size of the file in bytes. This is only populated for files with
> content stored in Drive."
>
> Are these documents stored in Drive, or somewhere else?
>
> Karl
>
>
> On Thu, Aug 7, 2014 at 8:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Shigeki,
>>
>> The connector tries to get the length of the file, using the googledocs
>> API:
>>
>>     // Get the file length
>>     Long fileLength = googleFile.getFileSize();
>>     if (fileLength != null) {
>>
>> ... where googleFile is a com.google.api.services.drive.model.File object.
>>
>> But, the file length is coming back as null, which the connector assumes
>> means that the file is unreadable somehow.
>>
>> Can you open a ticket, so that we can look into this in more detail?
>>
>> Karl
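
For readers following the thread, here is a minimal sketch of the distinction being discussed: a null file length does not necessarily mean the file is unreadable, because a native Google document has no stored binary content but may still offer export links. This is an illustration only, not the connector's actual code. `FileInfo` and `chooseFetchUrl` are hypothetical names; the three fields are assumed to mirror `getFileSize()`, `getDownloadUrl()` and `getExportLinks()` on the Drive `File` model, and the URLs are placeholders.

```java
import java.util.Collections;
import java.util.Map;

public class NativeDocCheck {

    /** Hypothetical stand-in for the Drive file properties of interest. */
    static final class FileInfo {
        final Long fileSize;                    // null for native Google documents
        final String downloadUrl;               // set only when binary content is stored
        final Map<String, String> exportLinks;  // MIME type -> export URL, may be null

        FileInfo(Long fileSize, String downloadUrl, Map<String, String> exportLinks) {
            this.fileSize = fileSize;
            this.downloadUrl = downloadUrl;
            this.exportLinks = exportLinks;
        }
    }

    /**
     * Returns the URL to fetch for this file, or null if nothing is fetchable.
     * Files with stored content use the download URL; files without a size
     * fall back to the PDF export link when one is present.
     */
    static String chooseFetchUrl(FileInfo file) {
        if (file.fileSize != null && file.downloadUrl != null) {
            return file.downloadUrl;
        }
        if (file.exportLinks != null) {
            return file.exportLinks.get("application/pdf");
        }
        return null;
    }

    public static void main(String[] args) {
        // A stored binary file: size and download URL are both populated.
        FileInfo binary = new FileInfo(1024L, "https://example.invalid/download", null);

        // A native spreadsheet: no size, no download URL, but a PDF export link.
        FileInfo sheet = new FileInfo(null, null,
                Collections.singletonMap("application/pdf", "https://example.invalid/export.pdf"));

        System.out.println(chooseFetchUrl(binary));
        System.out.println(chooseFetchUrl(sheet));
    }
}
```

With a check along these lines, only a file that has neither a size nor a PDF export link would be skipped.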