manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1006) Google native documents are not crawled
Date Mon, 11 Aug 2014 11:43:12 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092694#comment-14092694 ]

Karl Wright commented on CONNECTORS-1006:
-----------------------------------------

Hi Shigeki,

I updated the version of the Google Drive API libraries we use in trunk.  Hopefully that will
fix the problem, but you will need to try it to be sure.  To do that:

- check out trunk
- ant make-core-deps
- ant build
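
For reference, the steps above could be scripted roughly as follows. This is only a sketch: the Subversion URL for trunk and the local directory name are assumptions, and the `run` helper, `build_trunk` function, and `DRY_RUN` variable are hypothetical names introduced for illustration. Only the two ant targets come from the steps listed above.

```shell
#!/bin/sh
# Sketch of the three steps: check out trunk, fetch dependencies, build.
set -eu

# Assumed location of ManifoldCF trunk; override TRUNK_URL if it differs.
TRUNK_URL="${TRUNK_URL:-https://svn.apache.org/repos/asf/manifoldcf/trunk}"
# Hypothetical local checkout directory.
WORK_DIR="${WORK_DIR:-manifoldcf-trunk}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_trunk() {
    run svn checkout "$TRUNK_URL" "$WORK_DIR"
    # Only change directory when the checkout has really happened.
    if [ "${DRY_RUN:-0}" != "1" ]; then
        cd "$WORK_DIR"
    fi
    run ant make-core-deps
    run ant build
}
```

Calling `DRY_RUN=1 build_trunk` prints the three commands without running them; calling `build_trunk` alone runs them and requires svn and ant on the PATH.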

Thanks!

> Google native documents are not crawled
> ---------------------------------------
>
>                 Key: CONNECTORS-1006
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1006
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: GoogleDrive connector
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Shigeki Kobayashi
>
> I use MCF 1.4.1 and am trying to crawl native Google documents, such as spreadsheets, and index them to Solr.
> It seems that MCF does not extract the contents. Perhaps MCF does not export the spreadsheet to PDF.
> The Simple History reports the crawl result as "NO LENGTH".
>
> The documents are saved as Google Spreadsheets in Google Docs, which are also managed in Google Drive.
> As the MCF documentation says, "native Google documents such as spreadsheets and word documents are exported to PDF and then ingested", so those Google Spreadsheets should be crawled and indexed.




--
This message was sent by Atlassian JIRA
(v6.2#6252)
