Subject: svn commit: r1488664 - in /manifoldcf/trunk: ./ connectors/ connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/ connectors/googledrive/ connectors/wiki/ connectors/wiki/connector/src/main/java/org/apache/manifol... 
Date: Sun, 02 Jun 2013 11:23:29 -0000 To: commits@manifoldcf.apache.org From: kwright@apache.org X-Mailer: svnmailer-1.0.8-patched Message-Id: <20130602112330.46E1C23889EA@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Author: kwright Date: Sun Jun 2 11:23:28 2013 New Revision: 1488664 URL: http://svn.apache.org/r1488664 Log: Add Google Drive connector - CONNECTORS-694. Thanks to Andrew Janowczyk for this contribution. Added: manifoldcf/trunk/connectors/googledrive/ - copied from r1488663, manifoldcf/branches/CONNECTORS-694/connectors/googledrive/ manifoldcf/trunk/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadStringBuffer.java - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadStringBuffer.java manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-connection-configuration-save.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-connection-configuration-save.PNG manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-connection-configuration.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-connection-configuration.PNG manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-connection-job-googledrive-seed-query.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-connection-job-googledrive-seed-query.PNG manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-setup-1.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-setup-1.PNG 
manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-setup-2.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-setup-2.PNG manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-setup-3.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-setup-3.PNG manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-setup-4.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-setup-4.PNG manifoldcf/trunk/site/src/documentation/resources/images/en_US/googledrive-repository-setup-5.PNG - copied unchanged from r1488663, manifoldcf/branches/CONNECTORS-694/site/src/documentation/resources/images/en_US/googledrive-repository-setup-5.PNG Removed: manifoldcf/trunk/connectors/wiki/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/wiki/PageBuffer.java Modified: manifoldcf/trunk/ (props changed) manifoldcf/trunk/CHANGES.txt manifoldcf/trunk/build.xml manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxRepositoryConnector.java manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxSession.java manifoldcf/trunk/connectors/pom.xml manifoldcf/trunk/connectors/wiki/ (props changed) manifoldcf/trunk/connectors/wiki/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/wiki/WikiConnector.java manifoldcf/trunk/dist-license/LICENSE.txt manifoldcf/trunk/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadInputStream.java manifoldcf/trunk/lib-license/LICENSE.txt manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Propchange: 
manifoldcf/trunk/ ------------------------------------------------------------------------------ Merged /manifoldcf/branches/CONNECTORS-694:r1488166-1488663 Modified: manifoldcf/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/manifoldcf/trunk/CHANGES.txt?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/CHANGES.txt (original) +++ manifoldcf/trunk/CHANGES.txt Sun Jun 2 11:23:28 2013 @@ -3,6 +3,9 @@ $Id$ ======================= 1.3-dev ===================== +CONNECTORS-694: Add Google Drive connector. +(Andrew Janowczyk, Karl Wright) + CONNECTORS-690: For ElasticSearch connector, include _name and _content_type field within "file" portion of JSON, so it will work properly with the Mapper Attachment Plugin. Modified: manifoldcf/trunk/build.xml URL: http://svn.apache.org/viewvc/manifoldcf/trunk/build.xml?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/build.xml (original) +++ manifoldcf/trunk/build.xml Sun Jun 2 11:23:28 2013 @@ -59,6 +59,7 @@ + @@ -110,6 +111,7 @@ + @@ -281,6 +283,8 @@ + + @@ -302,10 +306,21 @@ + + + + + + + + + + + @@ -1395,6 +1410,29 @@ + + + + + + + + + + + + + + + + + + + + + + + @@ -1437,6 +1475,24 @@ + + + + + + + + + + + + + + + + + + @@ -2590,8 +2646,8 @@ - - + + @@ -3599,6 +3655,54 @@ Use Apache Forrest version forrest-0.9-d + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -3652,7 +3756,7 @@ Use Apache Forrest version forrest-0.9-d - + @@ -3682,6 +3786,7 @@ Use Apache Forrest version forrest-0.9-d + @@ -3719,6 +3824,7 @@ Use Apache Forrest version forrest-0.9-d + Modified: manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxRepositoryConnector.java URL: 
http://svn.apache.org/viewvc/manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxRepositoryConnector.java?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxRepositoryConnector.java (original) +++ manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxRepositoryConnector.java Sun Jun 2 11:23:28 2013 @@ -17,12 +17,10 @@ * limitations under the License. */ -/* - * To change this template, choose Tools | Templates - * and open the template in the editor. - */ package org.apache.manifoldcf.crawler.connectors.dropbox; +import org.apache.manifoldcf.core.common.*; + import com.dropbox.client2.DropboxAPI; import com.dropbox.client2.exception.DropboxException; import java.io.IOException; @@ -674,20 +672,24 @@ public class DropboxRepositoryConnector i++; } - HashSet seeds = getSeeds(dropboxPath); - for (String seed : seeds) { - activities.addSeedDocument(seed); - } - - } - - protected HashSet getSeeds(String path) - throws ManifoldCFException, ServiceInterruption { getSession(); - GetSeedsThread t = new GetSeedsThread(path); + XThreadStringBuffer seedBuffer = new XThreadStringBuffer(); + GetSeedsThread t = new GetSeedsThread(dropboxPath, seedBuffer); try { t.start(); + + // Pick up the paths, and add them to the activities, before we join with the child thread. + while (true) { + // The only kind of exceptions this can throw are going to shut the process down. 
+ String docPath = seedBuffer.fetch(); + if (docPath == null) + break; + // Add the pageID to the queue + activities.addSeedDocument(docPath); + } + t.join(); + Throwable thr = t.getException(); if (thr != null) { if (thr instanceof DropboxException) { @@ -705,35 +707,34 @@ public class DropboxRepositoryConnector } catch (DropboxException e) { Logging.connectors.error("DROPBOX: Error adding seed documents: " + e.getMessage(), e); handleDropboxException(e); + } finally { + // Make SURE buffer is dead, otherwise child thread may well hang waiting on it + seedBuffer.abandon(); } - return t.getResponse(); } protected class GetSeedsThread extends Thread { protected Throwable exception = null; - protected HashSet response = null; - protected String path = null; + protected final String path; + protected final XThreadStringBuffer seedBuffer; - public GetSeedsThread(String path) { + public GetSeedsThread(String path, XThreadStringBuffer seedBuffer) { super(); - this.path=path; + this.path = path; + this.seedBuffer = seedBuffer; setDaemon(true); } @Override public void run() { try { - response = session.getSeeds(path,25000); //upper limit on files to get supported by dropbox api in a single directory + session.getSeeds(seedBuffer,path,25000); //upper limit on files to get supported by dropbox api in a single directory } catch (Throwable e) { this.exception = e; } } - public HashSet getResponse() { - return response; - } - public Throwable getException() { return exception; } Modified: manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxSession.java URL: http://svn.apache.org/viewvc/manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxSession.java?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- 
manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxSession.java (original) +++ manifoldcf/trunk/connectors/dropbox/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/dropbox/DropboxSession.java Sun Jun 2 11:23:28 2013 @@ -23,6 +23,8 @@ */ package org.apache.manifoldcf.crawler.connectors.dropbox; +import org.apache.manifoldcf.core.common.*; + import com.dropbox.client2.session.AppKeyPair; import java.util.Map; import com.dropbox.client2.session.WebAuthSession; @@ -73,23 +75,22 @@ public class DropboxSession { return info; } - public HashSet getSeeds(String path, int max_dirs) throws DropboxException { - HashSet ids = new HashSet(); + public void getSeeds(XThreadStringBuffer idBuffer, String path, int max_dirs) + throws DropboxException, InterruptedException { - ids.add(path); //need to add root dir so that single files such as /file1 will still get read + idBuffer.add(path); //need to add root dir so that single files such as /file1 will still get read - DropboxAPI.Entry root_entry = client.metadata(path, max_dirs, null, true, null); - List entries = root_entry.contents; //gets a list of the contents of the entire folder: subfolders + files + DropboxAPI.Entry root_entry = client.metadata(path, max_dirs, null, true, null); + List entries = root_entry.contents; //gets a list of the contents of the entire folder: subfolders + files - // Apply the entries one by one. - for (DropboxAPI.Entry e : entries) { - if (e.isDir) { //only add the directories as seeds, we'll add the files later - ids.add(e.path); - } - } - return ids; + // Apply the entries one by one. 
+ for (DropboxAPI.Entry e : entries) { + if (e.isDir) { //only add the directories as seeds, we'll add the files later + idBuffer.add(e.path); + } } + } public DropboxAPI.Entry getObject(String id) throws DropboxException { return client.metadata(id, 25000, null, true, null); Modified: manifoldcf/trunk/connectors/pom.xml URL: http://svn.apache.org/viewvc/manifoldcf/trunk/connectors/pom.xml?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/connectors/pom.xml (original) +++ manifoldcf/trunk/connectors/pom.xml Sun Jun 2 11:23:28 2013 @@ -51,6 +51,7 @@ alfresco elasticsearch dropbox + googledrive Propchange: manifoldcf/trunk/connectors/wiki/ ------------------------------------------------------------------------------ Merged /manifoldcf/branches/CONNECTORS-694/connectors/wiki:r1488166-1488663 Modified: manifoldcf/trunk/connectors/wiki/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/wiki/WikiConnector.java URL: http://svn.apache.org/viewvc/manifoldcf/trunk/connectors/wiki/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/wiki/WikiConnector.java?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/connectors/wiki/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/wiki/WikiConnector.java (original) +++ manifoldcf/trunk/connectors/wiki/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/wiki/WikiConnector.java Sun Jun 2 11:23:28 2013 @@ -2145,7 +2145,7 @@ public class WikiConnector extends org.a try { HttpRequestBase executeMethod = getInitializedGetMethod(getListPagesURL(startPageTitle,namespace,prefix)); - PageBuffer pageBuffer = new PageBuffer(); + XThreadStringBuffer pageBuffer = new XThreadStringBuffer(); ExecuteListPagesThread t = new ExecuteListPagesThread(httpClient,executeMethod,pageBuffer,startPageTitle); try { @@ -2275,12 
+2275,12 @@ public class WikiConnector extends org.a protected HttpClient client; protected HttpRequestBase executeMethod; protected Throwable exception = null; - protected PageBuffer pageBuffer; + protected XThreadStringBuffer pageBuffer; protected String lastPageTitle = null; protected String startPageTitle; protected boolean loginNeeded = false; - public ExecuteListPagesThread(HttpClient client, HttpRequestBase executeMethod, PageBuffer pageBuffer, String startPageTitle) + public ExecuteListPagesThread(HttpClient client, HttpRequestBase executeMethod, XThreadStringBuffer pageBuffer, String startPageTitle) { super(); setDaemon(true); @@ -2361,7 +2361,7 @@ public class WikiConnector extends org.a * * */ - protected static boolean parseListPagesResponse(InputStream is, PageBuffer buffer, String startPageTitle, ReturnString lastTitle) + protected static boolean parseListPagesResponse(InputStream is, XThreadStringBuffer buffer, String startPageTitle, ReturnString lastTitle) throws ManifoldCFException, ServiceInterruption { // Parse the document. This will cause various things to occur, within the instantiated XMLContext class. 
@@ -2393,11 +2393,11 @@ public class WikiConnector extends org.a protected static class WikiListPagesAPIContext extends SingleLevelContext { protected String lastTitle = null; - protected PageBuffer buffer; + protected XThreadStringBuffer buffer; protected String startPageTitle; protected boolean loginNeeded = false; - public WikiListPagesAPIContext(XMLStream theStream, PageBuffer buffer, String startPageTitle) + public WikiListPagesAPIContext(XMLStream theStream, XThreadStringBuffer buffer, String startPageTitle) { super(theStream,"api"); this.buffer = buffer; @@ -2434,11 +2434,11 @@ public class WikiConnector extends org.a protected static class WikiListPagesQueryContext extends SingleLevelErrorContext { protected String lastTitle = null; - protected PageBuffer buffer; + protected XThreadStringBuffer buffer; protected String startPageTitle; public WikiListPagesQueryContext(XMLStream theStream, String namespaceURI, String localName, String qName, Attributes atts, - PageBuffer buffer, String startPageTitle) + XThreadStringBuffer buffer, String startPageTitle) { super(theStream,namespaceURI,localName,qName,atts,"query"); this.buffer = buffer; @@ -2469,11 +2469,11 @@ public class WikiConnector extends org.a protected static class WikiListPagesAllPagesContext extends SingleLevelContext { protected String lastTitle = null; - protected PageBuffer buffer; + protected XThreadStringBuffer buffer; protected String startPageTitle; public WikiListPagesAllPagesContext(XMLStream theStream, String namespaceURI, String localName, String qName, Attributes atts, - PageBuffer buffer, String startPageTitle) + XThreadStringBuffer buffer, String startPageTitle) { super(theStream,namespaceURI,localName,qName,atts,"allpages"); this.buffer = buffer; @@ -2506,11 +2506,11 @@ public class WikiConnector extends org.a protected static class WikiListPagesPContext extends BaseProcessingContext { protected String lastTitle = null; - protected PageBuffer buffer; + protected XThreadStringBuffer 
buffer; protected String startPageTitle; public WikiListPagesPContext(XMLStream theStream, String namespaceURI, String localName, String qName, Attributes atts, - PageBuffer buffer, String startPageTitle) + XThreadStringBuffer buffer, String startPageTitle) { super(theStream,namespaceURI,localName,qName,atts); this.buffer = buffer; Modified: manifoldcf/trunk/dist-license/LICENSE.txt URL: http://svn.apache.org/viewvc/manifoldcf/trunk/dist-license/LICENSE.txt?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/dist-license/LICENSE.txt (original) +++ manifoldcf/trunk/dist-license/LICENSE.txt Sun Jun 2 11:23:28 2013 @@ -299,6 +299,27 @@ License: MIT license (http://opensource. This product includes a json-simple-1.1.jar. License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) +This product includes a jackson-core-2.1.3.jar. +License: Dual license; we choose to distribute under Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-api-client-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-oauth-client-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-api-services-drive-v2-rev64-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-api-services-drive-v2-rev64-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-http-client-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-http-client-jackson2-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + This product may include pdf files that embed IPA-licensed fonts. 
License: IPA Font License Agreement v1.0 (http://ossipedia.ipa.go.jp/ipafont/index.html#LicenseEng) Modified: manifoldcf/trunk/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadInputStream.java URL: http://svn.apache.org/viewvc/manifoldcf/trunk/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadInputStream.java?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadInputStream.java (original) +++ manifoldcf/trunk/framework/core/src/main/java/org/apache/manifoldcf/core/common/XThreadInputStream.java Sun Jun 2 11:23:28 2013 @@ -26,20 +26,27 @@ import java.io.*; */ public class XThreadInputStream extends InputStream { - private byte[] buffer = new byte[65536]; + private final byte[] buffer = new byte[65536]; private int startPoint = 0; private int byteCount = 0; private boolean streamEnd = false; private IOException failureException = null; - private InputStream sourceStream; private boolean abort = false; - - /** Constructor */ + + private final InputStream sourceStream; + + /** Constructor, from a given input stream. */ public XThreadInputStream(InputStream sourceStream) { this.sourceStream = sourceStream; } + /** Constructor, from another source. */ + public XThreadInputStream() + { + this.sourceStream = null; + } + /** Call this method to abort the stuffQueue() method. */ public void abort() @@ -51,8 +58,64 @@ public class XThreadInputStream extends } } + /** This method is called from the helper thread side, to stuff bytes onto + * the queue when there is no input stream. + * It exits only when interrupted or done. 
+ */ + public void stuffQueue(byte[] byteBuffer, int offset, int amount) + throws InterruptedException + { + while (amount > 0) + { + int maxToRead; + int readStartPoint; + synchronized (this) + { + if (abort || streamEnd) + return; + // Calculate amount to read + maxToRead = buffer.length - byteCount; + if (maxToRead == 0) + { + wait(); + continue; + } + readStartPoint = (startPoint + byteCount) & (buffer.length-1); + } + if (readStartPoint + maxToRead >= buffer.length) + maxToRead = buffer.length - readStartPoint; + // Now, copy to buffer + int amt; + if (amount > maxToRead) + amt = maxToRead; + else + amt = amount; + //??? make sure this is source -> target + System.arraycopy(byteBuffer,offset,buffer,readStartPoint,amt); + offset += amt; + amount -= amt; + synchronized (this) + { + byteCount += amt; + notifyAll(); + } + } + } + + /** Call this method when there is no more data to write. + */ + public void doneStuffingQueue() + { + synchronized (this) + { + streamEnd = true; + notifyAll(); + } + } + /** This method is called from the helper thread side, to keep the queue - * stuffed. It exits when the stream is empty, or when interrupted. + * stuffed from the input stream. + * It exits when the stream is empty, or when interrupted. */ public void stuffQueue() throws IOException, InterruptedException Modified: manifoldcf/trunk/lib-license/LICENSE.txt URL: http://svn.apache.org/viewvc/manifoldcf/trunk/lib-license/LICENSE.txt?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/lib-license/LICENSE.txt (original) +++ manifoldcf/trunk/lib-license/LICENSE.txt Sun Jun 2 11:23:28 2013 @@ -293,6 +293,33 @@ License: Common Development and Distribu This product includes a jstl-impl-1.2.jar. License: Common Development and Distribution License (CDDL) v1.0 (https://glassfish.dev.java.net/public/CDDLv1.0.html) +This product includes a dropbox-client-1.5.3.jar. 
+License: MIT license (http://opensource.org/licenses/MIT). + +This product includes a json-simple-1.1.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a jackson-core-2.1.3.jar. +License: Dual license; we choose to distribute under Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-api-client-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-oauth-client-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-api-services-drive-v2-rev64-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-api-services-drive-v2-rev64-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-http-client-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + +This product includes a google-http-client-jackson2-1.14.1-beta.jar. +License: Apache 2 (http://www.apache.org/licenses/LICENSE-2.0.txt) + This product may include pdf files that embed IPA-licensed fonts. License: IPA Font License Agreement v1.0 (http://ossipedia.ipa.go.jp/ipafont/index.html#LicenseEng) Modified: manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml URL: http://svn.apache.org/viewvc/manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml?rev=1488664&r1=1488663&r2=1488664&view=diff ============================================================================== --- manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml (original) +++ manifoldcf/trunk/site/src/documentation/content/xdocs/en_US/end-user-documentation.xml Sun Jun 2 11:23:28 2013 @@ -1708,8 +1708,64 @@ curl -XGET http://localhost:9200/index/_


- -
+ +
+ Google Drive Repository Connection +

The Google Drive Repository Connection type allows you to index content from Google Drive.

+

Each Google Drive Connection manages access to a single drive repository. This means that if you have multiple Google Drives (e.g. belonging to different users), + you need to create a separate connection for each drive repository and provide the associated authentication information.

+
+

A Google Drive connection has the following configuration parameters on the repository connection editing screen:

+

+
+

+

As we can see, three pieces of information are needed to create a successful connection. The Client ID and Client Secret are given by Google Drive + when you register your application for a development license. This is typically done through the Google APIs Console.

+

+
+

+

Once the project has been created, we must enable the Google Drive API:

+

+
+

+

Then, going to the API Access link on the right-hand side, we need to choose to create an OAuth 2.0 client ID:

+

+
+

+ +

After filling in the necessary information, we need to select the type of application we'd like. For our purposes we need to select "Installed application":

+

+
+

+ +

Afterwards, we're presented with the Client ID and Client Secret needed for the connector (where the red boxes are):

+

+
+

+ +

Now each user must authorize your application to access their Google Drive. This is done through a standard OAuth + approach, but needs to be done beforehand. Once the steps are completed, a long-lived refresh token is presented, which is then used by the connector. For completeness, we present the needed steps below, since they require some manual work.

+

+

    +
  1. Browse to here: https://accounts.google.com/o/oauth2/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly&state=%2Fprofile&redirect_uri=https%3A%2F%2Flocalhost&response_type=code&client_id=CLIENT_ID&approval_prompt=force&access_type=offline +
  2. After acceptance, this redirects to a link of the form: https://localhost/?state=/profile&code=CODE +
  3. Perform a POST to https://accounts.google.com/o/oauth2/token with the following as the body: grant_type=authorization_code&redirect_uri=https%3A%2F%2Flocalhost&client_secret=CLIENT_SECRET&client_id=CLIENT_ID&code=CODE +
  4. The response is a JSON document which contains the refresh_token. +
+
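The manual steps above can be sketched in code. The following is a minimal illustration, not part of the connector: the class and method names (`DriveOAuthSketch`, `authorizationUrl`, `tokenRequestBody`, `postForm`) are hypothetical, while the endpoints and parameters are taken from the steps listed above. It uses only the Java standard library and prints the raw JSON response rather than parsing it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class DriveOAuthSketch {
  // Endpoints, scope, and redirect URI as given in the steps above.
  static final String AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/auth";
  static final String TOKEN_ENDPOINT = "https://accounts.google.com/o/oauth2/token";
  static final String REDIRECT_URI = "https://localhost";
  static final String SCOPE = "https://www.googleapis.com/auth/drive.readonly";

  /** URL-encode a single parameter value as UTF-8. */
  static String enc(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  /** Step 1: the URL the user opens in a browser to grant access. */
  static String authorizationUrl(String clientId) {
    return AUTH_ENDPOINT
      + "?scope=" + enc(SCOPE)
      + "&state=" + enc("/profile")
      + "&redirect_uri=" + enc(REDIRECT_URI)
      + "&response_type=code"
      + "&client_id=" + enc(clientId)
      + "&approval_prompt=force"
      + "&access_type=offline";
  }

  /** Step 3: the form body that exchanges the returned code for tokens. */
  static String tokenRequestBody(String clientId, String clientSecret, String code) {
    return "grant_type=authorization_code"
      + "&redirect_uri=" + enc(REDIRECT_URI)
      + "&client_secret=" + enc(clientSecret)
      + "&client_id=" + enc(clientId)
      + "&code=" + enc(code);
  }

  /** POST a form-encoded body and return the response text (step 4: JSON). */
  static String postForm(String endpoint, String body) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes("UTF-8"));
    }
    StringBuilder response = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        response.append(line).append('\n');
      }
    }
    return response.toString();
  }

  public static void main(String[] args) throws IOException {
    if (args.length == 1) {
      // Only the client ID is known: print the URL to open in a browser.
      System.out.println(authorizationUrl(args[0]));
    } else if (args.length == 3) {
      // Client ID, client secret, and the code from the redirect URL.
      System.out.println(postForm(TOKEN_ENDPOINT,
          tokenRequestBody(args[0], args[1], args[2])));
    } else {
      System.out.println("usage: DriveOAuthSketch CLIENT_ID [CLIENT_SECRET CODE]");
    }
  }
}
```

Run it once with only the client ID to obtain the URL for step 1, then again with the client ID, client secret, and the code from the redirect to print the JSON containing the refresh_token.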

+

+

After you click the "Save" button, you will see a connection summary screen, which might look something like this:

+

+
+

+

When you configure a job to use the Google Drive repository connection, an additional tab is presented. This is the "Google Drive Seed Query" tab:

+

+
+

+

This tab allows you to specify the query which will be used to seed documents for the indexing process. The query language is specified on the Drive Search Parameters site. Directories which meet the seed query are fully crawled, as the query only applies to seeds. The default query indexes the entire drive. Lastly, native Google documents such as spreadsheets and word documents are exported to PDF and then ingested.
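As an illustration of what a seed query might look like, here is a small sketch that assembles a query string in the Drive (v2) search syntax, where `title contains '...'` and `trashed = false` are search terms and single quotes inside a literal are escaped with a backslash. The class and helper names (`SeedQuerySketch`, `quote`, `titleContains`, `and`) are hypothetical, and it is an assumption here that the text entered in the tab is handed to Drive unchanged.

```java
public class SeedQuerySketch {
  /** Wrap a literal in single quotes, escaping embedded single quotes as \'. */
  static String quote(String literal) {
    return "'" + literal.replace("'", "\\'") + "'";
  }

  /** Match files and folders whose title contains the given text. */
  static String titleContains(String text) {
    return "title contains " + quote(text);
  }

  /** Combine two search terms so that both must hold. */
  static String and(String left, String right) {
    return left + " and " + right;
  }

  public static void main(String[] args) {
    // Seed only non-trashed items whose title mentions the given text.
    System.out.println(and(titleContains("Valentine's Day"), "trashed = false"));
  }
}
```

The printed string is what would be pasted into the "Google Drive Seed Query" tab; because the query only selects seeds, any matching directory would still be crawled in full.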

+
+ +
OpenText LiveLink Repository Connection

The LiveLink connection type allows you to index content from LiveLink repositories. LiveLink has a rich variety of different document types and metadata, which include basic documents, as well as compound documents, folders, workspaces, and projects. A LiveLink connection is able to discover documents