manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Getting a 401 Unauthorized on a SharePoint 2010 crawl request, with MCPermissions.asmx installed
Date Wed, 18 Sep 2013 18:56:21 GMT
I forgot to mention: I removed the "4.0 AWS" selection.  Select just plain
4.0 instead.

Karl



On Wed, Sep 18, 2013 at 2:06 PM, Karl Wright <daddywri@gmail.com> wrote:

> Thanks.
>
> I committed a better fix.  You will need a clean job again though if you
> want to try it.
>
> Karl
>
>
>
> On Wed, Sep 18, 2013 at 1:30 PM, Dmitry Goldenberg <dgoldenberg@kmwllc.com> wrote:
>
>> Karl,
>>
>> Attaching the full log.
>>
>> - Dmitry
>>
>>
>> On Wed, Sep 18, 2013 at 1:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Ok - is there a "Checking whether to include library" message in the
>>> log?  If so, can you send that to me?
>>>
>>> Karl
>>>
>>>
>>> On Wed, Sep 18, 2013 at 1:02 PM, Dmitry Goldenberg <
>>> dgoldenberg@kmwllc.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> I'm definitely seeing this issue, after a full 'rejig' of the system:
>>>> svn up, ant clean (actually blew away dist/example), ant build, re-created
>>>> the connectors and job.  Still seeing those string index out of bounds
>>>> exceptions.
>>>>
>>>> - Dmitry
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 12:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Dmitry,
>>>>>
>>>>> I think this is the same bug I fixed earlier today.  I think you just
>>>>> have a job around from before the code change that fixed it.  If you can
>>>>> create a new job and run that, see if you get the same issue.
>>>>>
>>>>> I'll be able to explore this more thoroughly when I get home tonight;
>>>>> from here I cannot see your instance due to firewall.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 18, 2013 at 12:01 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Not a regression; a bug I introduced.  Let me look at it - should be
>>>>>> fixable shortly.
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2013 at 11:48 AM, Dmitry Goldenberg <
>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> I've just re-tested using the latest. I wonder if there's a
>>>>>>> regression issue. Just crawling /Shared Documents of the root site, I'm
>>>>>>> running into what seems like an indefinite loop of retrying to crawl that
>>>>>>> directory, with the following error showing up time after time:
>>>>>>>
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>> Checking whether to include document '/Shared
>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint: File
>>>>>>> '/Shared Documents/test-word-doc-1.docx' exactly matched rule path '/Shared
>>>>>>> Documents/*'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:24,959 (Worker thread '0') - SharePoint:
>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>
>>>>>>> FATAL 2013-09-18 11:42:25,004 (Worker thread '0') - Error tossed:
>>>>>>> String index out of range: -1
>>>>>>>
>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>> range: -1
>>>>>>>
>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>> Checking whether to include document '/Shared
>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint: File
>>>>>>> '/Shared Documents/test-word-doc-1.docx' exactly matched rule path '/Shared
>>>>>>> Documents/*'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,835 (Worker thread '2') - SharePoint:
>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>
>>>>>>> FATAL 2013-09-18 11:42:26,840 (Worker thread '2') - Error tossed:
>>>>>>> String index out of range: -1
>>>>>>>
>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>> range: -1
>>>>>>>
>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>> Checking whether to include document '/Shared
>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint: File
>>>>>>> '/Shared Documents/test-word-doc-1.docx' exactly matched rule path '/Shared
>>>>>>> Documents/*'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,860 (Worker thread '1') - SharePoint:
>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>
>>>>>>> FATAL 2013-09-18 11:42:26,865 (Worker thread '1') - Error tossed:
>>>>>>> String index out of range: -1
>>>>>>>
>>>>>>> java.lang.StringIndexOutOfBoundsException: String index out of
>>>>>>> range: -1
>>>>>>>
>>>>>>> at java.lang.String.substring(String.java:1911)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:926)
>>>>>>>
>>>>>>> at
>>>>>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>> Getting version of '//Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>> Checking whether to include document '/Shared
>>>>>>> Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint: File
>>>>>>> '/Shared Documents/test-word-doc-1.docx' exactly matched rule path '/Shared
>>>>>>> Documents/*'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>> Including file '/Shared Documents/test-word-doc-1.docx'
>>>>>>>
>>>>>>> DEBUG 2013-09-18 11:42:26,885 (Worker thread '3') - SharePoint:
>>>>>>> Finding metadata to include for document/item '/Shared
>>>>>>> Documents/test-word-doc-1.docx'.
>>>>>>>
>>>>>>> FATAL 2013-09-18 11:42:26,895 (Worker thread '3') - Error tossed:
>>>>>>> String index out of range: -1
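The trace above is consistent with a substring call whose index came from an indexOf lookup that found nothing. The sketch below only illustrates that general failure mode and one way to guard against it; it is not the connector's code at SharePointRepository.java:926 (which is not shown in this thread), and the class and method names are hypothetical.

```java
/** Hypothetical illustration; not taken from SharePointRepository.java. */
final class SubstringGuardSketch {

  /** Unguarded: throws StringIndexOutOfBoundsException when the marker is absent. */
  static String prefixBeforeUnguarded(String path, String marker) {
    int index = path.indexOf(marker);   // -1 when the marker is not found
    return path.substring(0, index);    // substring(0, -1) throws
  }

  /** Guarded: returns null instead of throwing when the marker is absent. */
  static String prefixBeforeGuarded(String path, String marker) {
    int index = path.indexOf(marker);
    return index < 0 ? null : path.substring(0, index);
  }

  public static void main(String[] args) {
    String path = "/Shared Documents/test-word-doc-1.docx";
    System.out.println(prefixBeforeGuarded(path, "/test-word"));
    System.out.println(prefixBeforeGuarded(path, "/Subsite"));
    try {
      prefixBeforeUnguarded(path, "/Subsite");
    } catch (StringIndexOutOfBoundsException e) {
      System.out.println("unguarded call threw: " + e.getMessage());
    }
  }
}
```

The exact exception message differs between Java versions, so the sketch prints it rather than assuming its wording.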
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 18, 2013 at 11:27 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Dmitry,
>>>>>>>>
>>>>>>>> It may be worth reviewing with that engineer what steps he took
>>>>>>>> when he installed the instance.  If he used the standard installer, IIRC
>>>>>>>> there are a number of ways you can mess this up - the primary way being if
>>>>>>>> you try to install IIS afterwards and then just try to patch things up.
>>>>>>>> The canned install usually does best if IIS is installed first.
>>>>>>>>
>>>>>>>> At any rate, I think that you have a probable case of "operator
>>>>>>>> error" here...
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I can think of a few possibilities.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 18, 2013 at 11:16 AM, Dmitry Goldenberg <
>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>
>>>>>>>>> SharePoint was not installed by a domain user (the Windows
>>>>>>>>> instance is not on a domain).
>>>>>>>>>
>>>>>>>>> This is not a canned AWS SharePoint installation; an engineer on
>>>>>>>>> the team installed it, using the standard installer program, I believe.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 18, 2013 at 10:34 AM, Will Parkinson <
>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Dmitry, do you know if Sharepoint was installed by a domain
>>>>>>>>>> user?  I have heard of issues with Sharepoint if not installed using a
>>>>>>>>>> domain user (e.g. DOMAIN\someuser)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 19, 2013 at 12:31 AM, Will Parkinson <
>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> No, I didn't have that issue.  The issue I had was the // and ///
>>>>>>>>>>> references being added in the wrong places in the page URLs
>>>>>>>>>>>
>>>>>>>>>>> I was getting things like
>>>>>>>>>>>
>>>>>>>>>>>  /Site Name/Lib///rary/test.aspx
>>>>>>>>>>>
>>>>>>>>>>> My first set up was an out of the box set up, the main site was
>>>>>>>>>>> on port 80, using classic authentication.  With the path modification in
>>>>>>>>>>> the mcf-sharepoint-connector.jar, it worked very well.
>>>>>>>>>>>
>>>>>>>>>>> I set up active directory on that same server to authenticate
>>>>>>>>>>> via NTLM
>>>>>>>>>>>
>>>>>>>>>>> The second server had the site on https on port 443, had claims
>>>>>>>>>>> based authentication using ADFS and kerberos.  I had to modify the
>>>>>>>>>>> mcf-sharepoint-connector.jar and MCPermissions.wsp to get this to work
>>>>>>>>>>> around the lack of SID's returned from the permissions webservice.
>>>>>>>>>>>
>>>>>>>>>>> In this case, Active Directory and ADFS were set up on separate
>>>>>>>>>>> AWS servers
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 19, 2013 at 12:23 AM, Karl Wright <
>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>
>>>>>>>>>>>> The path stuff we're already dealing with - see the
>>>>>>>>>>>> CONNECTORS-772 branch.  But what we are having trouble with is something
>>>>>>>>>>>> much more fundamental.  On Dmitry's AWS instance, when you talk to the web
>>>>>>>>>>>> services for a root site, it works fine.  But as soon as you add a subsite
>>>>>>>>>>>> path into the URL, it *seems* to work fine, but actually behaves as though
>>>>>>>>>>>> you never specified any subsite at all - it returns root site information
>>>>>>>>>>>> only.  On this system, this occurs for ALL web services, even Microsoft's.
>>>>>>>>>>>> The reason is that the value of SPContext.Current.Web never points to the
>>>>>>>>>>>> subsite you specified.  The result is that you cannot use SharePoint
>>>>>>>>>>>> subsites with ManifoldCF without causing havoc.
>>>>>>>>>>>>
>>>>>>>>>>>> Does this sound completely unfamiliar to you?  If you never
>>>>>>>>>>>> encountered it, then we should compare how these instances were set up,
>>>>>>>>>>>> unless you have any further ideas.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:12 AM, Will Parkinson <
>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Karl (and Dmitry)
>>>>>>>>>>>>>
>>>>>>>>>>>>> For AWS, I had to modify the way the addFile function in the
>>>>>>>>>>>>> FileStream class (in SharePointRepository.java) calculates the
>>>>>>>>>>>>> modifiedPath from the relPath
>>>>>>>>>>>>>
>>>>>>>>>>>>> Essentially, I ensured that the relPath always contains the
>>>>>>>>>>>>> site as part of the path
>>>>>>>>>>>>>
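>>>>>>>>>>>>>                 // Compare by value; != on Strings compares references in Java
>>>>>>>>>>>>>                 if (!siteName.isEmpty()) {
>>>>>>>>>>>>>                     int siteInd = relPath.indexOf(siteName);
>>>>>>>>>>>>>                     // Prepend the site unless it already appears near the start of relPath
>>>>>>>>>>>>>                     if (siteInd == -1 || siteInd > 3) {
>>>>>>>>>>>>>                         relPath = siteName + relPath;
>>>>>>>>>>>>>                     }
>>>>>>>>>>>>>                 }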
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Which fixed my pathing issue and the index out of bounds
>>>>>>>>>>>>> errors.
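The check described above can be wrapped in a small standalone method to see how it behaves on sample inputs. This is a sketch only: the class and method names are hypothetical, the emptiness test uses isEmpty() rather than a reference comparison, and the assumption that siteName carries a leading slash is mine, not something stated in the message.

```java
/** Sketch of the relPath check; ensureSitePrefix is a hypothetical name. */
final class RelPathSketch {

  /**
   * Prepends siteName to relPath unless siteName already occurs at index 0..3
   * of relPath, mirroring the condition in the snippet above.
   */
  static String ensureSitePrefix(String siteName, String relPath) {
    if (siteName.isEmpty()) {
      return relPath;                      // root site: nothing to prepend
    }
    int siteInd = relPath.indexOf(siteName);
    if (siteInd == -1 || siteInd > 3) {
      return siteName + relPath;           // site missing, or only found deep in the path
    }
    return relPath;
  }

  public static void main(String[] args) {
    System.out.println(ensureSitePrefix("/Site Name", "/Library/test.aspx"));
    System.out.println(ensureSitePrefix("/Site Name", "/Site Name/Library/test.aspx"));
    System.out.println(ensureSitePrefix("", "/Library/test.aspx"));
  }
}
```

Note that the `siteInd > 3` branch also prepends the site when its name happens to appear later in the path, for example as a folder name.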
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have also made many other modifications to cope with AD and
>>>>>>>>>>>>> claims based auth and compatibility with Sharepoint 2013
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dmitry, I have uploaded my modified
>>>>>>>>>>>>> mcf-sharepoint-connector.jar and MCPermissions WSP if you would like to try
>>>>>>>>>>>>> them out
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://pngnetworks.com/sharepoint-2010-claims.zip
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just make sure you back up your current ones as this is still
>>>>>>>>>>>>> very much in development :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, the logging is very verbose.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Will
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:41 PM, Karl Wright <
>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>> When you folks set up YOUR AWS instance, did it work with MCF
>>>>>>>>>>>>>> out of the box?  Or did you need to do something?  And, if so, what did you
>>>>>>>>>>>>>> do?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:28 AM, Will Parkinson <
>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes that's right, only really interested in the site that
>>>>>>>>>>>>>>> you are trying to crawl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:25 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Will,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For SharePoint - 80, the output is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> NTAuthenticationProviders       : (STRING) "NTLM"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I assume we're not interested in the Default Web Site; for
>>>>>>>>>>>>>>>> that, the output is simply "The parameter NTAuthenticationProviders is not
>>>>>>>>>>>>>>>> set at this node."
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:16 AM, Will Parkinson <
>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you open IIS manager and click on sites, it is
>>>>>>>>>>>>>>>>> displayed in the ID column (see screenshot attached)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:55 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>>>>>> Sorry, what is the "sharepoint website *number*" in that
>>>>>>>>>>>>>>>>>> invocation?
>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:53 AM, Will Parkinson <
>>>>>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Dmitry
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Just out of interest, what does the following command
>>>>>>>>>>>>>>>>>>> output on your system
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> cd to C:\inetpub\adminscripts
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *cscript adsutil.vbs get w3svc/<put your sharepoint
>>>>>>>>>>>>>>>>>>> website number here>/root/NTAuthenticationProviders*
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "This is the second time I'm encountering the issue
>>>>>>>>>>>>>>>>>>>> which leads me to believe it's a quirk of IIS and/or SharePoint."
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It cannot be just a quirk of SharePoint because
>>>>>>>>>>>>>>>>>>>> SharePoint's UI etc could not create or work with subsites if that was
>>>>>>>>>>>>>>>>>>>> true.  It may well be a configuration issue with IIS, which is indeed what
>>>>>>>>>>>>>>>>>>>> I suspect.  I have pinged all the resources I know of to try and get some
>>>>>>>>>>>>>>>>>>>> insight as to why this is happening.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "Perhaps this is something that can be worked into the
>>>>>>>>>>>>>>>>>>>> 'fabric' of ManifoldCF as a workaround for a known issue."
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Like I said before, this is a huge amount of work,
>>>>>>>>>>>>>>>>>>>> tantamount to rewriting most of the connector.  If this is what you want to
>>>>>>>>>>>>>>>>>>>> request, that is your option, but there is no way we'd complete any of this
>>>>>>>>>>>>>>>>>>>> work before December/January at the earliest.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "Just to understand this a bit better, the main
>>>>>>>>>>>>>>>>>>>> breakage here is that the wildcards don't work properly, right? "
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> No, it means that ManifoldCF cannot get at any data of
>>>>>>>>>>>>>>>>>>>> any kind associated with a SharePoint subsite.  Accessing root data works
>>>>>>>>>>>>>>>>>>>> fine.  If you try to crawl as things are now, you must disable all subsites
>>>>>>>>>>>>>>>>>>>> and just crawl the root site, or you will crawl the same things with longer
>>>>>>>>>>>>>>>>>>>> and longer paths indefinitely.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:38 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This is the second time I'm encountering the issue
>>>>>>>>>>>>>>>>>>>>> which leads me to believe it's a quirk of IIS and/or SharePoint. Perhaps
>>>>>>>>>>>>>>>>>>>>> this is something that can be worked into the 'fabric' of ManifoldCF as a
>>>>>>>>>>>>>>>>>>>>> workaround for a known issue. I understand that it may have far reaching
>>>>>>>>>>>>>>>>>>>>> tentacles but I wonder if that's really the only option...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Just to understand this a bit better, the main
>>>>>>>>>>>>>>>>>>>>> breakage here is that the wildcards don't work properly, right?  In theory
>>>>>>>>>>>>>>>>>>>>> if I have a repo connector config which lists specific library and list
>>>>>>>>>>>>>>>>>>>>> paths, things should work?  It's only when the /* types of wildcards are
>>>>>>>>>>>>>>>>>>>>> included, we're in trouble?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:07 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Someone else was having a similar problem. See
>>>>>>>>>>>>>>>>>>>>>> http://social.technet.microsoft.com/Forums/sharepoint/en-US/e4b53c63-b89a-4356-a7b0-6ca7bfd22826/getting-sharepoint-subsite-from-custom-webservice.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Apparently it does depend on how you get to the web
>>>>>>>>>>>>>>>>>>>>>> service, which does argue that it is an IIS issue.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 5:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As discussed privately I had a look at your system.
>>>>>>>>>>>>>>>>>>>>>>> What is happening is that the C# static SPContext.Current.Web is not
>>>>>>>>>>>>>>>>>>>>>>> reflecting the subsite in any url that contains a subsite.  In other words,
>>>>>>>>>>>>>>>>>>>>>>> the URL coming in might be "
>>>>>>>>>>>>>>>>>>>>>>> http://servername/subsite1/_vti_bin/MCPermissions.asmx",
>>>>>>>>>>>>>>>>>>>>>>> but the MCPermissions.asmx plugin will think it is being executed in the
>>>>>>>>>>>>>>>>>>>>>>> root context ("http://servername").  That's pretty
>>>>>>>>>>>>>>>>>>>>>>> broken behavior, so I'm guessing that either IIS or SharePoint is
>>>>>>>>>>>>>>>>>>>>>>> somehow misconfigured to do this, and that once the misconfiguration is
>>>>>>>>>>>>>>>>>>>>>>> corrected the web services would begin to work right again.  But I have
>>>>>>>>>>>>>>>>>>>>>>> no idea how this should actually be fixed.
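For reference, the URL shape in question places the subsite path between the server and the `_vti_bin` segment. The sketch below builds such an endpoint URL from its parts; the class and method names are hypothetical, it is not the connector's own URL handling, and it does not URL-encode site names that contain spaces.

```java
/** Hypothetical helper showing where the subsite path sits in a web service URL. */
final class EndpointUrlSketch {

  /** Joins server URL, optional site path, and service name into one endpoint URL. */
  static String serviceUrl(String serverUrl, String sitePath, String serviceName) {
    // Drop a trailing slash from the server URL so segments join cleanly.
    String base = serverUrl.endsWith("/")
        ? serverUrl.substring(0, serverUrl.length() - 1)
        : serverUrl;
    // Normalize the site path: empty or "/" means the root site.
    String site = sitePath;
    if (site.endsWith("/")) {
      site = site.substring(0, site.length() - 1);
    }
    if (!site.isEmpty() && !site.startsWith("/")) {
      site = "/" + site;
    }
    return base + site + "/_vti_bin/" + serviceName;
  }

  public static void main(String[] args) {
    System.out.println(serviceUrl("http://servername", "/subsite1", "MCPermissions.asmx"));
    System.out.println(serviceUrl("http://servername", "", "MCPermissions.asmx"));
  }
}
```

On the system described here, both URLs would be answered as if the second (root) form had been requested.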
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Will Parkinson, one of the subscribers of this list,
>>>>>>>>>>>>>>>>>>>>>>> may find the symptoms meaningful, since he set up an AWS SharePoint
>>>>>>>>>>>>>>>>>>>>>>> instance before.  I hope he will respond in a helpful way.  Until then, I
>>>>>>>>>>>>>>>>>>>>>>> think we are stuck.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 9:49 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It looks like I'll be able to get access for you to
>>>>>>>>>>>>>>>>>>>>>>>> the test system we're using. Would you be interested in working with the
>>>>>>>>>>>>>>>>>>>>>>>> system directly? I certainly don't mind doing some testing but I thought
>>>>>>>>>>>>>>>>>>>>>>>> we'd speed things up this way. If so, could you email me from a more
>>>>>>>>>>>>>>>>>>>>>>>> private account so we can set this up?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 7:38 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Another interesting bit from the log:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/lt/Forms/AllItems.aspx', 'List
>>>>>>>>>>>>>>>>>>>>>>>>> Template Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/masterpage/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>> 'Master Page Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/Shared Documents/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>> 'Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/SiteAssets/Forms/AllItems.aspx', 'Site Assets'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/SitePages/Forms/AllPages.aspx', 'Site Pages'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/solutions/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>>>>>> 'Solution Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/Style Library/Forms/AllItems.aspx', 'Style
>>>>>>>>>>>>>>>>>>>>>>>>> Library'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/Test Library 1/Forms/AllItems.aspx', 'Test
>>>>>>>>>>>>>>>>>>>>>>>>> Library 1'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/theme/Forms/AllItems.aspx', 'Theme
>>>>>>>>>>>>>>>>>>>>>>>>> Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library list: '/_catalogs/wp/Forms/AllItems.aspx', 'Web Part
>>>>>>>>>>>>>>>>>>>>>>>>> Gallery'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared
>>>>>>>>>>>>>>>>>>>>>>>>> Documents' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style
>>>>>>>>>>>>>>>>>>>>>>>>> Library' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7')
>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This time it appears that it is the Lists service
>>>>>>>>>>>>>>>>>>>>>>>>> that is broken and does not recognize the parent site.
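The doubled segments in the log excerpt above (Klmnopqr/Klmnopqr, Defghij/Defghij) are the visible symptom of a subsite request being answered with the parent's data. A quick diagnostic along these lines could flag such paths when scanning a log; this is a hypothetical sketch, not part of ManifoldCF, and it would also flag a legitimate subsite that shares its parent's name.

```java
/** Hypothetical log diagnostic: flags paths with two identical consecutive segments. */
final class RepeatedSegmentSketch {

  static boolean hasRepeatedSegment(String path) {
    String[] segments = path.split("/");
    for (int i = 1; i < segments.length; i++) {
      // Skip empty segments produced by leading or doubled slashes.
      if (!segments[i].isEmpty() && segments[i].equals(segments[i - 1])) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasRepeatedSegment(
        "/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents"));
    System.out.println(hasRepeatedSegment("/Shared Documents/Forms/AllItems.aspx"));
  }
}
```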
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I haven't corrected this problem yet since now I
>>>>>>>>>>>>>>>>>>>>>>>>> am beginning to wonder if *any* of the web services under Amazon work at
>>>>>>>>>>>>>>>>>>>>>>>>> all for subsites.  We may be better off implementing everything we need in
>>>>>>>>>>>>>>>>>>>>>>>>> the MCPermissions service.  I will ponder this as I continue to research
>>>>>>>>>>>>>>>>>>>>>>>>> the logs.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> It's still valuable to check my getSites()
>>>>>>>>>>>>>>>>>>>>>>>>> implementation.  I'll be doing another round of work tonight on the plugin.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 8:45 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The augmented plugin can be downloaded from
>>>>>>>>>>>>>>>>>>>>>>>>>> http://people.apache.org/~kwright/MetaCarta.SharePoint.MCPermissionsService.wsp.
>>>>>>>>>>>>>>>>>>>>>>>>>> The revised connector code is also ready, and should be checked out and
>>>>>>>>>>>>>>>>>>>>>>>>>> built from
>>>>>>>>>>>>>>>>>>>>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-772.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Once you set it all up, you can see if it is
>>>>>>>>>>>>>>>>>>>>>>>>>> doing the right thing by just trying to drill down through subsites in the
>>>>>>>>>>>>>>>>>>>>>>>>>> UI.  You should always see a list of subsites that is appropriate for the
>>>>>>>>>>>>>>>>>>>>>>>>>> context you are in; if this does not happen it is not working.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:45 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I can see how preloading the list of subsites
>>>>>>>>>>>>>>>>>>>>>>>>>>> may be less optimal.. The advantage of doing it this way is one call and
>>>>>>>>>>>>>>>>>>>>>>>>>>> you've got the structure in memory, which may be OK unless there are sites
>>>>>>>>>>>>>>>>>>>>>>>>>>> with a ton of subsites which may stress out memory. The disadvantage is
>>>>>>>>>>>>>>>>>>>>>>>>>>> having to throw this structure around..
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, I'll certainly help test out your changes,
>>>>>>>>>>>>>>>>>>>>>>>>>>> just let me know when they're available.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:19 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the code snippet.  I'd prefer,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> though, to not preload the entire site structure in memory.  Probably it
>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be better to just add another method to the ManifoldCF SharePoint
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2010 plugin.  More methods are going to be added anyway to support Claim
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Space Authentication, so I guess this would be just one more.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> We honestly have never seen this problem before
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - so it's not just flakiness, it has something to do with the installation,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm certain.  At any rate, I'll get going right away on a workaround - if
>>>>>>>>>>>>>>>>>>>>>>>>>>>> you are willing to test what I produce.  I'm also certain there is at least
>>>>>>>>>>>>>>>>>>>>>>>>>>>> one other issue, but hopefully that will become clearer once this one is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> resolved.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 6:49 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >> subsite discovery is effectively disabled
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> except directly under the root site
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes. Come to think of it, I once came across
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this problem while implementing a SharePoint connector.  I'm not sure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> whether it's exactly what's happening with the issue we're discussing but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> looks like it.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I started off by using multiple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getWebCollection calls to get child subsites of sites and trying to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> navigate down that way. The problem was that getWebCollection was always
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> returning the immediate subsites of the root site no matter whether you're
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at the root or below, so I ended up generating infinite loops.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I switched over to using a single
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> getAllSubWebCollection call and caching its results. That call returns the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> full list of all subsites as pairs of Title and Url.  I had a POJO similar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the one below which held the list of sites and contained logic for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> enumerating the child sites, given the URL of a (parent) site.  From what I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> recall, getWebCollection works inconsistently, either across SP versions or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> across installations, but the logic below should work in any case.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *** public class SubSiteCollection -- holds a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> list of CrawledSite pojo's each of which is a { title, url }.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *** SubSiteCollection has the following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  public List<CrawledSite> getImmediateSubSites(String siteUrl) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   List<CrawledSite> subSites = new ArrayList<CrawledSite>();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   for (CrawledSite site : sites) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    if (isChildOf(siteUrl, site.getUrl().toString())) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>     subSites.add(site);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return subSites;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static boolean isChildOf(String parentUrl, String urlToCheck) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   final String parent = normalizeUrl(parentUrl);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   final String child = normalizeUrl(urlToCheck);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   boolean ret = false;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   if (child.startsWith(parent)) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    String remainder = child.substring(parent.length());
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    ret = StringUtils.countOccurrencesOf(remainder, SLASH) == 1;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return ret;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  private static String normalizeUrl(String url) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   return ((url.endsWith(SLASH)) ? url : url + SLASH).toLowerCase();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
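For anyone who wants to try the approach quoted above outside a connector, here is a minimal stand-alone sketch. It assumes a flat list of subsite URLs, which is what getAllSubWebCollection is described as returning. The class name SubSiteIndex and the sample URLs are made up for illustration, and the StringUtils.countOccurrencesOf call is swapped for a plain character count so that only the JDK is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone helper (not ManifoldCF code): given a flat list of
// subsite URLs, return only the immediate children of a parent site.
class SubSiteIndex {
    private static final char SLASH = '/';
    private final List<String> siteUrls;

    SubSiteIndex(List<String> siteUrls) {
        this.siteUrls = new ArrayList<>(siteUrls);
    }

    List<String> getImmediateSubSites(String parentUrl) {
        List<String> children = new ArrayList<>();
        for (String url : siteUrls) {
            if (isChildOf(parentUrl, url)) {
                children.add(url);
            }
        }
        return children;
    }

    static boolean isChildOf(String parentUrl, String urlToCheck) {
        String parent = normalizeUrl(parentUrl);
        String child = normalizeUrl(urlToCheck);
        if (!child.startsWith(parent)) {
            return false;
        }
        String remainder = child.substring(parent.length());
        // One slash left over means one extra path segment: the child's own
        // trailing slash added by normalizeUrl.
        long slashes = remainder.chars().filter(c -> c == SLASH).count();
        return slashes == 1;
    }

    // Lower-case the URL and make sure it ends with a slash.
    static String normalizeUrl(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        return lower.endsWith("/") ? lower : lower + "/";
    }

    public static void main(String[] args) {
        SubSiteIndex index = new SubSiteIndex(Arrays.asList(
            "http://sp.example.com/Abcd",
            "http://sp.example.com/Abcd/Child",
            "http://sp.example.com/Defghij"));
        System.out.println(index.getImmediateSubSites("http://sp.example.com"));
        System.out.println(index.getImmediateSubSites("http://sp.example.com/Abcd"));
    }
}
```

Because a site's own URL leaves an empty remainder after the parent prefix, a site is never reported as its own child, which is what keeps this kind of lookup from looping.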
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:54 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Have a look at this sequence also:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8') - SharePoint: Subsite list: 'http://ec2-99-99-99-99.compute-1.amazonaws.com/Abcd', 'Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8') - SharePoint: Subsite list: 'http://ec2-99-99-99-99.compute-1.amazonaws.com/Defghij', 'Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8') - SharePoint: Subsite list: 'http://ec2-99-99-99-99.compute-1.amazonaws.com/Klmnopqr', 'Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Checking whether to include site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Checking whether to include site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Checking whether to include site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8') - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is using the GetSites(String parent)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method with a site name of "/Klmnopqr/Abcd/Abcd/Klmnopqr", and getting back
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> three sites (!!).  The parent path is not correct, obviously, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nevertheless this is one way in which paths are getting completely messed up.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It *looks* like the Webs web service is broken in such a way as to ignore
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the URL coming in, except for the base part, which means that subsite
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discovery is effectively disabled except directly under the root site.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
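To make the consequence concrete, here is a toy simulation, not connector code. It assumes a lookup that ignores the parent path it is given and always answers with the root's three subsites, which is how the service appears to behave in the log above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: names and behaviour are assumptions for illustration.
class BrokenSubsiteDiscovery {
    static final List<String> ROOT_SUBSITES = Arrays.asList("Abcd", "Defghij", "Klmnopqr");

    // Stand-in for a subsite lookup that ignores its argument and always
    // answers with the children of the root site.
    static List<String> brokenGetSites(String parentPath) {
        return ROOT_SUBSITES;
    }

    // Level-by-level discovery, capped at maxDepth levels so it terminates.
    static List<String> discover(int maxDepth) {
        List<String> found = new ArrayList<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(""); // the root site
        for (int depth = 0; depth < maxDepth; depth++) {
            List<String> next = new ArrayList<>();
            for (String parent : frontier) {
                for (String name : brokenGetSites(parent)) {
                    next.add(parent + "/" + name);
                }
            }
            found.addAll(next);
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> paths = discover(3);
        System.out.println(paths.size() + " site paths after 3 levels");
        System.out.println(paths.contains("/Klmnopqr/Abcd/Abcd"));
    }
}
```

Every level multiplies the number of queued site paths by three, so without the depth cap the discovery would never finish on its own.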
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This might still be OK if it is not possible
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to create subsites of subsites in this version of SharePoint.  Can you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confirm that this is or is not possible?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:42 PM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "This is everything that got generated, from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the very beginning"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Well, something isn't right.  What I expect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to see that I don't right up front are:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A webs "getWebCollection" invocation for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /_vti_bin/webs.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Two lists "getListCollection" invocations
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for /_vti_bin/lists.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Instead the first transactions I see are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from already busted URLs - which make no sense since there would be no way
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> they should have been able to get queued yet.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So there are a number of possibilities.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> First, maybe the log isn't getting cleared out, and the session in question
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> therefore starts somewhere in the middle of manifoldcf.log.1.  But no:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> C:\logs>grep "POST /_vti_bin/webs" manifoldcf.log.1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grep: input lines truncated - result questionable
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Nevertheless there are some interesting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> points here.  First, note the following response, which I've been able to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> determine is against "Test Library 1":
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') - SharePoint: getListItems xml response: '<GetListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse xmlns=""><GetListItemsResult FileRef="SitePages/Home.aspx"/></GetListItemsResponse></GetListItems>'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') - SharePoint: Checking whether to include document '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') - SharePoint: File '/SitePages/Home.aspx' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread '23') - SharePoint: Including file '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  WARN 2013-09-16 13:02:31,590 (Worker thread '23') - Sharepoint: Unexpected relPath structure; path is '/SitePages/Home.aspx', but expected <list/library> length of 26
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The FileRef in this case is pointing at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> what, exactly?  Is there a SitePages/Home.aspx in the "Test Library 1"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> library?  Or does it mean to refer back to the root site with this URL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> construction?  And since this is supposedly at the root level, how come the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> combined site + library name comes out to 26??  I get 15, which leaves 11
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> characters unaccounted for.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
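The length arithmetic in the paragraph above can be checked with a few lines of Java. This is only a sketch of the counting, under the assumption that the expected prefix is the site path, a slash, and the library name. prefixLength is a hypothetical helper, not the connector's actual method.

```java
class RelPathLengthCheck {
    // Length of "<site path>/<library name>", the prefix a library-relative
    // path is assumed to start with. Hypothetical helper for the arithmetic.
    static int prefixLength(String sitePath, String libraryName) {
        return sitePath.length() + 1 + libraryName.length();
    }

    public static void main(String[] args) {
        // Root site, so the site path is empty.
        int expected = prefixLength("", "Test Library 1");
        int reported = 26; // value taken from the WARN line
        String relPath = "/SitePages/Home.aspx";
        System.out.println("expected prefix length: " + expected);
        System.out.println("unaccounted characters: " + (reported - expected));
        System.out.println("relPath length: " + relPath.length());
    }
}
```

The first two numbers come out as 15 and 11, matching the figures above. The returned path itself is only 20 characters long, shorter than the 26 the connector expected to strip.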
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still looking at the logs to see if I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can glean key information.  Later, if I could set up a crawl against the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sharepoint instance in question, that would certainly help.  I can readily
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> set up an ssh tunnel if that is what is required.  But I won't be able to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do it until I get home tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:58 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is everything that got generated, from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the very beginning, meaning that I did a fresh build, new database, new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection definitions, start. The log must have rolled but the .1 log is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If I were to get you access to the actual
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> test system, would you mind taking a look? It may be more efficient than
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sending logs..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:48 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> These logs are different but have exactly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the same problem; they start in the middle when the crawl is already well
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> underway.  I'm wondering if by chance you have more than one agents process
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> running or something?  Or maybe the log is rolling and stuff is getting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lost?  What's there is not what I would expect to see, at all.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I *did* manage to find two transactions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that look like they might be helpful, but because the *results* of those
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transactions are required by transactions that take place minutes *before*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the log, I have no confidence that I'm looking at anything meaningful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But I'll get back to you on what I find nonetheless.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you decide to repeat this exercise, try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> watching the log with "tail -f" before starting the job.  You should not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see any log contents at all until the job is started.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:11 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Attached please find logs which start at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the beginning. I started from a fresh build (clean db etc.), the logs start
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> at server start, then I create the output connection and the repo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection, then the job, and then I fire off the job. I aborted the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution about a minute into it or so.  That's all that's in the logs with:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.connectors=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.httpclient.wire.header=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.org.apache.commons.httpclient=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 12:39 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you sure these are the right logs?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They start right in the middle of a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They are already in a broken state
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> when they start, e.g. the kinds of things that are being looked up are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> already nonsense paths
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I need to see logs from the BEGINNING of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a fresh crawl to see how the nonsense paths happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:52 AM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've generated logs with details as we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job was created afresh, as before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The logs are attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:20 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Do you think that this issue is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generic with regard to any Amz instance?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I presume so, since you didn't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> apparently do anything special to set one of these up.  Unfortunately, such
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> instances are not part of the free tier, so I am still constrained from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> setting one up for myself because of household rules here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "For now, I assume our only workaround
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is to list the paths of interest manually"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Depending on what is going wrong, that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> may not even work.  It looks like several SharePoint web service calls may
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be affected, and not in a cleanly predictable way, for this to happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "is identification and extraction of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attachments supported in the SP connector?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF in general leaves
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identification and extraction to the search engine.  Solr, for instance,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses Tika for this, if so configured.  You can configure your Solr output
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection to include or exclude specific mime types or extensions if you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> want to limit what is attempted.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:09 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, Karl. Do you think that this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue is generic with regard to any Amz instance? I'm just wondering how
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easily reproducible this may be..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For now, I assume our only workaround
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is to list the paths of interest manually, i.e. add explicit rules for each
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> library and list.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related subject - is identification
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and extraction of attachments supported in the SP connector?  E.g. if I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have a Word doc attached to a Task list item, would that be extracted?  So
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> far, I see that library content gets crawled and I'm getting the list item
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data but am not sure what happens to the attachments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:48 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the additional
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> information.  It does appear like the method that lists subsites is not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working as expected under AWS.  Nor are some number of other methods which
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supposedly just list the children of a subsite.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've reopened CONNECTORS-772 to work
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on addressing this issue.  Please stay tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:08 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Most of the paths that get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> generated are listed in the attached log, they match what shows up in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> diag report. So I'm not sure where they diverge, most of them just don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seem right.  There are 3 subsites rooted in the main site: Abcd, Defghij,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Klmnopqr.  It's strange that the connector would try such paths as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /*Klmnopqr*/*Defghij*/*Defghij*/Announcements///
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- there are multiple repetitions of the same subsite on the path and to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> begin with, Defghij is not a subsite of Klmnopqr, so why would it try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this? The /// at the end doesn't seem correct either, unless I'm missing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> something in how this pathing works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Test Library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1/Financia/lProjectionsTemplate.xl/Abcd/Announcements -- looks wrong. A
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> docname is mixed into the path, a subsite ends up after a docname?...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Shared
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Documents/Personal_Fina/ncial_Statement_1_1.xl/Defghij/ -- same types of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues plus now somehow the docname got split with a forward slash?..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are also a bunch of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> StringIndexOutOfBoundsException's.  Perhaps this logic doesn't fit with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pathing we're seeing on this amz-based installation?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect the logic to just know
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that root contains 3 subsites, and work off that. Each subsite has a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specific list of libraries and lists, etc. It seems odd that the connector
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> gets into this matching pattern, and tries what looks like thousands of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> variations (I aborted the execution).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:56 AM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To clarify, the way you would need
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to analyze this is to run a crawl with the wildcards as you have selected,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort if necessary after a while, and then use the Document Status report
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to list the document identifiers that had been generated.  Find a document
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identifier that you believe represents a path that is illegal, and figure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> out what SOAP getChild call caused the problem by returning incorrect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data.  In other words, find the point in the path where the path diverges
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from what exists into what doesn't exist, and go back in the ManifoldCF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logs to find the particular SOAP request that led to the issue.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect from your description
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that the problem lies with getting child sites given a site path, but
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that's just a guess at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 6:40 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't understand what you mean
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> by "I've tried the set of wildcards as below and I seem to be running into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a lot of cycles, where various subsite folders are appended to each other
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and an extraction of data at all of those locations is attempted".   If you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are seeing cycles it means that document discovery is still failing in some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> way.  For each folder/library/site/subsite, only the children of that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> folder/library/site/subsite should be appended to the path - ever.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you can give a specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> example, preferably including the soap back-and-forth, that would be very
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> helpful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 1:40 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Quick question. Is there an easy
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> way to configure an SP repo connection for crawling of all content, from
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the root site all the way down?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've tried the set of wildcards
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> as below and I seem to be running into a lot of cycles, where various
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> subsite folders are appended to each other and an extraction of data at all
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of those locations is attempted. Ideally I'd like to avoid having to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> construct an exact set of paths because the set may change, especially with
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new content being added.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd also like to pull down any
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files attached to list items. I'm hoping that some type of "/* file
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> include" should do it, once I figure out how to safely include all content.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
