manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Getting a 401 Unauthorized on a SharePoint 2010 crawl request, with MCPermissions.asmx installed
Date Wed, 18 Sep 2013 16:20:53 GMT
Hi Dmitry,

There's also a share deny acl that you are missing.

Since you are creating your own security model, I can't guarantee that what
I tell you will work properly.  But as I have said before, the process is:

- for a given user, get the access tokens
- get the share grant/deny tokens, and match against those; if the user is
not granted access, then don't include the user name or SID in the list
- get the file grant/deny tokens, and match against those.  If the user is
not granted access, then don't include the user name or SID in the list
- If both passed, include the user name or SID in the list
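
In code, the check for a single user looks roughly like this (a sketch only; the
source of the per-user token set is part of your own security model, and the only
RepositoryDocument accessors assumed are the three you listed plus a share-level
deny accessor, getShareDenyACL()):

  boolean userCanRead(Set<String> userTokens, RepositoryDocument repoDoc) {
    return passesLevel(userTokens, repoDoc.getShareACL(), repoDoc.getShareDenyACL())
        && passesLevel(userTokens, repoDoc.getACL(), repoDoc.getDenyACL());
  }

  boolean passesLevel(Set<String> userTokens, String[] grants, String[] denies) {
    // Any matching deny token blocks access outright.
    if (denies != null) {
      for (String deny : denies) {
        if (userTokens.contains(deny)) return false;
      }
    }
    // An empty grant list at this level means "no restriction here".
    if (grants == null || grants.length == 0) return true;
    // Otherwise at least one grant token must match.
    for (String grant : grants) {
      if (userTokens.contains(grant)) return true;
    }
    return false;
  }

So it is neither a simple merge nor an intersection of acl and shareAcl: the share
level and the file level are each filtered against the user's tokens independently,
and a user name or SID goes into your readers list only if both levels pass.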

Karl





On Wed, Sep 18, 2013 at 12:00 PM, Dmitry Goldenberg
<dgoldenberg@kmwllc.com> wrote:

> Karl,
>
> I just wanted to clarify something, regarding the getACL() and
> getShareACL() methods:
>
> "Match the document's "share" access tokens with the user access tokens.
> If any "deny" tokens match, the user does not get to see the document. If
> any "grant" tokens match, then go to the next step. If the list of document
> "share" tokens is empty, then also you go to the next step. Do the same
> thing for the document's "file" access tokens."
>
> I'd like to determine the list of SIDs that can read a document; let's
> say I want to do it at ingestion time. I have a RepositoryDocument "in
> hand", and I have the following:
>
> String[] acl = repoDoc.getACL();
> String[] shareAcl = repoDoc.getShareACL();
> String[] denyAcl = repoDoc.getDenyACL();
>
> I'm good with denyAcl.  But I want to understand how to merge acl and
> shareAcl to yield the list of SIDs with read access. Do I just merge the
> two lists? Or find their intersection?
>
> Thanks,
> - Dmitry
>
>
> On Wed, Sep 18, 2013 at 11:47 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Dmitry,
>>
>> I understand you worked around the issue once before.  But
>> please bear in mind that we have probably a dozen active SharePoint 2010
>> MCF installations where this problem did not occur.  So I am essentially
>> certain that there is a configuration issue which causes the problem - and
>> it seems probable that the trigger for the problem occurring is how the
>> instance is installed relative to IIS.  I am sure that the problem can be
>> fixed, provided we figure out what configuration is broken.
>>
>> Barring that, however, you can prove it to yourself that it is not
>> Heisenberg at work by simply creating a new instance, and keeping a record
>> of how you do it.  This would be much better for all than to request that
>> ManifoldCF work properly for broken SharePoint installations.
>>
>> Thanks,
>> Karl
>>
>>
>>
>> On Wed, Sep 18, 2013 at 11:38 AM, Dmitry Goldenberg <
>> dgoldenberg@kmwllc.com> wrote:
>>
>>> Karl,
>>>
>>> Not sure about the operator error.  I believe IIS was installed first,
>>> then:
>>>
>>>    - SharePoint was installed, using the standard installer
>>>    - some subsites were created, some libraries
>>>    - no custom plugins
>>>
>>> We have seen issues with the Microsoft web services not returning listings
>>> of subsites in the past, in another installation. That installation was not
>>> done by us, and I don't believe it was AWS-based.
>>>
>>> Back then, in my custom connector, I had worked around the issue by
>>> getting the list of all subsites using one getAllSubWebCollection call, then
>>> ascertaining parent/child relationships among the subsites in my code
>>> "manually" rather than relying on the MS services.  The crawl of an SP
>>> instance was done recursively, it'd start at the root, get everything
>>> there, then recursively apply the same logic for each subsite out of my
>>> pre-fetched subsite list.
>>>
>>> - Dmitry
>>>
>>>
>>> On Wed, Sep 18, 2013 at 11:27 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Dmitry,
>>>>
>>>> It may be worth reviewing with that engineer what steps he took when he
>>>> installed the instance.  If he used the standard installer, IIRC there are
>>>> a number of ways you can mess this up - the primary way being if you try to
>>>> install IIS afterwards and then just try to patch things up.  The canned
>>>> install usually does best if IIS is installed first.
>>>>
>>>> At any rate, I think that you have a probable case of "operator error"
>>>> here...
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> I can think of a few possibilities.
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 11:16 AM, Dmitry Goldenberg <
>>>> dgoldenberg@kmwllc.com> wrote:
>>>>
>>>>> SharePoint was not installed by a domain user (the Windows instance is
>>>>> not on a domain).
>>>>>
>>>>> This is not a canned AWS SharePoint installation; an engineer on the
>>>>> team installed it, using the standard installer program, I believe.
>>>>>
>>>>>
>>>>> On Wed, Sep 18, 2013 at 10:34 AM, Will Parkinson <
>>>>> parkinson.will@gmail.com> wrote:
>>>>>
>>>>>> Dmitry, do you know if Sharepoint was installed by a domain user?  I
>>>>>> have heard of issues with Sharepoint if not installed using a domain user
>>>>>> (e.g. DOMAIN\someuser)
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 19, 2013 at 12:31 AM, Will Parkinson <
>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>
>>>>>>> No, I didn't have that issue.  The issue I had was the // and ///
>>>>>>> references being added in the wrong places in the page URLs.
>>>>>>>
>>>>>>> I was getting things like
>>>>>>>
>>>>>>>  /Site Name/Lib///rary/test.aspx
>>>>>>>
>>>>>>> My first setup was out of the box: the main site was on
>>>>>>> port 80, using classic authentication.  With the path modification in the
>>>>>>> mcf-sharepoint-connector.jar, it worked very well.
>>>>>>>
>>>>>>> I set up Active Directory on that same server to authenticate via
>>>>>>> NTLM.
>>>>>>>
>>>>>>> The second server had the site on https on port 443 and used claims-based
>>>>>>> authentication with ADFS and Kerberos.  I had to modify the
>>>>>>> mcf-sharepoint-connector.jar and MCPermissions.wsp to work
>>>>>>> around the lack of SIDs returned from the permissions web service.
>>>>>>>
>>>>>>> In this case, Active Directory and ADFS were set up on separate AWS
>>>>>>> servers
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 19, 2013 at 12:23 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Will,
>>>>>>>>
>>>>>>>> The path stuff we're already dealing with - see the CONNECTORS-772
>>>>>>>> branch.  But what we are having trouble with is something much more
>>>>>>>> fundamental.  On Dmitry's AWS instance, when you talk to the web services
>>>>>>>> for a root site, it works fine.  But as soon as you add a subsite path into
>>>>>>>> the URL, it *seems* to work fine, but actually behaves as though you never
>>>>>>>> specified any subsite at all - it returns root site information only.  On
>>>>>>>> this system, this occurs for ALL web services, even Microsoft's.  The
>>>>>>>> reason is that the value of SPContext.Current.Web never points to the
>>>>>>>> subsite you specified.  The result is that you cannot use SharePoint
>>>>>>>> subsites with ManifoldCF without causing havoc.
>>>>>>>>
>>>>>>>> Does this sound completely unfamiliar to you?  If you never
>>>>>>>> encountered it, then we should compare how these instances were set up,
>>>>>>>> unless you have any further ideas.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 18, 2013 at 10:12 AM, Will Parkinson <
>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey Karl (and Dmitry)
>>>>>>>>>
>>>>>>>>> For AWS, I had to modify the way the addFile function in the
>>>>>>>>> FileStream class (in SharepointRepository.java) calculates the
>>>>>>>>> modifiedPath from the relPath.
>>>>>>>>>
>>>>>>>>> Essentially, I ensured that the relPath always contains the site
>>>>>>>>> as part of the path:
>>>>>>>>>
>>>>>>>>>               // Note: use isEmpty()/equals() rather than != for the String check.
>>>>>>>>>               if (!siteName.isEmpty()) {
>>>>>>>>>                     // Prepend the site unless it already appears at (or near) the start of relPath.
>>>>>>>>>                     int siteInd = relPath.indexOf(siteName);
>>>>>>>>>                     if (siteInd == -1 || siteInd > 3) {
>>>>>>>>>                         relPath = siteName + relPath;
>>>>>>>>>                     }
>>>>>>>>>                 }
>>>>>>>>>
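>>>>>>>>> For example (illustrative values only): with siteName "/SubSite" and an
>>>>>>>>> incoming relPath of "/Shared Documents/test.aspx", the check above rewrites
>>>>>>>>> relPath to "/SubSite/Shared Documents/test.aspx".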
>>>>>>>>>
>>>>>>>>> That fixed my pathing issue and the index-out-of-bounds errors.
>>>>>>>>>
>>>>>>>>> I have also made many other modifications to cope with AD and
>>>>>>>>> claims-based auth, and for compatibility with Sharepoint 2013.
>>>>>>>>>
>>>>>>>>> Dmitry, I have uploaded my modified mcf-sharepoint-connector.jar
>>>>>>>>> and MCPermissions WSP if you would like to try them out:
>>>>>>>>>
>>>>>>>>> http://pngnetworks.com/sharepoint-2010-claims.zip
>>>>>>>>>
>>>>>>>>> Just make sure you back up your current ones as this is still very
>>>>>>>>> much in development :)
>>>>>>>>>
>>>>>>>>> Also, the logging is very verbose.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Will
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Sep 18, 2013 at 11:41 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Will,
>>>>>>>>>> When you folks set up YOUR AWS instance, did it work with MCF out
>>>>>>>>>> of the box?  Or did you need to do something?  And, if so, what did you do?
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 18, 2013 at 9:28 AM, Will Parkinson <
>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, that's right; you're only really interested in the site that you
>>>>>>>>>>> are trying to crawl.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Sep 18, 2013 at 11:25 PM, Dmitry Goldenberg <
>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Will,
>>>>>>>>>>>>
>>>>>>>>>>>> For SharePoint - 80, the output is
>>>>>>>>>>>>
>>>>>>>>>>>> NTAuthenticationProviders       : (STRING) "NTLM"
>>>>>>>>>>>>
>>>>>>>>>>>> I assume we're not interested in the Default Web Site; for
>>>>>>>>>>>> that, the output is simply "The parameter NTAuthenticationProviders is not
>>>>>>>>>>>> set at this node."
>>>>>>>>>>>>
>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Sep 18, 2013 at 9:16 AM, Will Parkinson <
>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> If you open IIS manager and click on sites, it is displayed in
>>>>>>>>>>>>> the ID column (see screenshot attached)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:55 PM, Dmitry Goldenberg <
>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Will,
>>>>>>>>>>>>>> Sorry, what is the "sharepoint website *number*" in that
>>>>>>>>>>>>>> invocation?
>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:53 AM, Will Parkinson <
>>>>>>>>>>>>>> parkinson.will@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Dmitry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just out of interest, what does the following command output
>>>>>>>>>>>>>>> on your system
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cd to C:\inetpub\adminscripts
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> cscript adsutil.vbs get w3svc/<put your sharepoint website
>>>>>>>>>>>>>>> number here>/root/NTAuthenticationProviders
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 10:44 PM, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "This is the second time I'm encountering the issue which
>>>>>>>>>>>>>>>> leads me to believe it's a quirk of IIS and/or SharePoint."
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It cannot be just a quirk of SharePoint because
>>>>>>>>>>>>>>>> SharePoint's UI etc could not create or work with subsites if that was
>>>>>>>>>>>>>>>> true.  It may well be a configuration issue with IIS, which is indeed what
>>>>>>>>>>>>>>>> I suspect.  I have pinged all the resources I know of to try and get some
>>>>>>>>>>>>>>>> insight as to why this is happening.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "Perhaps this is something that can be worked into the
>>>>>>>>>>>>>>>> 'fabric' of ManifoldCF as a workaround for a known issue."
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Like I said before, this is a huge amount of work,
>>>>>>>>>>>>>>>> tantamount to rewriting most of the connector.  If this is what you want to
>>>>>>>>>>>>>>>> request, that is your option, but there is no way we'd complete any of this
>>>>>>>>>>>>>>>> work before December/January at the earliest.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "Just to understand this a bit better, the main breakage
>>>>>>>>>>>>>>>> here is that the wildcards don't work properly, right? "
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, it means that ManifoldCF cannot get at any data of any
>>>>>>>>>>>>>>>> kind associated with a SharePoint subsite.  Accessing root data works
>>>>>>>>>>>>>>>> fine.  If you try to crawl as things are now, you must disable all subsites
>>>>>>>>>>>>>>>> and just crawl the root site, or you will crawl the same things with longer
>>>>>>>>>>>>>>>> and longer paths indefinitely.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:38 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is the second time I'm encountering the issue, which
>>>>>>>>>>>>>>>>> leads me to believe it's a quirk of IIS and/or SharePoint. Perhaps this is
>>>>>>>>>>>>>>>>> something that can be worked into the 'fabric' of ManifoldCF as a
>>>>>>>>>>>>>>>>> workaround for a known issue. I understand that it may have far-reaching
>>>>>>>>>>>>>>>>> tentacles, but I wonder if that's really the only option...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just to understand this a bit better, the main breakage
>>>>>>>>>>>>>>>>> here is that the wildcards don't work properly, right?  In theory if I have
>>>>>>>>>>>>>>>>> a repo connector config which lists specific library and list paths, things
>>>>>>>>>>>>>>>>> should work?  It's only when the /* types of wildcards are included that
>>>>>>>>>>>>>>>>> we're in trouble?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Sep 18, 2013 at 8:07 AM, Karl Wright <
>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Someone else was having a similar problem. See
>>>>>>>>>>>>>>>>>> http://social.technet.microsoft.com/Forums/sharepoint/en-US/e4b53c63-b89a-4356-a7b0-6ca7bfd22826/getting-sharepoint-subsite-from-custom-webservice.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Apparently it does depend on how you get to the web
>>>>>>>>>>>>>>>>>> service, which does argue that it is an IIS issue.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 5:44 PM, Karl Wright <
>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As discussed privately I had a look at your system.
>>>>>>>>>>>>>>>>>>> What is happening is that the C# static SPContext.Current.Web is not
>>>>>>>>>>>>>>>>>>> reflecting the subsite in any URL that contains a subsite.  In other words,
>>>>>>>>>>>>>>>>>>> the URL coming in might be "
>>>>>>>>>>>>>>>>>>> http://servername/subsite1/_vti_bin/MCPermissions.asmx",
>>>>>>>>>>>>>>>>>>> but the MCPermissions.asmx plugin will think it is being executed in the
>>>>>>>>>>>>>>>>>>> root context ("http://servername").  That's pretty
>>>>>>>>>>>>>>>>>>> broken behavior, so I'm guessing that either IIS or SharePoint is somehow
>>>>>>>>>>>>>>>>>>> misconfigured; if that were corrected, the web services would then begin to
>>>>>>>>>>>>>>>>>>> work right again.  But I have no idea how this should
>>>>>>>>>>>>>>>>>>> actually be fixed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Will Parkinson, one of the subscribers of this list, may
>>>>>>>>>>>>>>>>>>> find the symptoms meaningful, since he set up an AWS SharePoint instance
>>>>>>>>>>>>>>>>>>> before.  I hope he will respond in a helpful way.  Until then, I think we
>>>>>>>>>>>>>>>>>>> are stuck.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 9:49 AM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like I'll be able to get access for you to the
>>>>>>>>>>>>>>>>>>>> test system we're using. Would you be interested in working with the system
>>>>>>>>>>>>>>>>>>>> directly? I certainly don't mind doing some testing but I thought we'd
>>>>>>>>>>>>>>>>>>>> speed things up this way. If so, could you email me from a more private
>>>>>>>>>>>>>>>>>>>> account so we can set this up?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Sep 17, 2013 at 7:38 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Another interesting bit from the log:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/_catalogs/lt/Forms/AllItems.aspx', 'List
>>>>>>>>>>>>>>>>>>>>> Template Gallery'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/_catalogs/masterpage/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>> 'Master Page Gallery'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/Shared Documents/Forms/AllItems.aspx', 'Shared
>>>>>>>>>>>>>>>>>>>>> Documents'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/SiteAssets/Forms/AllItems.aspx', 'Site Assets'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/SitePages/Forms/AllPages.aspx', 'Site Pages'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/_catalogs/solutions/Forms/AllItems.aspx',
>>>>>>>>>>>>>>>>>>>>> 'Solution Gallery'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/Style Library/Forms/AllItems.aspx', 'Style
>>>>>>>>>>>>>>>>>>>>> Library'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/Test Library 1/Forms/AllItems.aspx', 'Test
>>>>>>>>>>>>>>>>>>>>> Library 1'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/_catalogs/theme/Forms/AllItems.aspx', 'Theme
>>>>>>>>>>>>>>>>>>>>> Gallery'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library list: '/_catalogs/wp/Forms/AllItems.aspx', 'Web Part
>>>>>>>>>>>>>>>>>>>>> Gallery'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared
>>>>>>>>>>>>>>>>>>>>> Documents' exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Shared Documents'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SiteAssets'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/SitePages'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Checking whether to include library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Library '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>> exactly matched rule path '/*'
>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,799 (Worker thread '7') -
>>>>>>>>>>>>>>>>>>>>> SharePoint: Including library
>>>>>>>>>>>>>>>>>>>>> '/Abcd/Klmnopqr/Klmnopqr/Defghij/Defghij/Style Library'
>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This time it appears that it is the Lists service that
>>>>>>>>>>>>>>>>>>>>> is broken and does not recognize the parent site.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I haven't corrected this problem yet since now I am
>>>>>>>>>>>>>>>>>>>>> beginning to wonder if *any* of the web services under Amazon work at all
>>>>>>>>>>>>>>>>>>>>> for subsites.  We may be better off implementing everything we need in the
>>>>>>>>>>>>>>>>>>>>> MCPermissions service.  I will ponder this as I continue to research the
>>>>>>>>>>>>>>>>>>>>> logs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It's still valuable to check my getSites()
>>>>>>>>>>>>>>>>>>>>> implementation.  I'll be doing another round of work tonight on the plugin.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 8:45 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The augmented plugin can be downloaded from
>>>>>>>>>>>>>>>>>>>>>> http://people.apache.org/~kwright/MetaCarta.SharePoint.MCPermissionsService.wsp.  The revised connector code is also ready, and should be checked out and
>>>>>>>>>>>>>>>>>>>>>> built from
>>>>>>>>>>>>>>>>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-772.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Once you set it all up, you can see if it is doing
>>>>>>>>>>>>>>>>>>>>>> the right thing by just trying to drill down through subsites in the UI.
>>>>>>>>>>>>>>>>>>>>>> You should always see a list of subsites that is appropriate for the
>>>>>>>>>>>>>>>>>>>>>> context you are in; if this does not happen it is not working.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:45 PM, Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I can see how preloading the list of subsites may be
>>>>>>>>>>>>>>>>>>>>>>> less optimal. The advantage of doing it this way is that one call gets you
>>>>>>>>>>>>>>>>>>>>>>> the whole structure in memory, which may be OK unless there are sites with
>>>>>>>>>>>>>>>>>>>>>>> a ton of subsites that stress memory. The disadvantage is having to
>>>>>>>>>>>>>>>>>>>>>>> throw this structure around.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Yes, I'll certainly help test out your changes, just
>>>>>>>>>>>>>>>>>>>>>>> let me know when they're available.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:19 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the code snippet.  I'd prefer, though,
>>>>>>>>>>>>>>>>>>>>>>>> to not preload the entire site structure in memory.  Probably it would be
>>>>>>>>>>>>>>>>>>>>>>>> better to just add another method to the ManifoldCF SharePoint 2010
>>>>>>>>>>>>>>>>>>>>>>>> plugin.  More methods are going to be added anyway to support Claim Space
>>>>>>>>>>>>>>>>>>>>>>>> Authentication, so I guess this would be just one more.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> We honestly have never seen this problem before -
>>>>>>>>>>>>>>>>>>>>>>>> so it's not just flakiness, it has something to do with the installation,
>>>>>>>>>>>>>>>>>>>>>>>> I'm certain.  At any rate, I'll get going right away on a workaround - if
>>>>>>>>>>>>>>>>>>>>>>>> you are willing to test what I produce.  I'm also certain there is at least
>>>>>>>>>>>>>>>>>>>>>>>> one other issue, but hopefully that will become clearer once this one is
>>>>>>>>>>>>>>>>>>>>>>>> resolved.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 6:49 PM, Dmitry Goldenberg
>>>>>>>>>>>>>>>>>>>>>>>> <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> >> subsite discovery is effectively disabled
>>>>>>>>>>>>>>>>>>>>>>>>> except directly under the root site
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Yes. Come to think of it, I once came across this
>>>>>>>>>>>>>>>>>>>>>>>>> problem while implementing a SharePoint connector.  I'm not sure whether
>>>>>>>>>>>>>>>>>>>>>>>>> it's exactly what's happening with the issue we're discussing but looks
>>>>>>>>>>>>>>>>>>>>>>>>> like it.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I started off by using multiple getWebCollection
>>>>>>>>>>>>>>>>>>>>>>>>> calls to get child subsites of sites and trying to navigate down that way.
>>>>>>>>>>>>>>>>>>>>>>>>> The problem was that getWebCollection was always returning the immediate
>>>>>>>>>>>>>>>>>>>>>>>>> subsites of the root site no matter whether you're at the root or below, so
>>>>>>>>>>>>>>>>>>>>>>>>> I ended up generating infinite loops.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I switched over to using a single
>>>>>>>>>>>>>>>>>>>>>>>>> getAllSubWebCollection call and caching its results. That call returns the
>>>>>>>>>>>>>>>>>>>>>>>>> full list of all subsites as pairs of Title and Url.  I had a POJO similar
>>>>>>>>>>>>>>>>>>>>>>>>> to the one below which held the list of sites and contained logic for
>>>>>>>>>>>>>>>>>>>>>>>>> enumerating the child sites, given the URL of a (parent) site.  From what I
>>>>>>>>>>>>>>>>>>>>>>>>> recall, getWebCollection works inconsistently, either across SP versions or
>>>>>>>>>>>>>>>>>>>>>>>>> across installations, but the logic below should work in any case.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> *** public class SubSiteCollection -- holds a list
>>>>>>>>>>>>>>>>>>>>>>>>> of CrawledSite POJOs, each of which is a { title, url }.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> *** SubSiteCollection has the following:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  public List<CrawledSite> getImmediateSubSites(String siteUrl) {
>>>>>>>>>>>>>>>>>>>>>>>>>    List<CrawledSite> subSites = new ArrayList<CrawledSite>();
>>>>>>>>>>>>>>>>>>>>>>>>>    for (CrawledSite site : sites) {
>>>>>>>>>>>>>>>>>>>>>>>>>      if (isChildOf(siteUrl, site.getUrl().toString())) {
>>>>>>>>>>>>>>>>>>>>>>>>>        subSites.add(site);
>>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>    return subSites;
>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  private static boolean isChildOf(String parentUrl, String urlToCheck) {
>>>>>>>>>>>>>>>>>>>>>>>>>    final String parent = normalizeUrl(parentUrl);
>>>>>>>>>>>>>>>>>>>>>>>>>    final String child = normalizeUrl(urlToCheck);
>>>>>>>>>>>>>>>>>>>>>>>>>    boolean ret = false;
>>>>>>>>>>>>>>>>>>>>>>>>>    if (child.startsWith(parent)) {
>>>>>>>>>>>>>>>>>>>>>>>>>      String remainder = child.substring(parent.length());
>>>>>>>>>>>>>>>>>>>>>>>>>      ret = StringUtils.countOccurrencesOf(remainder, SLASH) == 1;
>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>    return ret;
>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  private static String normalizeUrl(String url) {
>>>>>>>>>>>>>>>>>>>>>>>>>    return ((url.endsWith(SLASH)) ? url : url + SLASH).toLowerCase();
>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>
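>>>>>>>>>>>>>>>>>>>>>>>>> The recursive driver on top of this was roughly as follows (a from-memory
>>>>>>>>>>>>>>>>>>>>>>>>> sketch; processSite() just stands in for whatever per-site work gets done,
>>>>>>>>>>>>>>>>>>>>>>>>> it's not a real method name):
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  private void crawlSite(String siteUrl, SubSiteCollection allSites) {
>>>>>>>>>>>>>>>>>>>>>>>>>    processSite(siteUrl);  // libraries, lists, and items for this site
>>>>>>>>>>>>>>>>>>>>>>>>>    // Recurse only into immediate children, so no cycles are possible.
>>>>>>>>>>>>>>>>>>>>>>>>>    for (CrawledSite child : allSites.getImmediateSubSites(siteUrl)) {
>>>>>>>>>>>>>>>>>>>>>>>>>      crawlSite(child.getUrl().toString(), allSites);
>>>>>>>>>>>>>>>>>>>>>>>>>    }
>>>>>>>>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>>>>>>>>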
>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:54 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Have a look at this sequence also:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Abcd',
>>>>>>>>>>>>>>>>>>>>>>>>>> 'Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Defghij',
>>>>>>>>>>>>>>>>>>>>>>>>>> 'Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,817 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Subsite list: '
>>>>>>>>>>>>>>>>>>>>>>>>>> http://ec2-99-99-99-99.compute-1.amazonaws.com/Klmnopqr',
>>>>>>>>>>>>>>>>>>>>>>>>>> 'Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd' exactly matched rule
>>>>>>>>>>>>>>>>>>>>>>>>>> path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Abcd'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij' exactly matched
>>>>>>>>>>>>>>>>>>>>>>>>>> rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Defghij'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Checking whether to include site
>>>>>>>>>>>>>>>>>>>>>>>>>> '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr' exactly matched
>>>>>>>>>>>>>>>>>>>>>>>>>> rule path '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 11:43:56,818 (Worker thread '8')
>>>>>>>>>>>>>>>>>>>>>>>>>> - SharePoint: Including site '/Klmnopqr/Abcd/Abcd/Klmnopqr/Klmnopqr'
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> This is using the GetSites(String parent) method
>>>>>>>>>>>>>>>>>>>>>>>>>> with a site name of "/Klmnopqr/Abcd/Abcd/Klmnopqr", and getting back three
>>>>>>>>>>>>>>>>>>>>>>>>>> sites (!!).  The parent path is not correct, obviously, but nevertheless
>>>>>>>>>>>>>>>>>>>>>>>>>> this is one way in which paths are getting completely messed up.  It *looks*
>>>>>>>>>>>>>>>>>>>>>>>>>> like the Webs web service is broken in such a way as to ignore the URL
>>>>>>>>>>>>>>>>>>>>>>>>>> coming in, except for the base part, which means that subsite discovery is
>>>>>>>>>>>>>>>>>>>>>>>>>> effectively disabled except directly under the root site.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> This might still be OK if it is not possible to
>>>>>>>>>>>>>>>>>>>>>>>>>> create subsites of subsites in this version of SharePoint.  Can you confirm
>>>>>>>>>>>>>>>>>>>>>>>>>> that this is or is not possible?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 2:42 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> "This is everything that got generated, from the
>>>>>>>>>>>>>>>>>>>>>>>>>>> very beginning"
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Well, something isn't right.  What I expect to
>>>>>>>>>>>>>>>>>>>>>>>>>>> see right up front, but don't, are:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> - A webs "getWebCollection" invocation for
>>>>>>>>>>>>>>>>>>>>>>>>>>> /_vti_bin/webs.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>> - Two lists "getListCollection" invocations for
>>>>>>>>>>>>>>>>>>>>>>>>>>> /_vti_bin/lists.asmx
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Instead the first transactions I see are from
>>>>>>>>>>>>>>>>>>>>>>>>>>> already busted URLs - which make no sense since there would be no way they
>>>>>>>>>>>>>>>>>>>>>>>>>>> should have been able to get queued yet.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> So there are a number of possibilities.  First,
>>>>>>>>>>>>>>>>>>>>>>>>>>> maybe the log isn't getting cleared out, and the session in question
>>>>>>>>>>>>>>>>>>>>>>>>>>> therefore starts somewhere in the middle of manifoldcf.log.1.  But no:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> C:\logs>grep "POST /_vti_bin/webs"
>>>>>>>>>>>>>>>>>>>>>>>>>>> manifoldcf.log.1
>>>>>>>>>>>>>>>>>>>>>>>>>>> grep: input lines truncated - result questionable
>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Nevertheless there are some interesting points
>>>>>>>>>>>>>>>>>>>>>>>>>>> here.  First, note the following response, which I've been able to
>>>>>>>>>>>>>>>>>>>>>>>>>>> determine is against "Test Library 1":
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>> '23') - SharePoint: getListItems xml response: '<GetListItems xmlns="
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://schemas.microsoft.com/sharepoint/soap/directory/"><GetListItemsResponse
>>>>>>>>>>>>>>>>>>>>>>>>>>> xmlns=""><GetListItemsResult
>>>>>>>>>>>>>>>>>>>>>>>>>>> FileRef="SitePages/Home.aspx"/></GetListItemsResponse></GetListItems>'
>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>> '23') - SharePoint: Checking whether to include document
>>>>>>>>>>>>>>>>>>>>>>>>>>> '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>> '23') - SharePoint: File '/SitePages/Home.aspx' exactly matched rule path
>>>>>>>>>>>>>>>>>>>>>>>>>>> '/*'
>>>>>>>>>>>>>>>>>>>>>>>>>>> DEBUG 2013-09-16 13:02:31,590 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>> '23') - SharePoint: Including file '/SitePages/Home.aspx'
>>>>>>>>>>>>>>>>>>>>>>>>>>>  WARN 2013-09-16 13:02:31,590 (Worker thread
>>>>>>>>>>>>>>>>>>>>>>>>>>> '23') - Sharepoint: Unexpected relPath structure; path is
>>>>>>>>>>>>>>>>>>>>>>>>>>> '/SitePages/Home.aspx', but expected <list/library> length of 26
>>>>>>>>>>>>>>>>>>>>>>>>>>> <<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The FileRef in this case is pointing at what,
>>>>>>>>>>>>>>>>>>>>>>>>>>> exactly?  Is there a SitePages/Home.aspx in the "Test Library 1" library?
>>>>>>>>>>>>>>>>>>>>>>>>>>> Or does it mean to refer back to the root site with this URL construction?
>>>>>>>>>>>>>>>>>>>>>>>>>>> And since this is supposedly at the root level, how come the combined site
>>>>>>>>>>>>>>>>>>>>>>>>>>> + library name comes out to 26??  I get 15, which leaves 11 characters
>>>>>>>>>>>>>>>>>>>>>>>>>>> unaccounted for.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm still looking at the logs to see if I can
>>>>>>>>>>>>>>>>>>>>>>>>>>> glean key information.  Later, if I could set up a crawl against the
>>>>>>>>>>>>>>>>>>>>>>>>>>> sharepoint instance in question, that would certainly help.  I can readily
>>>>>>>>>>>>>>>>>>>>>>>>>>> set up an ssh tunnel if that is what is required.  But I won't be able to
>>>>>>>>>>>>>>>>>>>>>>>>>>> do it until I get home tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:58 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is everything that got generated, from the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> very beginning, meaning that I did a fresh build, new database, new
>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection definitions, start. The log must have rolled but the .1 log is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> included.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If I were to get you access to the actual test
>>>>>>>>>>>>>>>>>>>>>>>>>>>> system, would you mind taking a look? It may be more efficient than sending
>>>>>>>>>>>>>>>>>>>>>>>>>>>> logs..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:48 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> These logs are different but have exactly the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> same problem; they start in the middle when the crawl is already well
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> underway.  I'm wondering if by chance you have more than one agents process
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> running or something?  Or maybe the log is rolling and stuff is getting
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lost?  What's there is not what I would expect to see, at all.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I *did* manage to find two transactions that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> look like they might be helpful, but because the *results* of those
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transactions are required by transactions that take place minutes *before*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the log, I have no confidence that I'm looking at anything meaningful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But I'll get back to you on what I find nonetheless.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you decide to repeat this exercise, try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> watching the log with "tail -f" before starting the job.  You should not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see any log contents at all until the job is started.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 1:11 PM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Attached please find logs which start at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> beginning. I started from a fresh build (clean db etc.), the logs start at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> server start, then I create the output connection and the repo connection,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> then the job, and then I fire off the job. I aborted the execution about a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> minute into it or so.  That's all that's in the logs with:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> org.apache.manifoldcf.connectors=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.httpclient.wire.header=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log4j.logger.org.apache.commons.httpclient=DEBUG
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 12:39 PM, Karl Wright
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you sure these are the right logs?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They start right in the middle of a crawl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They are already in a broken state when
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> they start, e.g. the kinds of things that are being looked up are already
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> nonsense paths
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I need to see logs from the BEGINNING of a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> fresh crawl to see how the nonsense paths happen.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:52 AM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've generated logs with details as we
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussed.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The job was created afresh, as before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The logs are attached.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:20 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "Do you think that this issue is generic
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with regard to any Amz instance?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I presume so, since you didn't apparently
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do anything special to set one of these up.  Unfortunately, such instances
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are not part of the free tier, so I am still constrained from setting one
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> up for myself because of household rules here.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "For now, I assume our only workaround is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to list the paths of interest manually"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Depending on what is going wrong, that may
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not even work.  It looks like several SharePoint web service calls may be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> affected, and not in a cleanly predictable way.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "is identification and extraction of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attachments supported in the SP connector?"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ManifoldCF in general leaves
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identification and extraction to the search engine.  Solr, for instance
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> uses Tika for this, if so configured.  You can configure your Solr output
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connection to include or exclude specific mime types or extensions if you
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> want to limit what is attempted.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 11:09 AM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, Karl. Do you think that this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issue is generic with regard to any Amz instance? I'm just wondering how
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> easily reproducible this may be..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For now, I assume our only workaround is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to list the paths of interest manually, i.e. add explicit rules for each
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> library and list.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related subject - is identification and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> extraction of attachments supported in the SP connector?  E.g. if I have a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Word doc attached to a Task list item, would that be extracted?  So far, I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see that library content gets crawled and I'm getting the list item data
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> but am not sure what happens to the attachments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:48 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the additional information.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It does appear like the method that lists subsites is not working as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> expected under AWS.  Nor are some number of other methods which supposedly
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just list the children of a subsite.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've reopened CONNECTORS-772 to work on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addressing this issue.  Please stay tuned.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 10:08 AM, Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Goldenberg <dgoldenberg@kmwllc.com>wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Most of the paths that get generated
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are listed in the attached log, they match what shows up in the diag
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> report. So I'm not sure where they diverge, most of them just don't seem
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> right.  There are 3 subsites rooted in the main site: Abcd, Defghij,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Klmnopqr.  It's strange that the connector would try such paths as:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Klmnopqr/Defghij/Defghij/Announcements///
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- there are multiple repetitions of the same subsite on the path and to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> begin with, Defghij is not a subsite of Klmnopqr, so why would it try
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this? the /// at the end doesn't seem correct either, unless I'm missing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> something in how this pathing works.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Test Library
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1/Financia/lProjectionsTemplate.xl/Abcd/Announcements -- looks wrong. A
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> docname is mixed into the path, a subsite ends up after a docname?...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /Shared
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Documents/Personal_Fina/ncial_Statement_1_1.xl/Defghij/ -- same types of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> issues plus now somehow the docname got split with a forward slash?..
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> There are also a bunch of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> StringIndexOutOfBoundsException's.  Perhaps this logic doesn't fit with the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> pathing we're seeing on this amz-based installation?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect the logic to just know that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> root contains 3 subsites, and work off that. Each subsite has a specific
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> list of libraries and lists, etc. It seems odd that the connector gets into
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this matching pattern, and tries what looks like thousands of variations (I
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aborted the execution).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 16, 2013 at 7:56 AM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To clarify, the way you would need to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> analyze this is to run a crawl with the wildcards as you have selected,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abort if necessary after a while, and then use the Document Status report
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to list the document identifiers that had been generated.  Find a document
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> identifier that you believe represents a path that is illegal, and figure
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> out what SOAP getChild call caused the problem by returning incorrect
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data.  In other words, find the point in the path where the path diverges
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from what exists into what doesn't exist, and go back in the ManifoldCF
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> logs to find the particular SOAP request that led to the issue.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd expect from your description that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the problem lies with getting child sites given a site path, but that's
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just a guess at this point.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 6:40 PM, Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't understand what you mean by
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "I've tried the set of wildcards as below and I seem to be running into a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> lot of cycles, where various subsite folders are appended to each other and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an extraction of data at all of those locations is attempted".   If you are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> seeing cycles it means that document discovery is still failing in some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> way.  For each folder/library/site/subsite, only the children of that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> folder/library/site/subsite should be appended to the path - ever.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you can give a specific example,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> preferably including the soap back-and-forth, that would be very helpful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 15, 2013 at 1:40 PM,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dmitry Goldenberg <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dgoldenberg@kmwllc.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Quick question. Is there an easy way
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to configure an SP repo connection for crawling of all content, from the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> root site all the way down?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've tried the set of wildcards as
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> below and I seem to be running into a lot of cycles, where various subsite
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> folders are appended to each other and an extraction of data at all of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> those locations is attempted. Ideally I'd like to avoid having to construct
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an exact set of paths because the set may change, especially with new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> content being added.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Path rules:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* file include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* library include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* list include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* site include
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Metadata:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> /* include true
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'd also like to pull down any files
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attached to list items. I'm hoping that some type of "/* file include"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should do it, once I figure out how to safely include all content.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Dmitry
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
