manifoldcf-user mailing list archives

From Will Parkinson <parkinson.w...@gmail.com>
Subject Re: Sharepoint SID extraction for groups
Date Fri, 22 Nov 2013 06:18:58 GMT
Thanks for that Karl, it sounds like a good way forward.

I have built trunk, installed the 1.5-dev version, and started the setup,
but have found an issue with the SharePoint Native connection.

I am getting this error:

Connection status: Accessing site failed with unexpected SharePoint error
code 0x80131600: User cannot be found.

What sort of SharePoint credentials do I need for the Native connection?

Cheers,

Will


On Fri, Nov 22, 2013 at 12:39 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Will,
>
> I've committed to trunk the following:
>
> - Two new authorities: SharePoint/Native and SharePoint/AD
> - A revised SharePoint connector, which has the ability to EITHER use the
> legacy AD authority, or the native SharePoint family of authorities.
>
> If you are willing to experiment with trunk code, I think you will find
> that using SharePoint native authorities will solve your long-list-of-SIDs
> issue.  If you use both the SharePoint/Native authority and the
> SharePoint/AD authority under the same authority group that the repository
> connection uses, in theory it should support full Claim Space auth.  I say
> "in theory" because I have not had the opportunity to test it yet in an
> actual environment.  I'd love to have that chance.
>
> Please let me know if you are willing to work collaboratively on finishing
> this off.  I think it would be far better to take this route than continue
> to hack away at 1.4 code or earlier.
>
> What do you think?
>
> Karl
>
>
>
>
> On Thu, Nov 21, 2013 at 9:23 AM, Will Parkinson <parkinson.will@gmail.com> wrote:
>
>> Hi Karl,
>>
>> This looks like it's a PostgreSQL 9.1 issue; I downgraded to PostgreSQL 8.4
>> and it's no longer an issue.
>>
>> Just back to the claims-based authentication: I ended up writing a class
>> that extracts the missing SIDs for SharePoint groups from AD. That works
>> perfectly well in itself, but the SID lists are huge (sometimes in excess
>> of 1.5MB), which is making the database slow. Inserting into and
>> trawling through tables with large numbers of SIDs stored in them seems
>> to be a problem.
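AD stores a user's or group's SID in the binary `objectSid` attribute, so a class like the one described above has to render those bytes in the familiar `S-1-...` string form before they can be used as access tokens. The following is a minimal sketch of that conversion only. It assumes the raw `objectSid` bytes have already been fetched (for example over LDAP), and the class name `SidFormat` is hypothetical; it is not part of ManifoldCF or of the class Will wrote.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: renders a binary AD objectSid as an "S-R-A-S1-S2-..." string. */
final class SidFormat {
    private SidFormat() {}

    static String toSidString(byte[] sid) {
        if (sid == null || sid.length < 8) {
            throw new IllegalArgumentException("SID must be at least 8 bytes");
        }
        int revision = sid[0] & 0xFF;
        int subAuthorityCount = sid[1] & 0xFF;
        if (sid.length != 8 + 4 * subAuthorityCount) {
            throw new IllegalArgumentException("SID length does not match its sub-authority count");
        }
        // Bytes 2..7 hold the 48-bit identifier authority, big-endian.
        long authority = 0;
        for (int i = 2; i < 8; i++) {
            authority = (authority << 8) | (sid[i] & 0xFF);
        }
        StringBuilder sb = new StringBuilder("S-").append(revision).append('-').append(authority);
        // The remaining bytes are 32-bit unsigned sub-authorities, little-endian.
        ByteBuffer buf = ByteBuffer.wrap(sid, 8, 4 * subAuthorityCount).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < subAuthorityCount; i++) {
            sb.append('-').append(Integer.toUnsignedLong(buf.getInt()));
        }
        return sb.toString();
    }
}
```

For example, the well-known Administrators group bytes `01 02 00 00 00 00 00 05 20 00 00 00 20 02 00 00` render as `S-1-5-32-544`.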
>>
>> I am looking for a solution and testing a few things, including
>> storing the SIDs in a file on the server and "attaching" them before they
>> go out of the custom output connector we have built.
>>
>> Is there an easy way in the SPSProxyHelper.java class to get the
>> full SharePoint URL of the page being processed?
>>
>> Cheers,
>>
>> Will
>>
>>
>> On Mon, Nov 18, 2013 at 2:54 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Will,
>>>
>>> I looked at the pivot exception, but it seems like it *is* detecting
>>> that it should retry the transaction, and is indeed retrying.  This is
>>> expected behavior.  You did not include the beginning of the message; if it
>>> was DEBUG or WARNING I would be comfortable that it was doing the right
>>> thing.
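The retry Karl describes is the standard way to handle PostgreSQL serialization failures: under SERIALIZABLE isolation PostgreSQL reports them with SQLSTATE `40001`, and the usual remedy is to roll back and run the whole transaction again. Below is a minimal sketch of such a loop in plain JDBC terms, assuming the callable begins, commits, and rolls back its own transaction. It is not ManifoldCF's own retry logic (the "sleeping for 56816 ms" line in the log further down comes from that), and the class and method names are hypothetical.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

final class SerializableRetry {
    /** SQLSTATE that PostgreSQL uses for "could not serialize access" errors. */
    static final String SERIALIZATION_FAILURE = "40001";

    private SerializableRetry() {}

    /** Re-runs the whole transaction from the start while it fails with a serialization failure. */
    static <T> T runWithRetry(Callable<T> transaction, int maxAttempts, long baseSleepMs)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return transaction.call();
            } catch (SQLException e) {
                boolean retryable = SERIALIZATION_FAILURE.equals(e.getSQLState());
                if (!retryable || attempt >= maxAttempts) {
                    throw e; // other errors, or too many attempts, are real failures
                }
                // Growing, randomized back-off so the conflicting transactions
                // are less likely to collide again on the next attempt.
                long sleepMs = baseSleepMs * attempt + (long) (Math.random() * baseSleepMs);
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

Only SQLSTATE `40001` is retried here; anything else, such as a genuine database ERROR, propagates immediately, which matches the distinction Karl draws between an expected retry and a real failure.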
>>>
>>> Somewhere else, though, there may well be an actual database ERROR that
>>> is causing the system to get hung.  This will show up as an ERROR in the
>>> log, and when you do a thread dump, all the worker threads will be waiting
>>> on something in WorkerResetManager.  Could you include more of the log so
>>> that I can have a look at this?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>> On Sat, Nov 16, 2013 at 2:06 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Will,
>>>> The long-running query is not fatal - it is just a warning.
>>>>
>>>> The very long SID list requires a SharePoint authority, as discussed.
>>>>
>>>> The pivot error sounds like it is something that can be addressed
>>>> though.  Please create a ticket and put the full exception into it,
>>>> and I will look at it either tomorrow or Monday.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> Sent from my Windows Phone
>>>>
>>>> -----Original Message-----
>>>> From: Will Parkinson
>>>> Sent: 11/16/2013 10:10 AM
>>>> To: user@manifoldcf.apache.org
>>>> Subject: Re: Sharepoint SID extraction for groups
>>>>
>>>> Hi Karl,
>>>>
>>>> Yeah, that seems to be the case. To get ManifoldCF to work in my case, I
>>>> just created a separate class to obtain all the user SIDs directly
>>>> from AD if the group assigned in SharePoint is an AD group. This
>>>> seems to work fine for now, but it is causing a few database
>>>> issues.
>>>>
>>>> First of all, some of the SID lists are up to 1.5MB, which seems to be
>>>> causing the carrydown table to become huge. I am also getting errors
>>>> like:
>>>>
>>>> 1C159E0: ERROR: could not serialize access due to read/write dependencies among transactions
>>>>   Detail: Reason code: Canceled on identification as a pivot, during conflict in checking.
>>>>   Hint: The transaction might succeed if retried.; sleeping for 56816 ms
>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: ERROR: could not serialize access due to read/write dependencies among transactions
>>>>   Detail: Reason code: Canceled on identification as a pivot, during conflict in checking.
>>>>   Hint: The transaction might succeed if retried.
>>>>         at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:622)
>>>>         at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:651)
>>>>         at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
>>>>         at org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
>>>>         at org.apache.manifoldcf.crawler.jobs.Carrydown.recordCarrydownDataMultiple(Carrydown.java:343)
>>>>         at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4174)
>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:2017)
>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.flush(WorkerThread.java:1948)
>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:562)
>>>> Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
>>>>   Detail: Reason code: Canceled on identification as a pivot, during conflict in checking.
>>>>   Hint: The transaction might succeed if retried.
>>>>         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
>>>>         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
>>>>         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
>>>>         at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
>>>>         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
>>>>         at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
>>>>         at org.apache.manifoldcf.core.database.Database.execute(Database.java:883)
>>>>         at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
>>>>
>>>> And then I eventually get an error like this:
>>>>
>>>>  WARN 2013-11-17 00:41:09,058 (Finisher thread) - Found a long-running query (77260 ms): [SELECT id FROM jobs WHERE status IN (?,?,?,?,?) FOR UPDATE]
>>>>  WARN 2013-11-17 00:41:09,059 (Finisher thread) -   Parameter 0: 'A'
>>>>  WARN 2013-11-17 00:41:09,059 (Finisher thread) -   Parameter 1: 'W'
>>>>  WARN 2013-11-17 00:41:09,059 (Finisher thread) -   Parameter 2: 'R'
>>>>  WARN 2013-11-17 00:41:09,059 (Finisher thread) -   Parameter 3: 'O'
>>>>  WARN 2013-11-17 00:41:09,059 (Finisher thread) -   Parameter 4: 'U'
>>>>  WARN 2013-11-17 00:41:09,060 (Finisher thread) -  Plan: LockRows  (cost=0.00..3.34 rows=5 width=14) (actual time=0.026..0.027 rows=1 loops=1)
>>>>  WARN 2013-11-17 00:41:09,060 (Finisher thread) -  Plan:   ->  Seq Scan on jobs  (cost=0.00..3.29 rows=5 width=14) (actual time=0.024..0.024 rows=1 loops=1)
>>>>  WARN 2013-11-17 00:41:09,060 (Finisher thread) -  Plan:         Filter: (status = ANY ('{A,W,R,O,U}'::bpchar[]))
>>>>  WARN 2013-11-17 00:41:09,060 (Finisher thread) -  Plan:         Rows Removed by Filter: 17
>>>>  WARN 2013-11-17 00:41:09,060 (Finisher thread) -  Plan: Total runtime: 0.058 ms
>>>>  WARN 2013-11-17 00:41:09,060 (Finisher thread) -
>>>>
>>>> And then the update stops completely, even though the status on the
>>>> "Status and Job Management" page is still shown as "running". Do you
>>>> have any ideas on how I can fix this?
>>>>
>>>> I am doing some research at the moment on the best way to store
>>>> permissions information without storing hundreds of SIDs.
>>>>
>>>> Cheers,
>>>>
>>>> Will
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Nov 6, 2013 at 11:42 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> I should also add that, as far as ActiveDirectory groups go, my
>>>> understanding is that in non-Claim-Space versions of SharePoint,
>>>> there's a SharePoint group created for each AD group.  So a SharePoint
>>>> user will belong to some native SharePoint groups, but also to some
>>>> "mirrored" SharePoint groups that are created because of the user's
>>>> group relationships in AD.
>>>>
>>>> Claim Space seems to change this in the following way: SharePoint
>>>> groups no longer mirror AD groups.  Instead, the Claim Space
>>>> authorization tokens implicitly describe the relationships.  So you
>>>> have to talk to both SharePoint AND AD in order to fully understand
>>>> what documents in SharePoint are authorized for what users.
>>>>
>>>> Karl
>>>>
>>>> On Wed, Nov 6, 2013 at 8:37 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>> Hi Will,
>>>>
>>>> The current connector indeed maps SharePoint groups to individual user
>>>> SIDs.  That is not terribly scalable, and it is one reason why I've
>>>> created dedicated SharePoint authorities in the CONNECTORS-754-2
>>>> branch, so that we can authorize documents at a group level.
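The scalability difference can be sketched in a few lines. The first helper builds the kind of per-user ACL the current connector produces, where every member SID of every group is written onto the document, so the list grows with group membership. The second is a simple token-intersection check. With group-level tokens, the document carries only the group identifiers, and the user's group memberships are resolved by an authority at search time instead. This is a hypothetical illustration: the names `AclSketch`, `expandToUserSids`, and `isAuthorized` are made up, and deny tokens and the other details of ManifoldCF's real authorization model are ignored.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AclSketch {
    private AclSketch() {}

    /** Per-user approach: copy every member SID of every group onto the document. */
    static Set<String> expandToUserSids(Collection<String> groups,
                                        Map<String, Set<String>> membersByGroup) {
        Set<String> sids = new TreeSet<>();
        for (String group : groups) {
            sids.addAll(membersByGroup.getOrDefault(group, Collections.emptySet()));
        }
        return sids;
    }

    /** Allow access when any of the user's tokens appears in the document's allow list. */
    static boolean isAuthorized(Set<String> userTokens, Set<String> documentAllowTokens) {
        for (String token : userTokens) {
            if (documentAllowTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

For a group with 10,000 members, the per-user approach stores 10,000 tokens on each document that grants that group access, while the group-level approach stores one token and moves the membership lookup to query time.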
>>>>
>>>> I've also done considerable research on the Claim Space security model.
>>>> Supporting it fully has required some modifications to the basic
>>>> authorization model that ManifoldCF uses to tie documents to
>>>> authorities. This basic work is done and is now part of trunk as
>>>> well, and the documentation has been updated to describe the revised
>>>> authorization model.
>>>>
>>>> If you want to try working with the CONNECTORS-754-2 branch, I'd be
>>>> very happy to interact with you to iron out any problems.  What you
>>>> will need to do if you try it out is the following:
>>>>
>>>> (1) Create an authority group for your SharePoint instance
>>>> (2) Create a "SharePoint/Native" authority tied to that authority group
>>>> (3) If this is a claim-space SharePoint instance, then also create a
>>>> "SharePoint/Active Directory" authority tied to the same authority
>>>> group
>>>> (4) Create your SharePoint repository connection, making sure to
>>>> select "Native" mode
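Conceptually, putting the SharePoint/Native and SharePoint/Active Directory authorities in one authority group means a user's effective access tokens are the union of what each authority reports, with each token kept distinct by the authority that produced it. The sketch below models only that idea. `TokenSource` and `AuthorityGroupSketch` are hypothetical names, the authority names in the usage are placeholders, and the `name:token` qualification is an assumption for illustration, not a statement of ManifoldCF's exact token format.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for one authority connection and what it knows about users. */
final class TokenSource {
    final String name;
    private final Map<String, List<String>> tokensByUser;

    TokenSource(String name, Map<String, List<String>> tokensByUser) {
        this.name = name;
        this.tokensByUser = tokensByUser;
    }

    List<String> tokensFor(String user) {
        return tokensByUser.getOrDefault(user, Collections.emptyList());
    }
}

final class AuthorityGroupSketch {
    private AuthorityGroupSketch() {}

    /** Union of every authority's tokens for the user, each prefixed with its authority name. */
    static Set<String> userTokens(String user, List<TokenSource> authorities) {
        Set<String> tokens = new LinkedHashSet<>();
        for (TokenSource authority : authorities) {
            for (String token : authority.tokensFor(user)) {
                tokens.add(authority.name + ":" + token);
            }
        }
        return tokens;
    }
}
```

In the Claim Space case this is why both authorities are needed: one covers native SharePoint groups, the other the AD relationships that the claims tokens imply.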
>>>>
>>>> The implementation is currently the best I can do in the absence of a
>>>> full-blown Claim Space instance. Even so, there are still open questions
>>>> in my mind that, if I could answer them, would help clarify the
>>>> implementation. For example, what do "Role Definitions" do: are they
>>>> essentially just another form of group? And is it better to use the
>>>> name of a user, group, or role definition as an access token, or its
>>>> ID? Perhaps you can clarify a bit; I don't know...
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Wed, Nov 6, 2013 at 8:14 AM, Will Parkinson <parkinson.will@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am just wondering how the extraction of group permissions works
>>>> for the SharePoint connector. From what I can see, the group is
>>>> determined via the MCPermissions.asmx web service, then each user in
>>>> that group is iterated over and that user's SID is extracted.
>>>>
>>>> Is this the case? If so, are groups created in SharePoint and AD
>>>> groups treated the same way?
>>>>
>>>> Cheers,
>>>>
>>>> Will
>>>>
>>>
>>>
>>
>
