manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Hopefully a few simple questions about ManifoldCF and SharePoint
Date Thu, 19 Mar 2015 18:26:38 GMT
"So my question is, notwithstanding that this is not the "typical" way
ManifoldCF works, can we use it in the way that I am describing. Is it
malleable enough to work or is it designed to do something so different
from what we need that it would be useless. I guess the key question is
really, can we tell ManifoldCF to limit results to those visible to a
specific user and would there be any performance or other unexpected
downsides to doing that."

Hi Hank,

There is nothing specific about the ManifoldCF *framework* that prevents
you from doing what you suggest. But there are two problems:

(1) Most out-of-the-box repository connection types, including the
SharePoint type, do not give you any ability to limit crawls to a specific
user.  Instead, because they are intended to support a very different
security model, they fetch each document's access tokens, which are described
in the book chapter I pointed you to.
(2) If you modified the SharePoint repository connection type in the manner
you suggest, you would still need to create a custom output connection type
to drop the content into your per-user database instances.  The alternative
would be to use an appropriate out-of-the-box output connection type, if
there is one, and have N jobs for N users.
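To make point (1) concrete, here is a minimal sketch. It assumes each crawled document carries allow and deny access tokens, and that each user's tokens are known (in ManifoldCF these would come from the authority service). Under those assumptions, one crawl can be fanned out into per-user document lists by token matching instead of crawling once per user. This is roughly the routing a custom output connection type from point (2) would have to perform. The `Document` shape, the token strings, and the single-level matching rule are hypothetical simplifications. They are not ManifoldCF's API or its exact security logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A crawled document plus the access tokens a connector attached to it.

    Hypothetical shape for illustration; not a ManifoldCF class.
    """
    uri: str
    allow_tokens: frozenset[str]
    deny_tokens: frozenset[str] = frozenset()


def visible_to(doc: Document, user_tokens: set[str]) -> bool:
    """Simplified rule: no deny token matches and at least one allow token matches."""
    if doc.deny_tokens & user_tokens:
        return False
    return bool(doc.allow_tokens & user_tokens)


def per_user_views(docs: list[Document], users: dict[str, set[str]]) -> dict[str, list[str]]:
    """Fan one crawl out into a per-user list of visible URIs."""
    return {
        user: [doc.uri for doc in docs if visible_to(doc, tokens)]
        for user, tokens in users.items()
    }


# Hypothetical data: token strings stand in for whatever the authority service returns.
docs = [
    Document("sp://site-a/report.docx", frozenset({"group:finance"})),
    Document("sp://site-a/plan.pptx", frozenset({"group:finance", "group:sales"}),
             frozenset({"user:bob"})),
    Document("sp://site-b/home.aspx", frozenset({"group:sales"})),
]
users = {
    "alice": {"group:finance"},
    "bob": {"group:finance", "group:sales", "user:bob"},
}

views = per_user_views(docs, users)
# alice sees report.docx and plan.pptx; bob is denied plan.pptx but sees home.aspx.
```

Security stays attached to the documents. Adding a user therefore means computing one more filtered view rather than running another crawl. Applying the same check at query time over a single index corresponds to the typical model described later in this thread.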

Hope that answers your question.

Karl



On Thu, Mar 19, 2015 at 2:15 PM, hank williams <hank777@gmail.com> wrote:

> Thanks Karl.
>
> I will most certainly be reading the document you linked to in great
> detail. It looks like stuff I need to know.
>
> That said, we have a given technology that we have developed and that we
> will be using. It creates a separate index for each user. The technology
> has vastly greater utility than just for SharePoint, and it's been in
> development for about six years. (In fact, this SharePoint thing is a
> recent add-on request.)
>
> So my question is, notwithstanding that this is not the "typical" way
> ManifoldCF works, can we use it in the way that I am describing. Is it
> malleable enough to work or is it designed to do something so different
> from what we need that it would be useless. I guess the key question is
> really, can we tell ManifoldCF to limit results to those visible to a
> specific user and would there be any performance or other unexpected
> downsides to doing that.
>
> Hank
>
>
> On Thu, Mar 19, 2015 at 1:53 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Hank,
>>
>> "Our project involves a database that has a private secure user space
>> for each user. Our database is built on Lucene and indexes every object in
>> the database. Each user presumably has some number of SharePoint sites that
>> they have access to. We want to index each SharePoint object (file or
>> SharePoint page) as we find it, for each user. The user then ends up with
>> an index of just the objects that they have permissions for. But to do
>> that we need to, for each user, crawl all of the SharePoint sites that they
>> have access to. Permissions to each SharePoint site are managed by
>> Kerberos."
>>
>> This is not the typical ManifoldCF model.  In the typical case, there is
>> ONE Lucene search engine (not N), and any searches that take place apply
>> security restrictions internally based on the user's security information,
>> as obtained from the ManifoldCF authority service, which is in turn
>> querying SharePoint.
>>
>> You can read more about the standard authorization setup here:
>>
>>
>> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs/MCFiA%20CH%2004.pdf
>>
>> Karl
>>
>>
>>
>>
>> On Thu, Mar 19, 2015 at 1:44 PM, hank williams <hank777@gmail.com> wrote:
>>
>>> I am embarking on an effort for which ManifoldCF may be an appropriate
>>> tool. I am a total noob, having just discovered this project and have a few
>>> questions that I am hoping someone can answer so that I can begin to gain
>>> some confidence about the way things work. Basically I am trying to make
>>> sure I understand, at a top level, how ManifoldCF works.
>>>
>>> Our project involves a database that has a private secure user space for
>>> each user. Our database is built on Lucene and indexes every object in the
>>> database. Each user presumably has some number of SharePoint sites that
>>> they have access to. We want to index each SharePoint object (file or
>>> SharePoint page) as we find it, for each user. The user then ends up with
>>> an index of just the objects that they have permissions for. But to do
>>> that we need to, for each user, crawl all of the SharePoint sites that they
>>> have access to. Permissions to each SharePoint site are managed by
>>> Kerberos.
>>>
>>> So the questions are:
>>>
>>> a. Can I, with ManifoldCF, take a list of SharePoint sites, a list of
>>> users, and the relevant Kerberos authentication tokens or keys (I am just
>>> learning about Kerberos), and get back a list of indexable objects/URIs
>>> (HTML, .docx, .pptx, etc.)?
>>>
>>> b. Is this the right way to think about it?
>>>
>>> c. If so, is there any example code or documentation that would explain
>>> how I do this?
>>>
>>> d. Does ManifoldCF provide any information to help indicate whether the
>>> given object has changed, or is that something we need to figure out by
>>> manually comparing the old and new documents in our code?
>>>
>>
>>
>
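Question (d) mentions the fallback of manually comparing old and new documents in one's own code. Here is a minimal sketch of that manual approach. It assumes the raw bytes of each fetched document are available and that a SHA-256 fingerprint per URI is stored between runs. The function names and URIs are hypothetical, and this is not a ManifoldCF API.

```python
from __future__ import annotations

import hashlib


def fingerprint(data: bytes) -> str:
    """Hash a document's raw bytes; equal hashes are treated as 'unchanged'."""
    return hashlib.sha256(data).hexdigest()


def diff_crawl(previous: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored fingerprints (URI -> hash) against freshly fetched content (URI -> bytes)."""
    added, modified = [], []
    for uri, data in current.items():
        if uri not in previous:
            added.append(uri)
        elif previous[uri] != fingerprint(data):
            modified.append(uri)
    # URIs seen last time but not fetched this time are treated as deleted.
    deleted = [uri for uri in previous if uri not in current]
    return {"added": sorted(added), "modified": sorted(modified), "deleted": sorted(deleted)}


# Hypothetical example: fingerprints saved from a previous run, then a new fetch.
previous = {
    "sp://site-a/report.docx": fingerprint(b"v1"),
    "sp://site-a/plan.pptx": fingerprint(b"unchanged"),
    "sp://site-b/old.aspx": fingerprint(b"gone"),
}
current = {
    "sp://site-a/report.docx": b"v2",
    "sp://site-a/plan.pptx": b"unchanged",
    "sp://site-b/new.aspx": b"new page",
}
changes = diff_crawl(previous, current)
```

Hashing full content is simple, but it requires fetching every document on every run. A cheaper signal, such as a last-modified timestamp if the repository exposes one, would avoid refetching unchanged files.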
