manifoldcf-user mailing list archives

From "Gonzalez, Pablo" <>
Subject RE: Problem with manifold
Date Wed, 07 Nov 2012 09:20:35 GMT
Hello Karl, this is what I've done:
-I've modified the class so that it prints out the BooleanQuery that it creates.
-I've rerun the query (with my handler), and this is what it pumps out:

+((+allow_token_share:__nosecurity__ +deny_token_share:__nosecurity__)
 allow_token_share:active_dir:S-1-5-32-545 -deny_token_share:active_dir:S-1-5-32-545
 allow_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111 -deny_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111
 allow_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-513 -deny_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-513
 allow_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113 -deny_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113
 allow_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110 -deny_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110
 allow_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107 -deny_token_share:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107
 allow_token_share:active_dir:S-1-1-0 -deny_token_share:active_dir:S-1-1-0
 allow_token_share:ad:S-1-5-32-545 -deny_token_share:ad:S-1-5-32-545
 allow_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1111 -deny_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1111
 allow_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-513 -deny_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-513
 allow_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1113 -deny_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1113
 allow_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1110 -deny_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1110
 allow_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1107 -deny_token_share:ad:S-1-5-21-2039231098-2614715072-2050932820-1107
 allow_token_share:ad:S-1-1-0 -deny_token_share:ad:S-1-1-0)
 +((+allow_token_document:__nosecurity__ +deny_token_document:__nosecurity__)
 allow_token_document:active_dir:S-1-5-32-545 -deny_token_document:active_dir:S-1-5-32-545
 allow_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111 -deny_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111
 allow_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-513 -deny_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-513
 allow_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113 -deny_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113
 allow_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110 -deny_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110
 allow_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107 -deny_token_document:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107
 allow_token_document:active_dir:S-1-1-0 -deny_token_document:active_dir:S-1-1-0
 allow_token_document:ad:S-1-5-32-545 -deny_token_document:ad:S-1-5-32-545
 allow_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1111 -deny_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1111
 allow_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-513 -deny_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-513
 allow_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1113 -deny_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1113
 allow_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1110 -deny_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1110
 allow_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1107 -deny_token_document:ad:S-1-5-21-2039231098-2614715072-2050932820-1107
 allow_token_document:ad:S-1-1-0 -deny_token_document:ad:S-1-1-0)

-I've been trying to understand it, and finally I queried solr (using the default /select
handler) with this:
+((+allow_token_document:__nosecurity__ +deny_token_document:__nosecurity__)
 allow_token_document:"active_dir:S-1-5-32-545" -deny_token_document:"active_dir:S-1-5-32-545"
 allow_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111" -deny_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111"
 allow_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-513" -deny_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-513"
 allow_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113" -deny_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113"
 allow_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110" -deny_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110"
 allow_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107" -deny_token_document:"active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107"
 allow_token_document:"active_dir:S-1-1-0" -deny_token_document:"active_dir:S-1-1-0"
 allow_token_document:"ad:S-1-5-32-545" -deny_token_document:"ad:S-1-5-32-545"
 allow_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1111" -deny_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1111"
 allow_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-513" -deny_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-513"
 allow_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1113" -deny_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1113"
 allow_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1110" -deny_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1110"
 allow_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1107" -deny_token_document:"ad:S-1-5-21-2039231098-2614715072-2050932820-1107"
 allow_token_document:"ad:S-1-1-0" -deny_token_document:"ad:S-1-1-0")

This is the _document security part of the BooleanQuery, with every token wrapped in double
quotes so that the query parser does not treat active_dir as a field name just because a
colon follows it. The query gives the expected results.
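
The quoting described above can be sketched in a few lines of Python. This is only an illustration of how a query string of the same shape could be assembled by hand for testing against /select; `quote_token` and `document_filter` are hypothetical helper names, not part of the ManifoldCF plugin, and the field names are the ones from the schema in this thread.

```python
def quote_token(token: str) -> str:
    """Wrap a token in double quotes so that the ':' inside it is not
    parsed as a field separator; escape backslashes and quotes first."""
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def document_filter(tokens,
                    allow_field="allow_token_document",
                    deny_field="deny_token_document",
                    nosecurity="__nosecurity__"):
    """Build a query string with the same shape as the hand-written one:
    a required group holding the 'no security' clause plus one
    allow/-deny pair per access token."""
    clauses = [f"(+{allow_field}:{nosecurity} +{deny_field}:{nosecurity})"]
    for token in tokens:
        quoted = quote_token(token)
        clauses.append(f"{allow_field}:{quoted} -{deny_field}:{quoted}")
    return "+(" + " ".join(clauses) + ")"


if __name__ == "__main__":
    print(document_filter(["active_dir:S-1-1-0", "ad:S-1-1-0"]))
```

Passing the share-level field names instead would produce the _share part in the same way.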

Thinking about it, the truth is that when we configured our security policies by means of
ActiveDirectory we did not take into consideration share-level policies. Our users are authenticated
only at a document level. Anyway, I don't think this gives us any clue on why my handler isn't

However, I could now modify my own component to handle only the _document-level security
and ignore the _share-level. I think it would work, and that is what I will try for now,
but I seriously think there must be another way to do it, so if this data gives you any
ideas, please let me know.
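
For a document-level-only experiment, the token list can be read from the UserACLs output quoted further down in this thread. A minimal Python sketch, assuming only the line-oriented `AUTHORIZED:` / `TOKEN:` format visible in that output; `parse_user_acls` is a hypothetical helper, and any other kind of line is simply ignored here.

```python
def parse_user_acls(body: str):
    """Split a UserACLs response body into (authorities, tokens).

    Only the two line prefixes that appear in this thread are handled;
    lines with any other prefix are skipped.
    """
    authorities, tokens = [], []
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if line.startswith("TOKEN:"):
            # Keep everything after the prefix, including inner colons.
            tokens.append(line[len("TOKEN:"):])
        elif line.startswith("AUTHORIZED:"):
            authorities.append(line[len("AUTHORIZED:"):])
    return authorities, tokens


if __name__ == "__main__":
    sample = "AUTHORIZED:ad\nTOKEN:ad:S-1-5-32-545\nTOKEN:ad:S-1-1-0\n"
    print(parse_user_acls(sample))
```

The resulting tokens are the values that would be matched against allow_token_document and deny_token_document.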

Either way, I will let you know whether it worked.



-----Original Message-----
From: Karl Wright [] 
Sent: Monday, 05 November 2012 11:57
Subject: Re: Problem with manifold

Just reran the tests on the trunk version of the ManifoldCF solr 3.x plugin - looked good:

    [junit] Testsuite: org.apache.solr.mcf.ManifoldCFQParserPluginTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 10.56 sec
    [junit] ------------- Standard Error -----------------
    [junit] WARNING: test class left thread running: Thread[MultiThreadedHttpConnectionManager
    [junit] RESOURCE LEAK: test class left 1 thread(s) running
    [junit] ------------- ---------------- ---------------
    [junit] Testsuite: org.apache.solr.mcf.ManifoldCFSearchComponentTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.096 sec
    [junit] ------------- Standard Error -----------------
    [junit] WARNING: test class left thread running: Thread[MultiThreadedHttpConnectionManager
    [junit] RESOURCE LEAK: test class left 1 thread(s) running
    [junit] ------------- ---------------- ---------------
    [junit] Testsuite: org.apache.solr.mcf.ManifoldCFSCLoadTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 40.486 sec
    [junit] ------------- Standard Output ---------------
    [junit] Query time = 24352
    [junit] ------------- ---------------- ---------------
    [junit] ------------- Standard Error -----------------
    [junit] WARNING: test class left thread running: Thread[MultiThreadedHttpConnectionManager
    [junit] RESOURCE LEAK: test class left 1 thread(s) running
    [junit] ------------- ---------------- ---------------

The components that this test uses are simple:

<?xml version="1.0" ?>

 Licensed to the Apache Software Foundation (ASF) under one or more  contributor license agreements.
 See the NOTICE file distributed with  this work for additional information regarding copyright
 The ASF licenses this file to You under the Apache License, Version 2.0  (the "License");
you may not use this file except in compliance with  the License.  You may obtain a copy of
the License at

 Unless required by applicable law or agreed to in writing, software  distributed under the
either express or implied.
 See the License for the specific language governing permissions and  limitations under the

<!-- $Id: solrconfig-auth.xml 1176500 2011-09-27 18:19:59Z kwright $


  <jmx />


  <directoryFactory name="DirectoryFactory"

  <updateHandler class="solr.DirectUpdateHandler2">

  <requestHandler name="/update"     class="solr.XmlUpdateRequestHandler" />

  <!-- test MCF Security Filter settings -->
  <searchComponent name="mcf-param"
class="org.apache.solr.mcf.ManifoldCFSearchComponent" >
    <str name="AuthorityServiceBaseURL">http://localhost:8345/mcf-as</str>
    <int name="SocketTimeOut">3000</int>
    <str name="AllowAttributePrefix">aap-</str>
    <str name="DenyAttributePrefix">dap-</str>

  <searchComponent name="mcf"
class="org.apache.solr.mcf.ManifoldCFSearchComponent" >

  <requestHandler name="/mcf" class="solr.SearchHandler" startup="lazy">
    <lst name="invariants">
      <bool name="mcf">true</bool>
    <lst name="defaults">
      <str name="echoParams">all</str>
    <arr name="components">


On Mon, Nov 5, 2012 at 5:42 AM, Karl Wright <> wrote:
> No - I mean modifying ManifoldCFSearchComponent itself, and rebuilding 
> the component yourself.  You can download the sources that correspond 
> to the release from the ManifoldCF download page, 
> .
> Karl
> On Mon, Nov 5, 2012 at 4:13 AM, Gonzalez, Pablo 
> <> wrote:
>> Hello,
>> By 'modifying the component itself' do you mean to write a subclass of ManifoldCFSearchComponent?
>> -----Original Message-----
>> From: Karl Wright []
>> Sent: Friday, 02 November 2012 14:47
>> To:
>> Subject: Re: Problem with manifold
>> If you don't get anywhere with the debug component, you can try modifying the component
>> itself to print the incoming query and the modified query.  You might also want to look at
>> the ManifoldCF component tests, which create a handler internally and executed successfully
>> when the component was released.  If you create a similar handler and that works, then you
>> can try to figure out what the differences are.
>> Thanks,
>> Karl
>> On Fri, Nov 2, 2012 at 8:29 AM, Gonzalez, Pablo <>
>>> Well, it went wrong. I will crawl again just in case, and if it doesn't go well,
>>> I will search the Internet for that debug component you mentioned earlier.
>>> -----Original Message-----
>>> From: Gonzalez, Pablo
>>> Sent: Friday, 02 November 2012 12:03
>>> To:
>>> Subject: RE: Problem with manifold
>>> Ok, I already had the fields in my schema.xml. This is the piece of code regarding
>>>    <field name="allow_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
>>>    <field name="deny_token_document" type="string" indexed="true" stored="false" multiValued="true"/>
>>>    <field name="allow_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
>>>    <field name="deny_token_share" type="string" indexed="true" stored="false" multiValued="true"/>
>>> So, just to make it clear, what you are suggesting is to cut the piece of code
>>> that contains my request handler and paste it in another part of the solrconfig.xml file,
>>> and try this a number of times. I will try to do so, and I'll tell you whether it went right
>>> or wrong.
>>> -----Original Message-----
>>> From: Karl Wright []
>>> Sent: Friday, 02 November 2012 11:38
>>> To:
>>> Subject: Re: Problem with manifold
>>> Actually, from your log it is clear that ManifoldCF can be reached fine from
>>> your Solr instance, so please disregard that question.
>>> The only other potential issue has to do with Solr search component ordering.
>>> This is a bit of black magic, because other Solr components may modify the request in ways
>>> which are potentially incompatible with the ManifoldCF plugin.  So if you are sure your fields
>>> are all correct, you might want to play around with the ordering of your components to see
>>> if that makes any difference.
>>> There used to be a debug component you could also use which would print out the
>>> (full) query and the results returned - that may also be useful.
>>> Thanks,
>>> Karl
>>> On Fri, Nov 2, 2012 at 6:25 AM, Karl Wright <> wrote:
>>>> Hi Pablo,
>>>> The first thing that I notice is that, as you have this configured, 
>>>> you need four fields declared in your schema as indexable fields:
>>>> allow_token_document
>>>> deny_token_document
>>>> allow_token_share
>>>> deny_token_share
>>>> Do you have these fields declared, and did you have them all 
>>>> declared when you performed the crawl?
>>>> Second, the way it is configured, the machine that is running Solr 
>>>> must be the same as the machine running ManifoldCF (because you 
>>>> used a localhost url).  Is this true?
>>>> Thanks,
>>>> Karl
>>>> On Fri, Nov 2, 2012 at 5:43 AM, Gonzalez, Pablo 
>>>> <> wrote:
>>>>> Hello, Mr Wright, and thank you for such a fast response. Well, I am trying
>>>>> to connect mcf and solr via a SearchComponent. For this I added
>>>>> the apache-solr-mcf-3.6-SNAPSHOT.jar that comes in the solr-integration file to the lib folder
>>>>> of the solr webapp deployment in tomcat. Then I changed solrconfig.xml, adding this
>>>>> piece of code:
>>>>> <!-- LCF document security enforcement component -->
>>>>> <searchComponent name="mcfSecurity" class="org.apache.solr.mcf.ManifoldCFSearchComponent">
>>>>>   <str name="AuthorityServiceBaseURL">http://localhost:8345/mcf</str>
>>>>> </searchComponent>
>>>>> <requestHandler name="/search" class="solr.SearchHandler" default="true">
>>>>>   <!-- default values for query parameters can be specified, these
>>>>>        will be overridden by parameters in the request
>>>>>   -->
>>>>>   <!--  <lst name="defaults">
>>>>>        <str name="echoParams">explicit</str>
>>>>>        <int name="rows">10</int>
>>>>>        <str name="df">text</str>
>>>>>      </lst>-->
>>>>>   <arr name="last-components">
>>>>>     <str>mcfSecurity</str>
>>>>>   </arr>
>>>>>   <!--a bunch of comments-->
>>>>> </requestHandler>
>>>>> Last thing, I didn't write any additional Java code. I thought it wasn't
>>>>> Thanks,
>>>>> Pablo
>>>>> -----Original Message-----
>>>>> From: Karl Wright []
>>>>> Sent: Friday, 02 November 2012 10:21
>>>>> To:
>>>>> Subject: Re: Problem with manifold
>>>>> The ManifoldCF Solr plugin operates by requesting access tokens from
>>>>> ManifoldCF (which seems to be working fine), and using those to modify the incoming Solr search
>>>>> expression to limit the results according to those access tokens.
>>>>> There are two ways (and two independent classes) you can configure to
>>>>> perform this modification.  One of these classes functions as a query parser plugin.  The
>>>>> other functions as a search component.  Obviously, for either one to work right, the Solr
>>>>> configuration has to work properly too.  Can you provide details as to (a) which one you are
>>>>> using, and (b) what the configuration details are, e.g. the appropriate clauses from solrconfig.xml?
>>>>> Thanks,
>>>>> Karl
>>>>> On Fri, Nov 2, 2012 at 4:57 AM, Gonzalez, Pablo <>
>>>>>> Hello,
>>>>>> I don't know if you already got this message, but anyway here I go:
>>>>>> I have been trying to connect ManifoldCF to Solr. I have a file
>>>>>> system in a remote server, protected by active directory.
>>>>>> I have configured a manifold job to import only a part of the
>>>>>> documents under the file system. In fact, I do the importing
>>>>>> process from a file which only contains 2 documents, in order to
>>>>>> make it easier to see what is happening and draw conclusions.
>>>>>> Afterwards the documents are output to the solr server.
>>>>>> I have created a request handler called "selectManifold" to "connect"
>>>>>> manifold and solr. Then I call it via
>>>>>> http://[host]:8080/solr/selectManifold?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&wt=&explainOther=&hl.fl=&AuthenticatedUserName=user@domain
>>>>>> When doing this, tomcat's log (catalina.out) writes this:
>>>>>> Oct 31, 2012 2:40:33 PM org.apache.solr.mcf.ManifoldCFSearchComponent prepare
>>>>>> INFO: Trying to match docs for user 'user@domain'
>>>>>> Oct 31, 2012 2:40:33 PM org.apache.solr.mcf.ManifoldCFSearchComponent getAccessTokens
>>>>>> INFO: For user 'user@domain', saw authority response AUTHORIZED:Auth+active+directory+para+el+file+system
>>>>>> (this one is the active directory I'm currently using for the job)
>>>>>> Oct 31, 2012 2:40:33 PM org.apache.solr.mcf.ManifoldCFSearchComponent getAccessTokens
>>>>>> INFO: For user 'user@domain', saw authority response AUTHORIZED:ad (this one isn't)
>>>>>> Oct 31, 2012 2:40:33 PM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [] webapp=/solr path=/selectManifold params={explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&wt=&fq=&version=2.2&rows=10&AuthenticatedUserName=user@domain} hits=0 status=0 QTime=183
>>>>>> So, it effectively connects and gets my user's tokens. In fact,
>>>>>> if I go to http://[host]/mcf/UserACLs?username=user@domain, this
>>>>>> is the result:
>>>>>> AUTHORIZED:Auth+active+directory+para+el+file+system
>>>>>> TOKEN:active_dir:S-1-5-32-545
>>>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111
>>>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-513
>>>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113
>>>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110
>>>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107
>>>>>> TOKEN:active_dir:S-1-1-0
>>>>>> AUTHORIZED:ad
>>>>>> TOKEN:ad:S-1-5-32-545
>>>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1111
>>>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-513
>>>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1113
>>>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1110
>>>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1107
>>>>>> TOKEN:ad:S-1-1-0
>>>>>> Moreover, if I go to http://[host]:8080/solr/admin/schema.jsp and
>>>>>> search for the allow_token_document field, it says that
>>>>>> active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110
>>>>>> (which appeared in the list of UserACLs) has frequency 2
>>>>>> (remember I only have 2 documents indexed). And still, when I call
>>>>>> http://[host]:8080/solr/selectManifold?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&wt=&explainOther=&hl.fl=&AuthenticatedUserName=user@domain
>>>>>> it says no result has been found. Do you know why that could be?
>>>>>> One final thing: when I call
>>>>>> A*&fq=&start=0&rows=10&fl=*%2Cscore&wt=&explainOther=&hl.fl=,
>>>>>> with the default handler (that is, without manifold), it gives
>>>>>> me a result with the 2 documents I indexed.
>>>>>> Sorry for the long post, but I wanted you to have all the data.
>>>>>> Pablo
