From: "Gonzalez, Pablo"
To: "user@manifoldcf.apache.org"
Subject: RE: Problem with manifold
Date: Mon, 5 Nov 2012 09:13:50 +0000

Hello,

By 'modifying the component itself', do you mean writing a subclass of ManifoldCFSearchComponent?

-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, November 2, 2012 14:47
To: user@manifoldcf.apache.org
Subject: Re: Problem with manifold

If you don't get anywhere with the debug component, you can try modifying the component itself to print the incoming query and the modified query. You might also want to look at the ManifoldCF component tests, which create a handler internally and executed successfully when the component was released. If you create a similar handler and that works, then you can try to figure out what the differences are.

Thanks,
Karl

On Fri, Nov 2, 2012 at 8:29 AM, Gonzalez, Pablo wrote:
> Well, it went wrong. I will crawl again just in case, and if that doesn't go well, I will search the Internet for that debug component you mentioned earlier.
>
> -----Original Message-----
> From: Gonzalez, Pablo
> Sent: Friday, November 2, 2012 12:03
> To: user@manifoldcf.apache.org
> Subject: RE: Problem with manifold
>
> Ok, I already had the fields in my schema.xml.
> This is the piece of code regarding them:
>
> stored="false" multiValued="true"/>
>
> stored="false" multiValued="true"/>
>
> stored="false" multiValued="true"/>
>
> stored="false" multiValued="true"/>
>
> So, just to make it clear, what you are suggesting is to cut the piece of code that contains my request handler, paste it in another part of the solrconfig.xml file, and try this a number of times. I will try to do so, and I'll tell you whether it went right or wrong.
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: Friday, November 2, 2012 11:38
> To: user@manifoldcf.apache.org
> Subject: Re: Problem with manifold
>
> Actually, from your log it is clear that ManifoldCF can be reached fine from your Solr instance, so please disregard that question.
>
> The only other potential issue has to do with Solr search component ordering. This is a bit of black magic, because other Solr components may modify the request in ways which are potentially incompatible with the ManifoldCF plugin. So if you are sure your fields are all correct, you might want to play around with the ordering of your components to see if that makes any difference.
>
> There used to be a debug component you could also use which would print out the (full) query and the results returned - that may also be useful.
>
> Thanks,
> Karl
>
> On Fri, Nov 2, 2012 at 6:25 AM, Karl Wright wrote:
>> Hi Pablo,
>>
>> The first thing that I notice is that, as you have this configured, you need four fields declared in your schema as indexable fields:
>>
>> allow_token_document
>> deny_token_document
>> allow_token_share
>> deny_token_share
>>
>> Do you have these fields declared, and did you have them all declared when you performed the crawl?
>>
>> Second, the way it is configured, the machine that is running Solr must be the same as the machine running ManifoldCF (because you used a localhost URL).
>> Is this true?
>>
>> Thanks,
>> Karl
>>
>> On Fri, Nov 2, 2012 at 5:43 AM, Gonzalez, Pablo wrote:
>>> Hello, Mr Wright, and thank you for such a fast response. Well, the way I am trying to make MCF and Solr communicate is via a SearchComponent. For this I added the apache-solr-mcf-3.6-SNAPSHOT.jar that comes in the file solr-integration to the lib folder of the deployment of the Solr webapp in Tomcat. Then I changed solrconfig.xml, adding this piece of code:
>>>
>>> class="org.apache.solr.mcf.ManifoldCFSearchComponent">
>>> http://localhost:8345/mcf
>>>
>>> default="true">
>>>
>>> mcfSecurity
>>>
>>> Last thing, I didn't write any additional Java code. I thought it wasn't necessary.
>>>
>>> Thanks,
>>>
>>> Pablo
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:daddywri@gmail.com]
>>> Sent: Friday, November 2, 2012 10:21
>>> To: user@manifoldcf.apache.org
>>> Subject: Re: Problem with manifold
>>>
>>> The ManifoldCF Solr plugin operates by requesting access tokens from ManifoldCF (which seems to be working fine), and using those to modify the incoming Solr search expression to limit the results according to those access tokens.
>>>
>>> There are two ways (and two independent classes) you can configure to perform this modification. One of these classes functions as a query parser plugin. The other functions as a search component. Obviously, for either one to work right, the Solr configuration has to work properly too. Can you provide details as to (a) which one you are using, and (b) what the configuration details are, e.g. the appropriate clauses from solrconfig.xml?
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Fri, Nov 2, 2012 at 4:57 AM, Gonzalez, Pablo wrote:
>>>> Hello,
>>>> I don't know if you already got this message, but anyway here I go:
>>>> I have been trying to connect ManifoldCF to Solr.
>>>> I have a file system on a remote server, protected by Active Directory.
>>>> I have configured a ManifoldCF job to import only a part of the documents under the file system. In fact, I do the importing process from a file which only contains 2 documents, in order to make it easier to see what is happening and draw conclusions. Afterwards the documents are output to the Solr server.
>>>> I have created a request handler called "selectManifold" to "connect" ManifoldCF and Solr. Then I call it via
>>>> http://[host]:8080/solr/selectManifold?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&wt=&explainOther=&hl.fl=&AuthenticatedUserName=user@domain
>>>> When doing this, Tomcat's log (catalina.out) writes this:
>>>> oct 31, 2012 2:40:33 PM org.apache.solr.mcf.ManifoldCFSearchComponent prepare
>>>> Información: Trying to match docs for user 'user@domain'
>>>> oct 31, 2012 2:40:33 PM org.apache.solr.mcf.ManifoldCFSearchComponent getAccessTokens
>>>> Información: For user 'user@domain', saw authority response AUTHORIZED:Auth+active+directory+para+el+file+system
>>>> (this one is the Active Directory authority I'm currently using for the job)
>>>> oct 31, 2012 2:40:33 PM org.apache.solr.mcf.ManifoldCFSearchComponent getAccessTokens
>>>> Información: For user 'user@domain', saw authority response AUTHORIZED:ad
>>>> (this one isn't)
>>>> oct 31, 2012 2:40:33 PM org.apache.solr.core.SolrCore execute
>>>> Información: [] webapp=/solr path=/selectManifold params={explainOther=&fl=*,score&indent=on&start=0&q=*:*&hl.fl=&wt=&fq=&version=2.2&rows=10&AuthenticatedUserName=user@domain} hits=0 status=0 QTime=183
>>>> So it does connect and get my user's tokens.
>>>> In fact, if I go to http://[host]/mcf/UserACLs?username=user@domain, this is the result:
>>>> AUTHORIZED:Auth+active+directory+para+el+file+system
>>>> TOKEN:active_dir:S-1-5-32-545
>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1111
>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-513
>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1113
>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110
>>>> TOKEN:active_dir:S-1-5-21-2039231098-2614715072-2050932820-1107
>>>> TOKEN:active_dir:S-1-1-0
>>>> AUTHORIZED:ad
>>>> TOKEN:ad:S-1-5-32-545
>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1111
>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-513
>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1113
>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1110
>>>> TOKEN:ad:S-1-5-21-2039231098-2614715072-2050932820-1107
>>>> TOKEN:ad:S-1-1-0
>>>> Moreover, if I go to http://[host]:8080/solr/admin/schema.jsp and search for the allow_token_document field, it says that
>>>> active_dir:S-1-5-21-2039231098-2614715072-2050932820-1110
>>>> (which appeared in the list of UserACLs) has frequency 2 (remember I only have 2 documents indexed). And still, when I call
>>>> http://[host]:8080/solr/selectManifold?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&wt=&explainOther=&hl.fl=&AuthenticatedUserName=user@domain
>>>> it says no result has been found. Do you know why that could be?
>>>> One final thing: when I call
>>>> http://130.177.44.21:8080/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&wt=&explainOther=&hl.fl=
>>>> with the default handler (that is, without ManifoldCF), it gives me a result with the 2 documents I indexed.
>>>> Sorry for the long post but I wanted you to have all the data.
>>>> Pablo
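
One way to narrow down a case like the one in this thread (tokens are returned and indexed, yet the secured handler finds nothing) is to query the default /select handler with a hand-built filter on allow_token_document. The following is only a minimal sketch in Python 3 under stated assumptions: parse_user_acls, token_filter and debug_url are hypothetical helper names, not part of ManifoldCF or Solr, and the filter is a simplified stand-in, not the query that ManifoldCFSearchComponent actually builds.

```python
from urllib.parse import urlencode


def parse_user_acls(response_text):
    """Collect the token values from the TOKEN: lines of a UserACLs-style response."""
    tokens = []
    for line in response_text.splitlines():
        line = line.strip()
        if line.startswith("TOKEN:"):
            tokens.append(line[len("TOKEN:"):])
    return tokens


def token_filter(field, tokens):
    """Build a clause that matches documents carrying any of the tokens in `field`."""
    # Quote each token so the colons and dashes in SIDs are taken literally.
    quoted = ['"%s"' % t.replace("\\", "\\\\").replace('"', '\\"') for t in tokens]
    return "%s:(%s)" % (field, " OR ".join(quoted))


def debug_url(select_url, tokens):
    """Build a URL for the plain handler, filtered on the user's allow tokens."""
    params = [
        ("q", "*:*"),
        ("fq", token_filter("allow_token_document", tokens)),
        ("rows", "10"),
        ("fl", "*,score"),
    ]
    return select_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Two of the tokens from the UserACLs output quoted in the thread.
    sample = (
        "AUTHORIZED:Auth+active+directory+para+el+file+system\n"
        "TOKEN:active_dir:S-1-5-32-545\n"
        "TOKEN:active_dir:S-1-1-0\n"
    )
    print(debug_url("http://localhost:8080/solr/select", parse_user_acls(sample)))
```

If the URL printed here returns both documents through the default handler while selectManifold still returns none, that would suggest the index contents are fine and the difference lies in how the request is modified, which is where printing the incoming and modified query would help.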