Date: Mon, 30 Aug 2010 09:52:28 +0200
Subject: Re: Searching for binary values
From: Ard Schrijvers
To: users@jackrabbit.apache.org

2010/8/30 Slavek Tecl:
>
> Bloody hotmail, screwed my awesome formatting ;) Hope it's ok now.

Hmmmm...not really

>
> here comes the addBinaryValue method body:
> ...
> // standard way of indexing
> String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> if (jcrData.equals(fieldName)) {
>     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>     if (type != null) {
>         Metadata metadata = new Metadata();
>         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>         // jcr:encoding is not mandatory
>         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>         if (encoding != null) {
>             metadata.set(Metadata.CONTENT_ENCODING,
>                     encoding.getString());
>         }
>         doc.add(createFulltextField(internalValue, metadata));
>     }
> } else {
>     // everything else gets indexed as well
>     MimeTypes gk = new MimeTypes();
>     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>     doc.add(createFulltextField(internalValue, metadata));
> }
> ...
>
> and here we have my custom parser (and I can see it's being started
> every time the binary value with my custom mime type is added):
> XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> xhtml.startDocument();
> ...fetch keywords...
> for (String value : keywords) {
>     xhtml.characters(value);
>     xhtml.characters(" ");
> }
> xhtml.endDocument();
> ...
>
>> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> Subject: Re: Searching for binary values
>> From: a.schrijvers@onehippo.com
>> To: users@jackrabbit.apache.org
>>
>> Slavek,
>>
>> I am no computer :-) Is there a way you can format this a little into
>> something human-understandable?
>>
>>
>> 2010/8/30 Slavek Tecl:
>>>
>>> All right, here comes the addBinaryValue method body:
>>> ...
>>> // standard way of indexing
>>> String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
>>> if (jcrData.equals(fieldName)) {
>>>     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>>>     if (type != null) {
>>>         Metadata metadata = new Metadata();
>>>         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>>>         // jcr:encoding is not mandatory
>>>         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>>>         if (encoding != null) {
>>>             metadata.set(Metadata.CONTENT_ENCODING,
>>>                     encoding.getString());
>>>         }
>>>         doc.add(createFulltextField(internalValue, metadata));
>>>     }
>>> } else {
>>>     // everything else gets indexed as well
>>>     MimeTypes gk = new MimeTypes();
>>>     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>>>     Metadata metadata = new Metadata();
>>>     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>>>     doc.add(createFulltextField(internalValue, metadata));
>>> }
>>> ...
>>> my custom parser leverages XHTMLContentHandler like this (and I can
>>> see it's being started every time the binary value with my custom
>>> mime type is added):
>>> ...
>>> XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>>> xhtml.startDocument();
>>> ...
>>> for (String value : keywords) {
>>>     xhtml.characters(value);
>>>     xhtml.characters(" ");
>>>     //xhtml.element("p", value);
>>> }
>>> xhtml.endDocument();
>>> ...
>>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>>>> Subject: Re: Searching for binary values
>>>> From: a.schrijvers@onehippo.com
>>>> To: users@jackrabbit.apache.org
>>>>
>>>> 2010/8/27 Slavek Tecl:
>>>>> In my case the addBinaryValue has been overridden in my custom
>>>>> class so I'm adding this field to the document as well.
>>>>
>>>> Is it possible that you made some error in this? I can't judge it
>>>> without code
>>>>
>>>> Regards Ard
>>>>
>>>>>
>>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>>>>>> Subject: Re: Searching for binary values
>>>>>> From: a.schrijvers@onehippo.com
>>>>>> To: users@jackrabbit.apache.org
>>>>>>
>>>>>> 2010/8/27 Slavek Tecl:
>>>>>>>
>>>>>>> I'm looking for a clarification of how the query is processed in
>>>>>>> my customized jackrabbit instance. In my case the NodeIndexer is
>>>>>>> subclassed so it can add the binary value to the indexed Document
>>>>>>> even if it does not have nt:resource type. Then Tika has been
>>>>>>> customized with my mimetype so the parser is able to recognize
>>>>>>> the binary stream through its magic, and of course Tika's Parser
>>>>>>> object was implemented to support the custom binary stream and
>>>>>>> extract words from it. If I run a query on nt:resource nodes it
>>>>>>> correctly returns files including the searched word as expected,
>>>>>>> but when I invoke a similar query on a binary property (and the
>>>>>>> content of this binary property is exactly the type of the stream
>>>>>>> Tika can parse) it does not return anything - is there a way out?
>>>>>>
>>>>>>
>>>>>> Binary properties are only indexed on nodescope level, not on
>>>>>> property level.
>>>>>>
>>>>>> See protected void addBinaryValue(Document doc,
>>>>>>                                   String fieldName,
>>>>>>                                   InternalValue internalValue) {
>>>>>>
>>>>>> and then specifically doc.add(createFulltextField(internalValue, metadata));
>>>>>>
>>>>>> in jr NodeIndexer
>>>>>>
>>>>>> Regards Ard
>>>>>
>>>
>
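
For readers of the archive, the node-scope remark quoted above can be made concrete with the two query shapes involved. The following is only a rough sketch: the class and method names (FulltextQueries, nodeScope, propertyScope) and the names my:document and my:content are hypothetical and do not come from this thread; only the jcr:contains XPath function and the element(*, type) test are standard JCR 1.0 query syntax.

```java
// Hypothetical helper, not part of Jackrabbit: it only builds query strings.
final class FulltextQueries {

    // Doubles single quotes so the term can sit inside an XPath string literal.
    // This does not escape the special characters of the jcr:contains search
    // expression itself (for example '-' or '"').
    static String escape(String term) {
        return term.replace("'", "''");
    }

    // Node-scope full-text query: '.' means "the full-text index of the node".
    static String nodeScope(String nodeType, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(., '"
                + escape(term) + "')]";
    }

    // Property-scope full-text query: targets one named property only.
    static String propertyScope(String nodeType, String property, String term) {
        return "//element(*, " + nodeType + ")[jcr:contains(@" + property
                + ", '" + escape(term) + "')]";
    }

    public static void main(String[] args) {
        // my:document and my:content are made-up names for illustration.
        System.out.println(nodeScope("my:document", "keyword"));
        System.out.println(propertyScope("my:document", "my:content", "keyword"));
    }
}
```

If binary values are added only to the node-level full-text field, as the quoted remark describes, the first shape is the one that can be expected to match text extracted from a binary property, while the second would presumably find nothing for it. The resulting strings would be passed to QueryManager.createQuery(statement, Query.XPATH) in a real repository session.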