Date: Mon, 30 Aug 2010 15:33:31 +0200
Subject: Re: Searching for binary values
From: Ard Schrijvers
To: users@jackrabbit.apache.org

2010/8/30 Slavek Tecl:
>
> Sweet, it works, so the only problem was in my way of adding a property to the lucene Document.
>
> Many thanks for your help!
> Cheers,
> Slavek

You're welcome. thx for the feedback,

Regards Ard

>
>> From: kink80@hotmail.com
>> To: users@jackrabbit.apache.org
>> Subject: RE: Searching for binary values
>> Date: Mon, 30 Aug 2010 11:20:22 +0200
>>
>> Hi Ard,
>> now I know what the problem is, I just thought that the call to doc.add(createFulltextField(...)) would
>> be sufficient.
>> The query I've been using is more complicated but for testing I used a really simple one:
>> SELECT child.* FROM [customns:customtype] AS child WHERE CONTAINS(child.binaryproperty, 'value').
>> For the future, I'm sure the query will contain non-binary properties as well so a query like
>> SELECT child.* FROM [customns:customtype] AS child WHERE CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty, 'foo')
>> would be used too.
>> Anyway, thanks for pointing me in the right direction, I'll try to implement the stuff and see if it's working correctly.
>> Best regards,
>> Slavek
>>
>> > Date: Mon, 30 Aug 2010 10:28:39 +0200
>> > Subject: Re: Searching for binary values
>> > From: a.schrijvers@onehippo.com
>> > To: users@jackrabbit.apache.org
>> >
>> > Hello,
>> >
>> > 2010/8/30 Slavek Tecl:
>> > >
>> > > once again in HTML...
>> > >
>> > > here comes the addBinaryValue method body:...
>> > >
>> > > //standard way of indexing
>> > > String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
>> > > if (jcrData.equals(fieldName)) {
>> > >     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
>> > >     if (type != null) {
>> > >         Metadata metadata = new Metadata();
>> > >         metadata.set(Metadata.CONTENT_TYPE, type.getString());
>> > >         // jcr:encoding is not mandatory
>> > >         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
>> > >         if (encoding != null) {
>> > >             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
>> > >         }
>> > >         doc.add(createFulltextField(internalValue, metadata));
>> > >     }
>> > > } else {
>> > >     //everything else gets indexed as well
>> > >     MimeTypes gk = new MimeTypes();
>> > >     MimeType mimeType = gk.getMimeType(internalValue.getStream());
>> > >
>> > >     Metadata metadata = new Metadata();
>> > >     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
>> > >     doc.add(createFulltextField(internalValue, metadata));
>> > > }
>> >
>> > ok, but as I said in one of my earlier mails, binaries are not indexed
>> > on property level, only on nodescope level. Your code doesn't index on
>> > property level either, see createFulltextField. I do hope I
>> > understood your first mail correctly: You want to specifically search
>> > in the binary property *only*, right? You could post me the xpath that
>> > you want to be executed.
>> >
>> > Anyway,
>> >
>> > you should also add the indexed binary as a non stored (I recommend
>> > non stored) property, thus something like what the method
>> >
>> > protected void addStringValue(Document doc, String fieldName,
>> >         Object internalValue, boolean tokenized,
>> >         boolean includeInNodeIndex, float boost,
>> >         boolean useInExcerpt) {
>> >
>> > does. However, you must realize that binaries get indexed in
>> > background lazily by default.
>> > I'd recommend not calling
>> >
>> > doc.add(createFulltextField(internalValue, metadata));
>> >
>> > but calling
>> >
>> > doc.add(createFulltextField(fieldName, internalValue, metadata));
>> >
>> > Add this new createFulltextField method, and create your own
>> > LazyTextExtractorField class that also takes an arg for fieldName.
>> >
>> > Then, you also need to add the extracted, analysed text as a property.
>> >
>> > Regards Ard
>> >
>> > >
>> > > and here we have my custom parser (and I can see it's being started every time the binary value with my custom mime type is added):
>> > >
>> > > XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
>> > > xhtml.startDocument();
>> > > ...fetch keywords...
>> > > for (String value : keywords) {
>> > >     xhtml.characters(value);
>> > >     xhtml.characters(" ");
>> > > }
>> > > xhtml.endDocument();
>> > > ...
>> > >
>> > > ----------------------------------------
>> > >> Date: Mon, 30 Aug 2010 09:52:28 +0200
>> > >> Subject: Re: Searching for binary values
>> > >> From: a.schrijvers@onehippo.com
>> > >> To: users@jackrabbit.apache.org
>> > >>
>> > >> 2010/8/30 Slavek Tecl:
>> > >> >
>> > >> > Bloody hotmail, screwed my awesome formatting ;) Hope it's ok now.
>> > >>
>> > >> Hmmmm...not really
>> > >>
>> > >> >
>> > >> > here comes the addBinaryValue method body:...//standard way of indexingString jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";if (jcrData.equals(fieldName)) { InternalValue type = getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, type.getString()); // jcr:encoding is not mandatory InternalValue encoding = getValue(NameConstants.JCR_ENCODING); if (encoding != null) { metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); } doc.add(createFulltextField(internalValue, metadata)); }} else { //everything else gets indexed as well MimeTypes gk = new MimeTypes(); MimeType mimeType = gk.getMimeType(internalValue.getStream()); Metadata metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, mimeType.getName()); doc.add(createFulltextField(internalValue, metadata));}...
>> > >> >
>> > >> > and here we have my custom parser (and I can see it's being started every time the binary value with my custom mime type is added):XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);xhtml.startDocument();...fetch keywords...for(String value: keywords) { xhtml.characters(value); xhtml.characters(" ");}xhtml.endDocument();...
>> > >> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
>> > >> >> Subject: Re: Searching for binary values
>> > >> >> From: a.schrijvers@onehippo.com
>> > >> >> To: users@jackrabbit.apache.org
>> > >> >>
>> > >> >> Slavek,
>> > >> >>
>> > >> >> I am no computer :-) Is there a way you could format this a little, into a
>> > >> >> human-understandable kind of thing?
>> > >> >>
>> > >> >>
>> > >> >> 2010/8/30 Slavek Tecl:
>> > >> >>>
>> > >> >>> All right, here comes the addBinaryValue method body: ...
>> > >> >>> //standard way of indexing String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data"; if (jcrData.equals(fieldName)) { InternalValue type = getValue(NameConstants.JCR_MIMETYPE); if (type != null) { Metadata metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, type.getString());
>> > >> >>> // jcr:encoding is not mandatory InternalValue encoding = getValue(NameConstants.JCR_ENCODING); if (encoding != null) { metadata.set(Metadata.CONTENT_ENCODING, encoding.getString()); }
>> > >> >>> doc.add(createFulltextField(internalValue, metadata)); } } else { //everything else gets indexed as well MimeTypes gk = new MimeTypes(); MimeType mimeType = gk.getMimeType(internalValue.getStream());
>> > >> >>> Metadata metadata = new Metadata(); metadata.set(Metadata.CONTENT_TYPE, mimeType.getName()); doc.add(createFulltextField(internalValue, metadata)); } ...
>> > >> >>> my custom parser leverages XHTMLContentHandler like this (and I can see it's being started every time the binary value with my custom mime type is added):
>> > >> >>> ...XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);xhtml.startDocument();... for(String value: keywords) { xhtml.characters(value); xhtml.characters(" "); //xhtml.element("p", value); }xhtml.endDocument();...
>> > >> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
>> > >> >>>> Subject: Re: Searching for binary values
>> > >> >>>> From: a.schrijvers@onehippo.com
>> > >> >>>> To: users@jackrabbit.apache.org
>> > >> >>>>
>> > >> >>>> 2010/8/27 Slavek Tecl:
>> > >> >>>>> In my case the addBinaryValue has been overridden in my custom class so I'm adding this field to the document as well.
>> > >> >>>>
>> > >> >>>> Is it possible that you made some error in this?
>> > >> >>>> I can't judge it without code
>> > >> >>>>
>> > >> >>>> Regards Ard
>> > >> >>>>
>> > >> >>>>>
>> > >> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
>> > >> >>>>>> Subject: Re: Searching for binary values
>> > >> >>>>>> From: a.schrijvers@onehippo.com
>> > >> >>>>>> To: users@jackrabbit.apache.org
>> > >> >>>>>>
>> > >> >>>>>> 2010/8/27 Slavek Tecl:
>> > >> >>>>>>>
>> > >> >>>>>>> I'm looking for a clarification of how the query is processed in my customized jackrabbit instance.
>> > >> >>>>>>> In my case the NodeIndexer is subclassed so it can add the binary value to the indexed Document
>> > >> >>>>>>> even if it does not have nt:resource type. Then Tika has been customized with my mimetype so the
>> > >> >>>>>>> parser is able to recognize the binary stream through its magic, and of course Tika's Parser
>> > >> >>>>>>> object was implemented to support the custom binary stream to extract words from it.
>> > >> >>>>>>> If I run a query on nt:resource nodes it correctly returns files including the searched word as
>> > >> >>>>>>> expected, but when I invoke a similar query on a binary property (and the content of this binary
>> > >> >>>>>>> property is exactly the type of the stream Tika can parse) it does not return anything - is
>> > >> >>>>>>> there a way out?
>> > >> >>>>>>
>> > >> >>>>>>
>> > >> >>>>>> Binary properties are only indexed on nodescope level, not on property level.
>> > >> >>>>>>
>> > >> >>>>>> See protected void addBinaryValue(Document doc,
>> > >> >>>>>>         String fieldName,
>> > >> >>>>>>         InternalValue internalValue) {
>> > >> >>>>>>
>> > >> >>>>>> and then specifically doc.add(createFulltextField(internalValue, metadata));
>> > >> >>>>>>
>> > >> >>>>>> in Jackrabbit's NodeIndexer
>> > >> >>>>>>
>> > >> >>>>>> Regards Ard
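
The point made in this thread (text extracted from a binary is indexed only at node scope unless it is also added under a field named after the property) can be illustrated with a toy model. The sketch below is plain Java and deliberately does not use the Jackrabbit or Lucene APIs: the class ScopeSketch, the field name ":fulltext" and every method name are assumptions made up for illustration, not names taken from NodeIndexer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy model of one node's index document; NOT the Jackrabbit or Lucene API. */
class ScopeSketch {

    /** Hypothetical name for the node-wide full-text field. */
    static final String NODE_SCOPE = ":fulltext";

    /** Field name -> lower-cased terms indexed under that field. */
    private final Map<String, Set<String>> fields = new HashMap<>();

    /** Splits text on whitespace and records the terms under the given field. */
    void addTerms(String field, String text) {
        Set<String> terms = fields.computeIfAbsent(field, k -> new HashSet<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    /** Models indexing the extracted binary text at node scope only. */
    void indexNodeScopeOnly(String extractedText) {
        addTerms(NODE_SCOPE, extractedText);
    }

    /** Models the suggested change: node scope plus a field named after the property. */
    void indexWithPropertyField(String property, String extractedText) {
        addTerms(NODE_SCOPE, extractedText);
        addTerms(property, extractedText);
    }

    /** Returns true if the term was recorded under the given field. */
    boolean contains(String field, String term) {
        Set<String> terms = fields.getOrDefault(field, Collections.<String>emptySet());
        return terms.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ScopeSketch before = new ScopeSketch();
        before.indexNodeScopeOnly("alpha value beta");
        System.out.println(before.contains(NODE_SCOPE, "value"));       // true
        System.out.println(before.contains("binaryproperty", "value")); // false

        ScopeSketch after = new ScopeSketch();
        after.indexWithPropertyField("binaryproperty", "alpha value beta");
        System.out.println(after.contains("binaryproperty", "value"));  // true
    }
}
```

In this model a node-wide search finds the word in both cases, while a search restricted to "binaryproperty" only succeeds once the text has also been recorded under that field name, which mirrors the behaviour reported at the top of the thread.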