jackrabbit-users mailing list archives

From Slavek Tecl <kin...@hotmail.com>
Subject RE: Searching for binary values
Date Mon, 30 Aug 2010 11:58:08 GMT

 Sweet, it works, so the only problem was in the way I was adding the property to the Lucene Document.

 Many thanks for your help!
 Cheers,
 Slavek
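
For readers of the archive, the behaviour discussed in the quoted messages below can be sketched with a toy model. This is not Jackrabbit or Lucene code: the class FieldScopeSketch, the field name "_fulltext" and all method names are made up for illustration. It only shows why text that is indexed under a node-wide field alone cannot be matched by a search restricted to one property, and why also indexing it under a field named after the property fixes that.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (field -> term -> node ids). Illustration only, not Lucene. */
public class FieldScopeSketch {

    /** Hypothetical name for the node-wide full-text field. */
    public static final String NODE_SCOPE = "_fulltext";

    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    /** Index extracted text under the node-wide field only. */
    public void addNodeScope(String nodeId, String text) {
        add(NODE_SCOPE, nodeId, text);
    }

    /** Index extracted text under the node-wide field and a per-property field. */
    public void addPropertyScope(String nodeId, String property, String text) {
        add(NODE_SCOPE, nodeId, text);
        add(property, nodeId, text);
    }

    private void add(String field, String nodeId, String text) {
        Map<String, Set<String>> terms = index.computeIfAbsent(field, f -> new HashMap<>());
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.computeIfAbsent(term, t -> new TreeSet<>()).add(nodeId);
            }
        }
    }

    /** Return the ids of nodes whose given field contains the term. */
    public Set<String> contains(String field, String term) {
        Map<String, Set<String>> terms =
                index.getOrDefault(field, Collections.<String, Set<String>>emptyMap());
        return terms.getOrDefault(term.toLowerCase(), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        FieldScopeSketch idx = new FieldScopeSketch();
        idx.addNodeScope("node1", "value from a binary");
        idx.addPropertyScope("node2", "binaryproperty", "value from a binary");

        // A node-wide search sees both nodes.
        System.out.println(idx.contains(NODE_SCOPE, "value"));
        // A property-restricted search only sees the node indexed per property.
        System.out.println(idx.contains("binaryproperty", "value"));
    }
}
```

In this model node1 corresponds to a binary added only to the node-wide field, and node2 to a binary that is additionally indexed under its own property name.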



> From: kink80@hotmail.com
> To: users@jackrabbit.apache.org
> Subject: RE: Searching for binary values
> Date: Mon, 30 Aug 2010 11:20:22 +0200
> 
> 
> Hi Ard, 
> now I know what the problem is: I just thought that the call to
> doc.add(createFulltextField(...)) would be sufficient.
> The query I've been using is more complicated, but for testing I used a really simple one:
> SELECT child.* FROM [customns:customtype] WHERE CONTAINS(child.binaryproperty, 'value').
> In the future I'm sure the query will contain non-binary properties as well, so a query like
> SELECT child.* FROM [customns:customtype] WHERE CONTAINS(child.binaryproperty, 'value') OR CONTAINS(child.stringproperty, 'foo')
> would be used too.
> Anyway, thanks for pointing me in the right direction. I'll try to implement it
> and see if it works correctly.
> Best regards,
> Slavek
> 
> 
> > Date: Mon, 30 Aug 2010 10:28:39 +0200
> > Subject: Re: Searching for binary values
> > From: a.schrijvers@onehippo.com
> > To: users@jackrabbit.apache.org
> > 
> > Hello,
> > 
> > 2010/8/30 Slavek Tecl <kink80@hotmail.com>:
> > >
> > >
> > > once again in HTML...
> > >
> > > here comes the addBinaryValue method body:...
> > >
> > >
> > > // standard way of indexing
> > > String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
> > > if (jcrData.equals(fieldName)) {
> > >     InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
> > >     if (type != null) {
> > >         Metadata metadata = new Metadata();
> > >         metadata.set(Metadata.CONTENT_TYPE, type.getString());
> > >         // jcr:encoding is not mandatory
> > >         InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
> > >         if (encoding != null) {
> > >             metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
> > >         }
> > >         doc.add(createFulltextField(internalValue, metadata));
> > >     }
> > > } else {
> > >     // everything else gets indexed as well
> > >     MimeTypes gk = new MimeTypes();
> > >     MimeType mimeType = gk.getMimeType(internalValue.getStream());
> > >
> > >     Metadata metadata = new Metadata();
> > >     metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
> > >     doc.add(createFulltextField(internalValue, metadata));
> > > }
> > 
> > ok, but as I said in one of my earlier mails, binaries are not indexed
> > on property level, only on nodescope level. Your code doesn't index on
> > property level either, see createFulltextField. I do hope I understood
> > your first mail correctly: you want to search specifically in the
> > binary property *only*, right? You could post me the xpath that
> > you want to be executed.
> > 
> > Anyway,
> > 
> > you should also add the indexed binary as a non stored (I recommend
> > non stored) property, thus something like the method
> > 
> > protected void addStringValue(Document doc, String fieldName,
> > Object internalValue, boolean tokenized,
> > boolean includeInNodeIndex, float boost,
> > boolean useInExcerpt) {
> > 
> > does. However, you must realize that binaries get indexed in
> > background lazily by default. I'd recommend to not call
> > 
> > doc.add(createFulltextField(internalValue, metadata));
> > 
> > but call
> > 
> > doc.add(createFulltextField(fieldName, internalValue, metadata));
> > 
> > add this new createFulltextField method, and create your own
> > LazyTextExtractorField class also having an arg for fieldName.
> > 
> > Then, you need to also add the extracted analysed text as a property.
> > 
> > Regards Ard
> > 
> > >
> > >
> > >
> > > and here we have my custom parser (and I can see it's being started every time
> > > the binary value with my custom mime type is added):
> > >
> > > XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
> > > xhtml.startDocument();
> > > ...fetch keywords...
> > > for (String value : keywords) {
> > >     xhtml.characters(value);
> > >     xhtml.characters(" ");
> > > }
> > > xhtml.endDocument();
> > > ...
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ----------------------------------------
> > >> Date: Mon, 30 Aug 2010 09:52:28 +0200
> > >> Subject: Re: Searching for binary values
> > >> From: a.schrijvers@onehippo.com
> > >> To: users@jackrabbit.apache.org
> > >>
> > >> 2010/8/30 Slavek Tecl :
> > >> >
> > >> > Bloody hotmail, screwed my awesome formatting ;) Hope it's ok now.
> > >>
> > >> Hmmmm...not really
> > >>
> > >> >
> > >> > here comes the addBinaryValue method body:
> > >> > [code run together on one line by the mail client; the same method body
> > >> > appears in readable form in the later message quoted above]
> > >> >
> > >> > and here we have my custom parser (and I can see it's being started
> > >> > every time the binary value with my custom mime type is added):
> > >> > [code run together on one line by the mail client; the same parser snippet
> > >> > appears in readable form in the later message quoted above]
> > >> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
> > >> >> Subject: Re: Searching for binary values
> > >> >> From: a.schrijvers@onehippo.com
> > >> >> To: users@jackrabbit.apache.org
> > >> >>
> > >> >> Slavek,
> > >> >>
> > >> >> I am no computer :-) Is there a way you could format this into a
> > >> >> slightly more human-readable form?
> > >> >>
> > >> >>
> > >> >> 2010/8/30 Slavek Tecl :
> > >> >>>
> > >> >>> All right, here comes the addBinaryValue method body:
> > >> >>> [code run together by the mail client; the same method body appears in
> > >> >>> readable form in the later message quoted above]
> > >> >>> my custom parser leverages XHTMLContentHandler like this (and
> > >> >>> I can see it's being started every time the binary value with my custom mime type is added):
> > >> >>> [code run together by the mail client; same parser snippet as quoted above,
> > >> >>> with an additional commented-out line: //xhtml.element("p", value);]
> > >> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
> > >> >>>> Subject: Re: Searching for binary values
> > >> >>>> From: a.schrijvers@onehippo.com
> > >> >>>> To: users@jackrabbit.apache.org
> > >> >>>>
> > >> >>>> 2010/8/27 Slavek Tecl :
> > >> >>>>> In my case the addBinaryValue has been overridden in my custom class
> > >> >>>>> so I'm adding this field to the document as well.
> > >> >>>>
> > >> >>>> Is it possible that you made some error in this? I can't
> > >> >>>> judge it without code.
> > >> >>>>
> > >> >>>> Regards Ard
> > >> >>>>
> > >> >>>>>
> > >> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
> > >> >>>>>> Subject: Re: Searching for binary values
> > >> >>>>>> From: a.schrijvers@onehippo.com
> > >> >>>>>> To: users@jackrabbit.apache.org
> > >> >>>>>>
> > >> >>>>>> 2010/8/27 Slavek Tecl :
> > >> >>>>>>>
> > >> >>>>>>> I'm looking for a clarification of how the query is processed in my
> > >> >>>>>>> customized Jackrabbit instance. In my case the NodeIndexer is subclassed
> > >> >>>>>>> so it can add the binary value to the indexed Document even if it does
> > >> >>>>>>> not have the nt:resource type. Tika has also been customized with my
> > >> >>>>>>> mime type, so it can recognize the binary stream through its magic, and
> > >> >>>>>>> of course a Tika Parser was implemented to support the custom binary
> > >> >>>>>>> stream and extract words from it. If I run a query on nt:resource nodes
> > >> >>>>>>> it correctly returns the files containing the searched word, as expected,
> > >> >>>>>>> but when I invoke a similar query on a binary property (and the content
> > >> >>>>>>> of this binary property is exactly the type of stream Tika can parse)
> > >> >>>>>>> it does not return anything. Is there a way out?
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Binary properties are only indexed on nodescope level,
> > >> >>>>>> not on property level.
> > >> >>>>>>
> > >> >>>>>> See protected void addBinaryValue(Document doc,
> > >> >>>>>> String fieldName,
> > >> >>>>>> InternalValue internalValue) {
> > >> >>>>>>
> > >> >>>>>> and then specifically
> > >> >>>>>> doc.add(createFulltextField(internalValue, metadata));
> > >> >>>>>>
> > >> >>>>>> in jr NodeIndexer
> > >> >>>>>>
> > >> >>>>>> Regards Ard
> > >> >>>>>
> > >> >>>
> > >> >
> > >
> > >
> 
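
The suggestion quoted above, a lazily extracted full-text field that also takes a fieldName argument, can be illustrated with a small stand-alone sketch. This is an assumption-laden model, not Jackrabbit's LazyTextExtractorField: the names LazyNamedFieldSketch and LazyTextField are hypothetical, and extraction here simply happens on first read rather than in a background thread.

```java
import java.util.function.Supplier;

public class LazyNamedFieldSketch {

    /** Hypothetical stand-in for a lazily extracted text field that carries a field name. */
    public static final class LazyTextField {
        private final String name;                // field (property) name to index under
        private final Supplier<String> extractor; // stands in for the text extraction step
        private String text;                      // stays null until first requested

        public LazyTextField(String name, Supplier<String> extractor) {
            this.name = name;
            this.extractor = extractor;
        }

        public String name() {
            return name;
        }

        public synchronized boolean isExtracted() {
            return text != null;
        }

        /** Run the extractor on first access and cache the result. */
        public synchronized String stringValue() {
            if (text == null) {
                text = extractor.get();
            }
            return text;
        }
    }

    public static void main(String[] args) {
        LazyTextField field =
                new LazyTextField("binaryproperty", () -> "keyword1 keyword2");
        System.out.println(field.name() + " extracted? " + field.isExtracted());
        System.out.println(field.stringValue());
        System.out.println(field.name() + " extracted? " + field.isExtracted());
    }
}
```

The only point of the sketch is that the field name travels with the deferred extraction, so the extracted text can later be indexed under that property rather than only node-wide.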
 		 	   		  