jackrabbit-users mailing list archives

From Slavek Tecl <kin...@hotmail.com>
Subject RE: Searching for binary values
Date Mon, 30 Aug 2010 08:06:43 GMT


 Once again, in HTML...

 Here comes the addBinaryValue method body:...

 
// standard way of indexing: jcr:data uses the declared jcr:mimeType
String jcrData = mappings.getPrefix(Name.NS_JCR_URI) + ":data";
if (jcrData.equals(fieldName)) {
    InternalValue type = getValue(NameConstants.JCR_MIMETYPE);
    if (type != null) {
        Metadata metadata = new Metadata();
        metadata.set(Metadata.CONTENT_TYPE, type.getString());
        // jcr:encoding is not mandatory
        InternalValue encoding = getValue(NameConstants.JCR_ENCODING);
        if (encoding != null) {
            metadata.set(Metadata.CONTENT_ENCODING, encoding.getString());
        }
        doc.add(createFulltextField(internalValue, metadata));
    }
} else {
    // everything else gets indexed as well: detect the MIME type from the stream content
    MimeTypes gk = new MimeTypes();
    MimeType mimeType = gk.getMimeType(internalValue.getStream());

    Metadata metadata = new Metadata();
    metadata.set(Metadata.CONTENT_TYPE, mimeType.getName());
    // createFulltextField(...) only adds a node-scope field, not a property-level one
    doc.add(createFulltextField(internalValue, metadata));
}
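To make the two branches above easier to follow, here is a minimal plain-Java model of the decision they take. It is a sketch, not Jackrabbit or Tika code: the class MimeTypeChoice, the magic prefix "KWD1", the type name "application/x-my-keywords" and the application/octet-stream fallback are all hypothetical, the namespace prefix is hard-coded as "jcr" instead of being resolved via mappings.getPrefix(...), and a simple leading-bytes comparison stands in for Tika's magic detection.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

/**
 * Toy model of the branching in the custom addBinaryValue() above.
 * Hypothetical names throughout; not Jackrabbit or Tika API.
 */
final class MimeTypeChoice {

    // Hypothetical magic bytes identifying the custom binary format.
    static final byte[] CUSTOM_MAGIC = "KWD1".getBytes(StandardCharsets.US_ASCII);
    static final String CUSTOM_TYPE = "application/x-my-keywords";
    // Assumed fallback when no magic matches.
    static final String FALLBACK_TYPE = "application/octet-stream";

    /**
     * Returns the MIME type the text extractor would be told to use, or
     * empty when the property is skipped (jcr:data without jcr:mimeType).
     */
    static Optional<String> choose(String fieldName, String declaredMimeType, byte[] head) {
        if ("jcr:data".equals(fieldName)) {
            // Standard path: trust jcr:mimeType; no declared type means no fulltext field.
            return Optional.ofNullable(declaredMimeType);
        }
        // Everything else: sniff the leading bytes (simplest form of magic matching).
        return Optional.of(startsWith(head, CUSTOM_MAGIC) ? CUSTOM_TYPE : FALLBACK_TYPE);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] custom = "KWD1 payload".getBytes(StandardCharsets.US_ASCII);
        System.out.println(choose("my:binary", null, custom));             // Optional[application/x-my-keywords]
        System.out.println(choose("jcr:data", "application/pdf", custom)); // Optional[application/pdf]
        System.out.println(choose("jcr:data", null, custom));              // Optional.empty
    }
}
```

In both branches of the original method the result goes to the same createFulltextField(internalValue, metadata) call, so the detected type only decides which parser runs, not where the extracted text ends up in the index.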



And here is my custom parser (I can see it being started every time a binary value
with my custom MIME type is added):

XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
xhtml.startDocument();
...fetch keywords...
// emit each keyword as plain text, separated by a single space
for (String value : keywords) {
    xhtml.characters(value);
    xhtml.characters(" ");
}
xhtml.endDocument();
...
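To see what text the indexer receives from such a parser, the same sequence of events can be replayed against a plain JDK SAX handler. This is a sketch under stated assumptions: the class KeywordEmitter and its input list are hypothetical, and a bare org.xml.sax.ContentHandler stands in for Tika's XHTMLContentHandler (which would also emit the surrounding XHTML structure).

```java
import java.util.List;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Stand-in for the keyword parser above, using plain JDK SAX instead of
 * Tika's XHTMLContentHandler. Hypothetical class, not Tika API.
 */
final class KeywordEmitter {

    /** Emits each keyword followed by a space, mirroring the loop above. */
    static void emit(ContentHandler handler, List<String> keywords) throws SAXException {
        handler.startDocument();
        for (String value : keywords) {
            char[] word = value.toCharArray();
            handler.characters(word, 0, word.length);
            handler.characters(new char[] {' '}, 0, 1);
        }
        handler.endDocument();
    }

    /** Collects every character event, roughly the text a fulltext field would see. */
    static final class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        String text() {
            return text.toString();
        }
    }

    static String extract(List<String> keywords) {
        TextCollector collector = new TextCollector();
        try {
            emit(collector, keywords);
        } catch (SAXException e) {
            // TextCollector never throws, so this would be unexpected.
            throw new IllegalStateException(e);
        }
        return collector.text();
    }

    public static void main(String[] args) {
        System.out.println("[" + extract(List.of("jackrabbit", "tika", "lucene")) + "]");
        // prints [jackrabbit tika lucene ]
    }
}
```

Each keyword reaches the extracted text followed by a space, so the words stay separate tokens for the fulltext analyzer.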







----------------------------------------
> Date: Mon, 30 Aug 2010 09:52:28 +0200
> Subject: Re: Searching for binary values
> From: a.schrijvers@onehippo.com
> To: users@jackrabbit.apache.org
>
> 2010/8/30 Slavek Tecl :
> >
> > Bloody hotmail, screwed up my awesome formatting ;) Hope it's OK now.
>
> Hmmmm...not really
>
> >
> >> Date: Mon, 30 Aug 2010 09:31:47 +0200
> >> Subject: Re: Searching for binary values
> >> From: a.schrijvers@onehippo.com
> >> To: users@jackrabbit.apache.org
> >>
> >> Slavek,
> >>
> >> I am no computer :-) Is there a way you could format this into something a
> >> little more human-understandable?
> >>
> >>
> >> 2010/8/30 Slavek Tecl :
> >>>
> >>>> Date: Mon, 30 Aug 2010 09:12:16 +0200
> >>>> Subject: Re: Searching for binary values
> >>>> From: a.schrijvers@onehippo.com
> >>>> To: users@jackrabbit.apache.org
> >>>>
> >>>> 2010/8/27 Slavek Tecl :
> >>>>> In my case, addBinaryValue has been overridden in my custom class, so
> >>>>> I'm adding this field to the document as well.
> >>>>
> >>>> Is it possible that you made an error in this? I can't judge it without
> >>>> the code.
> >>>>
> >>>> Regards Ard
> >>>>
> >>>>>
> >>>>>> Date: Fri, 27 Aug 2010 17:16:56 +0200
> >>>>>> Subject: Re: Searching for binary values
> >>>>>> From: a.schrijvers@onehippo.com
> >>>>>> To: users@jackrabbit.apache.org
> >>>>>>
> >>>>>> 2010/8/27 Slavek Tecl :
> >>>>>>>
> >>>>>>> I'm looking for a clarification of how the query is processed in my
> >>>>>>> customized Jackrabbit instance. In my case the NodeIndexer is subclassed so
> >>>>>>> it can add the binary value to the indexed Document even if the node does not
> >>>>>>> have the nt:resource type. Tika has then been customized with my MIME type, so
> >>>>>>> the parser is able to recognize the binary stream through its magic, and of
> >>>>>>> course a Tika Parser was implemented to support the custom binary stream and
> >>>>>>> extract words from it. If I run a query on nt:resource nodes, it correctly
> >>>>>>> returns files containing the searched word, as expected. But when I invoke a
> >>>>>>> similar query on a binary property (and the content of this binary property
> >>>>>>> is exactly the type of stream Tika can parse), it does not return anything.
> >>>>>>> Is there a way out?
> >>>>>>
> >>>>>>
> >>>>>> Binary properties are only indexed at the node-scope level, not at the
> >>>>>> property level.
> >>>>>>
> >>>>>> See
> >>>>>>     protected void addBinaryValue(Document doc,
> >>>>>>                                   String fieldName,
> >>>>>>                                   InternalValue internalValue) {
> >>>>>>
> >>>>>> and then specifically
> >>>>>>     doc.add(createFulltextField(internalValue, metadata));
> >>>>>>
> >>>>>> in the Jackrabbit NodeIndexer.
> >>>>>>
> >>>>>> Regards Ard
> >>>>>
> >>>
> >
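Ard's point that binary properties are only indexed at node scope accounts for the behaviour described in the original question. The toy model below is a sketch, not Jackrabbit's index: the class ToyFulltextIndex, its field names and the property names are hypothetical, tokenization is a plain whitespace split, and string properties are assumed, for contrast, to get both a property-level and a node-scope field. It shows a property-level lookup (roughly jcr:contains(@my:keywords, 'tika')) finding nothing for a binary property, while a node-scope lookup (roughly jcr:contains(., 'tika')) does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of one node's fulltext document. Binary properties feed only
 * the node-scope field, as described for NodeIndexer.addBinaryValue().
 * Field naming ("NODE_SCOPE", "prop:" + name) is hypothetical.
 */
final class ToyFulltextIndex {
    static final String NODE_SCOPE = "NODE_SCOPE";

    // field name -> indexed lower-case terms
    private final Map<String, Set<String>> fields = new HashMap<>();

    void addStringProperty(String name, String text) {
        index("prop:" + name, text); // assumed property-level field
        index(NODE_SCOPE, text);     // node-scope field
    }

    void addBinaryProperty(String name, String extractedText) {
        // Node scope only, mirroring doc.add(createFulltextField(internalValue, metadata)).
        index(NODE_SCOPE, extractedText);
    }

    /** Roughly jcr:contains(@name, term). */
    boolean containsInProperty(String name, String term) {
        return fields.getOrDefault("prop:" + name, Set.of()).contains(term.toLowerCase());
    }

    /** Roughly jcr:contains(., term). */
    boolean containsInNode(String term) {
        return fields.getOrDefault(NODE_SCOPE, Set.of()).contains(term.toLowerCase());
    }

    private void index(String field, String text) {
        Set<String> terms = fields.computeIfAbsent(field, f -> new HashSet<>());
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    public static void main(String[] args) {
        ToyFulltextIndex node = new ToyFulltextIndex();
        node.addBinaryProperty("my:keywords", "jackrabbit tika ");
        System.out.println(node.containsInProperty("my:keywords", "tika")); // false
        System.out.println(node.containsInNode("tika"));                    // true
    }
}
```

In this model the words extracted from the binary property are searchable, but only through the node scope, which matches Ard's explanation of why the property-level query returned nothing.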

 		 	   		  