jackrabbit-users mailing list archives

From Ross.Dy...@ipaustralia.gov.au
Subject Re: jackrabbit 2.0 binary search indexing [SEC=UNCLASSIFIED]
Date Fri, 19 Feb 2010 00:30:55 GMT
My binary files are all PDFs, so the text is extracted with the PDFBox
toolkit and the full text becomes keyword searchable.
This all works with the default configuration; my only change was to extend
nt:resource to add a few attributes.

The mimeType attribute will be application/octet-stream. 
Perhaps there is no plug-in that knows how to extract text from your 
binary files?
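
Since the thread says the extractor is chosen from jcr:mimeType, one thing worth checking is which value is actually being stored there. The following is only a rough sketch: the class name, the extension table and the likelyExtractable rule are all hypothetical and are not part of Jackrabbit or Tika. It assumes that a generic application/octet-stream value leaves the extractor nothing to go on.

```java
import java.util.Locale;
import java.util.Map;

public class MimeTypeCheck {
    // Hypothetical extension table (an assumption); extend it for your own file types.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "txt", "text/plain",
            "html", "text/html");

    static final String FALLBACK = "application/octet-stream";

    // Pick the value to store in jcr:mimeType from the file name.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no usable extension
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }

    // Assumption: a missing or generic type will probably not be matched
    // by any text extractor, so flag it before saving the node.
    static boolean likelyExtractable(String mimeType) {
        return mimeType != null && !FALLBACK.equals(mimeType);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"report.PDF", "notes.txt", "blob.bin"}) {
            String type = mimeTypeFor(name);
            System.out.println(name + " -> " + type
                    + (likelyExtractable(type) ? "" : " (probably not extracted)"));
        }
    }
}
```

If the value printed for your files is the fallback, setting a specific jcr:mimeType on jcr:content before saving would be the first thing to try.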

From:   ChadDavis <chadmichaeldavis@gmail.com>
To:     users@jackrabbit.apache.org
Date:   19/02/2010 11:13 AM
Subject:        Re: jackrabbit 2.0 binary search indexing

On Thu, Feb 18, 2010 at 2:39 PM, Alexander Klimetschek <aklimets@day.com> wrote:
> On Thu, Feb 18, 2010 at 18:35, ChadDavis <chadmichaeldavis@gmail.com> wrote:
>> I'm looking for information on how to enable binary search indexing.
>> I found documentation for pre-2.0 jackrabbit, and reference to the
>> fact that Tika is now used internally for the binary indexing.
>> However, I can't find any documentation of how to enable the binary
>> indexing ...
> It is enabled for all nt:file binaries, i.e. the jcr:content/jcr:data
> property. The mimetype for text extraction is taken from the
> jcr:content/jcr:mimeType property. I don't know if you can enable it
> for other binary properties.

Just to clarify: are you saying that binary indexing happens automatically
as long as I use the built-in JCR node types for my binary file storage,
e.g. nt:file --> jcr:content (nt:resource) --> jcr:data (the binary
property holding my file)?

If so, then something's not working for me.  Can you recommend some
troubleshooting tips?  How can I determine whether the binaries are
being indexed?  Note that my full-text search DOES hit other node
properties.
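
One way to test this is to search for a word that occurs only inside a file's body and restrict the query to nt:resource nodes, so that hits on other properties cannot mask the result. Below is a minimal sketch of building such a JCR XPath query string. The class and method names are hypothetical; only the query syntax and the javax.jcr calls named in the comment come from the JCR API. It escapes the XPath string literal only and does not handle the operators of the full-text search expression.

```java
public class BinaryIndexProbe {

    // Build an XPath full-text query scoped to nt:resource nodes.
    // A single quote inside an XPath string literal is escaped by doubling it.
    static String buildQuery(String term) {
        String escaped = term.replace("'", "''");
        return "//element(*, nt:resource)[jcr:contains(., '" + escaped + "')]";
    }

    public static void main(String[] args) {
        String xpath = buildQuery("invoice");
        System.out.println(xpath);
        // With an open javax.jcr.Session, the string could then be run as:
        //   session.getWorkspace().getQueryManager()
        //          .createQuery(xpath, javax.jcr.query.Query.XPATH)
        //          .execute();
    }
}
```

If a word known to exist only in the binary content returns no nt:resource nodes, the binaries are probably not being extracted, and the stored jcr:mimeType is the next thing to inspect.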
