jackrabbit-users mailing list archives

From Ross.Dy...@ipaustralia.gov.au
Subject Re: jackrabbit 2.0 binary search indexing [SEC=UNCLASSIFIED]
Date Fri, 19 Feb 2010 00:41:58 GMT
I only have a small dataset in my test application (<100 docs); it certainly
takes only a few seconds for a document to become available to keyword search.

ChadDavis <chadmichaeldavis@gmail.com> wrote on 19/02/2010 11:33:27 AM:

> From: ChadDavis <chadmichaeldavis@gmail.com>
> To: users@jackrabbit.apache.org
> Date: 19/02/2010 11:34 AM
> Subject: Re: jackrabbit 2.0 binary search indexing [SEC=UNCLASSIFIED]
> 
> On Thu, Feb 18, 2010 at 5:30 PM,  <Ross.Dyson@ipaustralia.gov.au> wrote:
> > My binary files are all PDFs, so the text is extracted with the PDFBox
> > toolkit and the full text becomes keyword searchable.
> > All done using the default configuration, except that I extended
> > nt:resource to add a few attributes.
> >
> > The mimeType attribute will be application/octet-stream.
> > Perhaps there is no plug-in that knows how to extract text from your
> > binary files?
> 
> I tried PDF, Word, and plain text files. How long does it take for a
> document to be indexed?
> 
> >
> > From:        ChadDavis <chadmichaeldavis@gmail.com>
> > To:        users@jackrabbit.apache.org
> > Date:        19/02/2010 11:13 AM
> > Subject:        Re: jackrabbit 2.0 binary search indexing
> > ________________________________
> >
> >
> > On Thu, Feb 18, 2010 at 2:39 PM, Alexander Klimetschek <aklimets@day.com>
> > wrote:
> >> On Thu, Feb 18, 2010 at 18:35, ChadDavis <chadmichaeldavis@gmail.com>
> >> wrote:
> >>> I'm looking for information on how to enable binary search indexing.
> >>> I found documentation for pre-2.0 Jackrabbit, and a reference to the
> >>> fact that Tika is now used internally for binary indexing.
> >>> However, I can't find any documentation on how to enable binary
> >>> indexing.
> >>
> >> It is enabled for all nt:file binaries, i.e. the jcr:content/jcr:data
> >> property. The MIME type for text extraction is taken from the
> >> jcr:content/jcr:mimeType property. I don't know whether you can enable
> >> it for other binary properties.
> >>
> >
> > Just to clarify: you are saying that binary indexing occurs
> > automatically as long as I'm using the JCR built-in node types for my
> > binary file storage, e.g. nt:file --> jcr:content <nt:resource> -->
> > jcr:data (binary property holding my file)?
> >
> > If so, then something's not working for me. Can you recommend some
> > troubleshooting tips? How can I determine whether the binaries are
> > being indexed? Note that I'm doing a full-text search, and it DOES
> > match other node properties.
> >

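As Alexander explains above, text extraction is driven by the jcr:content/jcr:mimeType property, and Ross suspects that a binary stored as application/octet-stream may have no extractor. One thing worth checking is therefore that the application sets a specific MIME type when it stores each nt:file. The Java sketch below is a minimal illustration under that assumption: it picks a MIME type from the file-name extension and falls back to application/octet-stream. The class MimeTypeGuess, its forFileName method and the three-entry table are hypothetical; they are not part of Jackrabbit, JCR or Tika, and the table is not the list of formats Jackrabbit 2.0 can actually index.

```java
import java.util.Locale;
import java.util.Map;

public final class MimeTypeGuess {

    // Hypothetical, deliberately tiny extension table (illustration only);
    // it is NOT the set of formats Jackrabbit/Tika can extract text from.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "doc", "application/msword",
            "txt", "text/plain");

    // Generic fallback; per the thread, a binary tagged this way may not
    // have its text extracted for full-text search.
    static final String FALLBACK = "application/octet-stream";

    private MimeTypeGuess() {
    }

    /** Returns a MIME type guessed from the file-name extension, or FALLBACK. */
    public static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No extension, or a trailing dot: nothing to look up.
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK;
        }
        // Compare extensions case-insensitively ("TXT" and "txt" match).
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }

    public static void main(String[] args) {
        String[] names = {"report.pdf", "notes.TXT", "letter.doc", "archive.bin", "README"};
        for (String name : names) {
            System.out.println(name + " -> " + forFileName(name));
        }
    }
}
```

In an application, the returned string would be stored as the jcr:mimeType property of the jcr:content node alongside jcr:data. A real application would more likely delegate detection to a fuller library (for example Apache Tika, which according to this thread Jackrabbit 2.0 uses internally) than maintain a hand-written table.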