jackrabbit-users mailing list archives

From "Nick Allmaker" <nick.allma...@docenterinc.com>
Subject RE: Custom TextExtractor
Date Wed, 22 Aug 2007 15:19:08 GMT

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Tuesday, August 21, 2007 10:17 AM
To: users@jackrabbit.apache.org
Subject: Re: Custom TextExtractor

> Hi,
> On 8/21/07, Nick Allmaker <nick.allmaker@docenterinc.com> wrote:
> > I'd like to be able to hook up an OCR engine to do full-text search
> > against these images (usually TIFFs), but I'm having issues getting
> > Jackrabbit to pick up my class.
> Are you working with an open source OCR engine? I would be very
> interested in hearing more about your solution.

No, we are using a closed-source OCR engine.  We had already licensed it
and aren't looking to redistribute our application with the OCR feature.
When we have more time to research, we will probably look into
open-source OCR engines so we no longer have to worry about licensing.

> > I've edited the workspace.xml to include my class in the
> > textFilterClasses parameter of the SearchIndex node, added my jar to
> > classpath, deleted the index to force a re-index, and ran a very
> > simple test.  Yet, when I search for the test text, I get 0 results.
> >
> > Can someone please tell me what I'm doing wrong?
> Have you checked that you've set the jcr:mimeType properties correctly
> on the image nodes?

Yes.  They're set to "image/tiff".  To test, I added a plain-text file to
my test repository and included the PlainTextExtractor in
textFilterClasses.  When I queried for terms from the text of that file,
the text file was returned.
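
For readers following the thread, here is a minimal sketch of what an
extractor that claims "image/tiff" might look like.  It is an
illustration only: the TextExtractor interface below is a local stand-in
assumed to mirror the shape of Jackrabbit's
org.apache.jackrabbit.extractor.TextExtractor, so the sketch compiles
without the Jackrabbit jar, and TiffOcrTextExtractor and runOcr are
hypothetical names, with runOcr standing in for whatever OCR engine is
actually used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

// Local stand-in, assumed to mirror the shape of Jackrabbit's
// org.apache.jackrabbit.extractor.TextExtractor interface.
interface TextExtractor {
    String[] getContentTypes();

    Reader extractText(InputStream stream, String type, String encoding)
            throws IOException;
}

// Hypothetical extractor for TIFF images. In a real jar this would be a
// public class in its own file, implementing the real Jackrabbit interface.
class TiffOcrTextExtractor implements TextExtractor {

    // Public no-argument constructor, so the class can be created from
    // its name alone.
    public TiffOcrTextExtractor() {
    }

    // The content types this extractor claims; this sketch assumes the
    // value must match the jcr:mimeType set on the image nodes.
    public String[] getContentTypes() {
        return new String[] { "image/tiff" };
    }

    // Returns the extracted text as a Reader and closes the input stream.
    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        try {
            return new StringReader(runOcr(stream));
        } finally {
            stream.close();
        }
    }

    // Placeholder for the OCR call (hypothetical): it only counts bytes.
    private String runOcr(InputStream stream) throws IOException {
        int count = 0;
        while (stream.read() != -1) {
            count++;
        }
        return "ocr placeholder: " + count + " bytes";
    }
}
```

The point of the sketch is the pairing between the declared content type
and the node's MIME type; the OCR part is deliberately left as a stub.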

> Otherwise, do you see a warning message being logged about the
> extractor class not being available?

Indeed, I was getting this warning, mostly due to my inexperience with
packaging .jar files.  Once I fixed that, though, I was still seeing the
same result from the search.  I also added logging to my constructor and
extractText method, yet neither message showed up in the resulting log.
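
One way to rule out packaging problems separately from Jackrabbit is a
small standalone check run with the same classpath as the repository.
The sketch below assumes the extractor is created reflectively from its
class name through a public no-argument constructor; ExtractorClassCheck
is a hypothetical helper, not part of Jackrabbit.

```java
import java.lang.reflect.Constructor;

// Hypothetical diagnostic helper: tries to load and instantiate a class
// by name and reports why that fails, if it does.
class ExtractorClassCheck {

    // Returns a short diagnosis for one fully qualified class name.
    static String check(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            // getConstructor() only finds public constructors.
            Constructor<?> ctor = clazz.getConstructor();
            ctor.newInstance();
            return "ok";
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (NoSuchMethodException e) {
            return "no public no-argument constructor";
        } catch (ReflectiveOperationException e) {
            return "could not instantiate: " + e;
        } catch (LinkageError e) {
            // For example, a dependency of the class is missing.
            return "linkage problem: " + e;
        }
    }

    // Usage: pass each class name listed in textFilterClasses as an argument.
    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + ": " + check(name));
        }
    }
}
```

If the check reports "ok" but the constructor logging still never
appears in the repository log, the class is loadable and the problem is
more likely in how it is configured or selected.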

Thank you for the guidance thus far.

--Nick Allmaker 
