jackrabbit-users mailing list archives

From "Nick Allmaker" <nick.allma...@docenterinc.com>
Subject Custom TextExtractor
Date Tue, 21 Aug 2007 14:47:33 GMT
In my repository, I'm storing a variety of files, including some images
of paper documents.  I'd like to hook up an OCR engine to enable
full-text search against these images (usually TIFFs), but I'm having
trouble getting Jackrabbit to pick up my class.  To isolate the problem,
I've written a simple test version of the class that does no OCR yet.
I've included this class at the bottom of the e-mail.

I've edited workspace.xml to add my class to the textFilterClasses
parameter of the SearchIndex node, added my jar to the classpath,
deleted the index to force a re-index, and run a very simple test.
Yet when I search for the test text, I get 0 results.
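
One way to rule out the classpath step is to load the class by name and call its no-argument constructor, roughly as a configuration-driven loader would. The following is a minimal standalone sketch, not Jackrabbit's actual loading code; the class name ExtractorClasspathCheck and the method canInstantiate are hypothetical names chosen for illustration.

```java
public class ExtractorClasspathCheck {

    // Returns true if the named class is visible on the classpath and its
    // public no-argument constructor completes without throwing.
    public static boolean canInstantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            // Covers a missing class, a missing or failing constructor,
            // and missing dependencies of the class itself.
            System.err.println("Cannot load " + className + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from this e-mail; pass another name to check it.
        String name = args.length > 0
                ? args[0]
                : "test.extractors.ImageTextExtractor";
        System.out.println(name + " loadable: " + canInstantiate(name));
    }
}
```

Running this with the same classpath as the repository process should show whether the jar is visible there. If the class loads, the classpath is probably not the issue, and the configuration or the MIME types stored on the nodes would be the next things to check.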

Can someone please tell me what I'm doing wrong?

Thanks,

--Nick Allmaker

--------ImageTextExtractor.java--------
package test.extractors;

import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import org.apache.jackrabbit.extractor.AbstractTextExtractor;

// Test stub: returns fixed text for every supported image type (no OCR yet).
public class ImageTextExtractor extends AbstractTextExtractor
{

	public ImageTextExtractor()
	{
		// MIME types this extractor claims to handle.
		super(new String[]{"image/tiff", "image/jpeg",
				"image/png", "image/gif"});
	}

	// close() can throw IOException, so it must be declared here.
	public Reader extractText(InputStream stream, String type,
			String encoding) throws IOException
	{
		stream.close();
		return new StringReader("This is a test extraction.");
	}

}
