jackrabbit-users mailing list archives

From Kurz Wolfgang <wolfgang.k...@gwvs.de>
Subject Re: Problem getting full textual search to work with textextractors
Date Thu, 26 Mar 2009 15:31:58 GMT

Oh, I forgot to mention that I also included

lucene-core-2.3.2.jar

and that my workspace configuration contains:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor,org.apache.jackrabbit.extractor.MsWordTextExtractor,org.apache.jackrabbit.extractor.MsExcelTextExtractor,org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,org.apache.jackrabbit.extractor.RTFTextExtractor,org.apache.jackrabbit.extractor.HTMLTextExtractor,org.apache.jackrabbit.extractor.XMLTextExtractor"/>
    <param name="extractorPoolSize" value="2"/>
    <param name="supportHighlighting" value="true"/>
</SearchIndex>



-----Original Message-----
From: Kurz Wolfgang 
Sent: Thursday, 26 March 2009 16:20
To: 'users@jackrabbit.apache.org'
Subject: Problem getting full textual search to work with textextractors

Hello everyone,

I am trying to get full-text search to work with text extractors.

I uploaded a PDF file as a resource into Jackrabbit. That part works: I can download it
again and get the same file back.

Now I want to implement full-text search inside the documents I uploaded, but the search doesn't
find the document, even though it contains the strings that I am searching for.

What I did is this:

I added these JAR files to my Tomcat server's lib folder, since I am using JNDI to connect:

-jackrabbit-text-extractors-1.5.0.jar
-fontbox-0.1.0.jar
-junit-3.8.1.jar
-nekohtml-1.9.7.jar
-pdfbox-0.7.3.jar
-poi-3.0.2-FINAL.jar
-poi-scratchpad-3.0.2-FINAL.jar
-tm-extractors-0.4.jar

My XPath query looks like this:

//*[((jcr:contains(.,'consetetur')) or (jcr:contains(.,'sadipscing')))]

Both of those words are inside the pdf but the search result is empty.
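
A query of this shape could be assembled from a list of search terms roughly as follows. This is only an illustrative sketch: the class ContainsQueryBuilder and the method buildQuery are hypothetical names, not part of Jackrabbit or of the code in this message. It assumes that an apostrophe inside an XPath string literal is escaped by doubling it, and it does not handle the special characters of the jcr:contains search expression itself.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ContainsQueryBuilder {

    // Hypothetical helper: builds an XPath query that matches any node
    // whose indexed full text contains at least one of the given terms.
    static String buildQuery(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one search term is required");
        }
        String clauses = terms.stream()
                // Assumption: a single quote inside the literal is escaped by doubling it.
                .map(t -> "(jcr:contains(.,'" + t.replace("'", "''") + "'))")
                .collect(Collectors.joining(" or "));
        return "//*[(" + clauses + ")]";
    }

    public static void main(String[] args) {
        // Prints the same query string as the one shown above.
        System.out.println(buildQuery(List.of("consetetur", "sadipscing")));
    }
}
```

The resulting string would then be passed to createQuery together with the XPath language constant.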

Here is the code I use to run the search:

javax.jcr.query.Query jcrQuery;
try {
    // Compile the query string in the given query language (XPath in this case)
    jcrQuery = session.getWorkspace().getQueryManager().createQuery(query, language);
    // Run the query and hand back the matching nodes
    QueryResult queryResult = jcrQuery.execute();
    NodeIterator nodeIterator = queryResult.getNodes();
    return nodeIterator;
}
catch (InvalidQueryException iqe) {
    // The query string could not be parsed
    throw new org.apache.jackrabbit.ocm.exception.InvalidQueryException(iqe);
}
catch (RepositoryException re) {
    // Any other repository failure
    throw new ObjectContentManagerException(re.getMessage(), re);
}


It would be really great if anyone had an idea why this doesn't work.

Thanks a lot in advance,
Wolfgang
