jackrabbit-users mailing list archives

From "Sean Callan" <seancal...@gmail.com>
Subject Re: Full Text Search Problem
Date Thu, 28 Feb 2008 14:26:36 GMT
Hi Katia,

I wrestled with this issue for some time, and I hope these emails will lead
to changes on the Jackrabbit website.  The text extractors have additional
dependencies that are not listed there:

http://poi.apache.org/
http://www.pdfbox.org/
tm-extractors-0.4.jar (lost the url)

The site does not mention these directly; it only says you should look in
one of the Maven jars.  There is absolutely no reason these should not be
listed as dependencies themselves.
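
A quick way to check whether these libraries are actually visible to your
application is to try loading one class from each of them. The following is
only a rough sketch: the three class names are my assumption of a
representative class per library (POI, the pre-Apache PDFBox package, and
tm-extractors) and may differ in the versions you have, and
ExtractorDependencyCheck and findMissing are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractorDependencyCheck {

    // Assumed marker classes, one per library; adjust to the versions you use.
    static final List<String> MARKER_CLASSES = Arrays.asList(
        "org.apache.poi.poifs.filesystem.POIFSFileSystem", // POI
        "org.pdfbox.pdmodel.PDDocument",                   // PDFBox (old package name)
        "org.textmining.text.extraction.WordExtractor"     // tm-extractors
    );

    // Returns the subset of classNames that cannot be loaded from the classpath.
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<String>();
        ClassLoader loader = ExtractorDependencyCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be located
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                missing.add(name);
            } catch (LinkageError e) {
                // The class was found but something it depends on is absent
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = findMissing(MARKER_CLASSES);
        if (missing.isEmpty()) {
            System.out.println("All marker classes found.");
        } else {
            System.out.println("Not on classpath: " + missing);
        }
    }
}
```

Run it with the same classpath as the application that hosts the repository;
anything it reports as missing points to a jar that still needs to be added.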

Hope this works for you.

On Thu, Feb 28, 2008 at 8:05 AM, Katia Santos <katiasantos@gmail.com> wrote:

> Hello,
>
> I'm trying to search binary content with the following query:
> //*[jcr:contains(jcr:data,'myword')]  but I don't get any results.
>
> I know that my node has to be of type nt:resource, and has to have the
> properties jcr:data, jcr:mimeType and jcr:lastModified.
> Can the parent node of this resource node be of any type, or does it
> have to be of a specific type?
>
> I'm doing something like this:
>
> Node parentNode = currentNode.addNode(IConstantsEcm.MY_PARENT_NODE);
> Node childNode = parentNode.addNode(IConstantsEcm.MY_CHILD_NODE,
>         "nt:resource");
> childNode.setProperty("jcr:data", binaryData);
> childNode.setProperty("jcr:mimeType", "application/pdf");
> Calendar c = new GregorianCalendar();
> childNode.setProperty("jcr:lastModified", c);
>
>
> When I create the parent node I'm not specifying the type, so it is
> going to be an unstructured node. Is it possible to run a full-text
> search on a resource node that is a child of an unstructured node?
>
> If it is, can someone please tell me what's missing?
>
>
> In my workspace configuration I have:
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         ....
>        <param name="analyzer"
> value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>         <param name="queryClass"
> value="org.apache.jackrabbit.core.query.QueryImpl"/>
>         ...
>          <param name="textFilterClasses"
> value="org.apache.jackrabbit.core.query.MsExcelTextExtractor,
> org.apache.jackrabbit.core.query.MsPowerPointTextExtractor,
> org.apache.jackrabbit.core.query.MsWordTextExtractor,
> org.apache.jackrabbit.core.query.PdfTextExtractor,
> org.apache.jackrabbit.core.query.HTMLTextExtractor,
> org.apache.jackrabbit.core.query.XMLTextExtractor,
> org.apache.jackrabbit.core.query.RTFTextExtractor,
> org.apache.jackrabbit.core.query.OpenOfficeTextExtractor"/>
>         ....
>  </SearchIndex>
>
> Maybe I'm missing something!
>
> Thanks
>
