jackrabbit-users mailing list archives

From "Katia Santos" <katiasan...@gmail.com>
Subject Re: Full Text Search Problem
Date Thu, 28 Feb 2008 15:55:57 GMT
It didn't work for me :(
Thanks anyway.

I have tried all the MIME types (msword, openoffice, ...) and none of my
documents show up in the search results!
I really don't know what is missing!
If someone has any idea...

Thanks

On Thu, Feb 28, 2008 at 2:26 PM, Sean Callan <seancallan@gmail.com> wrote:

> Hi Katia,
>
> This was an issue I wrestled with for some time, and I hope these
> emails will lead to changes on the Jackrabbit website.  There are
> additional dependencies for the text extractors that are not listed there.
>
> http://poi.apache.org/
> http://www.pdfbox.org/
> tm-extractors-0.4.jar <http://www.pdfbox.org/tm-extractors-0.4.jar> (lost
> the url)
>
> The site does not mention these directly; it only says you should look
> in one of the Maven jars.  There is absolutely no reason these should not
> be listed as dependencies themselves.
>
> Hope this works for you.
>
> On Thu, Feb 28, 2008 at 8:05 AM, Katia Santos <katiasantos@gmail.com>
> wrote:
>
> > Hello,
> >
> > I'm trying to search binary content with the following query:
> > //*[jcr:contains(jcr:data,'myword')]  but I don't get any results.
> >
> > I know that my node has to be of type nt:resource, and has to have the
> > properties jcr:data, jcr:mimeType and jcr:lastModified.
> > Can the parent node of this resource node be of any type, or does it
> > have to be of a specific type?
> >
> > I'm doing something like this:
> >
> > Node parentNode = noActual.addNode(IConstantsEcm.MY_PARENT_NODE);
> > Node childNode = parentNode.addNode(IConstantsEcm.MY_CHILD_NODE,
> >         "nt:resource");
> > childNode.setProperty("jcr:data", binaryData);
> > childNode.setProperty("jcr:mimeType", "application/pdf");
> > Calendar c = new GregorianCalendar();
> > childNode.setProperty("jcr:lastModified", c);
> >
> >
> > When I create the parent node I'm not specifying the type, so it is
> > going to be an unstructured node. Is it possible to run a full-text
> > search on a resource node that is a child of an unstructured node?
> >
> > If it is, can someone please tell me what's missing?
> >
> >
> > In my workspace configuration I have:
> >
> > <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> >         ....
> >        <param name="analyzer"
> > value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> >         <param name="queryClass"
> > value="org.apache.jackrabbit.core.query.QueryImpl"/>
> >         ...
> >          <param name="textFilterClasses"
> > value="org.apache.jackrabbit.core.query.MsExcelTextExtractor,
> > org.apache.jackrabbit.core.query.MsPowerPointTextExtractor,
> > org.apache.jackrabbit.core.query.MsWordTextExtractor,
> > org.apache.jackrabbit.core.query.PdfTextExtractor,
> > org.apache.jackrabbit.core.query.HTMLTextExtractor,
> > org.apache.jackrabbit.core.query.XMLTextExtractor,
> > org.apache.jackrabbit.core.query.RTFTextExtractor,
> > org.apache.jackrabbit.core.query.OpenOfficeTextExtractor"/>
> >         ....
> >  </SearchIndex>
> >
> > Maybe I'm missing something!
> >
> > Thanks
> >
>
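
One thing that may be worth checking in the quoted query: to my understanding, Jackrabbit adds text extracted from a binary jcr:data property to the full-text index of the nt:resource node itself rather than to a per-property index, so jcr:contains(jcr:data, ...) may match nothing even when extraction works. The sketch below only builds an alternative XPath statement scoped to the node (`.`); the class and method names are hypothetical, and whether this is the cause here is an assumption, not something confirmed in the thread.

```java
public class FullTextQuery {

    // Hypothetical helper: builds an XPath statement that searches the
    // node-level full-text index of nt:resource nodes instead of jcr:data.
    // It only doubles single quotes so the term stays a valid XPath string
    // literal; it does not escape full-text search operators.
    static String containsQuery(String term) {
        String escaped = term.replace("'", "''");
        return "//element(*, nt:resource)[jcr:contains(., '" + escaped + "')]";
    }

    public static void main(String[] args) {
        String statement = containsQuery("myword");
        System.out.println(statement);

        // With a live javax.jcr.Session (not created in this sketch), the
        // statement would be run roughly like this:
        //   QueryManager qm = session.getWorkspace().getQueryManager();
        //   Query q = qm.createQuery(statement, Query.XPATH);
        //   NodeIterator hits = q.execute().getNodes();
    }
}
```

Changes need to be saved with session.save() before the nodes can be found by a query, and the text extractor jars (POI, PDFBox) still need to be on the classpath as described above.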
