jackrabbit-users mailing list archives

From "Alexander Klimetschek" <aklim...@day.com>
Subject Re: Can't get Indexing to work
Date Sun, 23 Nov 2008 18:31:09 GMT
Hi,

just guessing: it might be that the PDF extractor doesn't extract all
of the text in the PDF. Or you added the PDF extractor to the search
index configuration after the file was already stored in the repository
(full-text extraction and indexing take place right after saving, so
files saved earlier are not picked up by a later configuration change).
You can trigger a re-index by stopping the repository, deleting the
index subdirectory in your workspace directory (e.g.
workspaces/default/index), and restarting Jackrabbit.
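
The deletion step could be scripted roughly as follows. This is only a
sketch, not part of Jackrabbit: the class and method names (ForceReindex,
deleteWorkspaceIndex) are made up for illustration, and it assumes the
repository is already shut down and uses the default layout
<repository home>/workspaces/<workspace>/index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a Jackrabbit class. Run it only while the
// repository is stopped; the index is rebuilt on the next startup.
final class ForceReindex {

    /**
     * Recursively deletes repHome/workspaces/workspace/index.
     * Returns true if the directory existed and was removed.
     */
    public static boolean deleteWorkspaceIndex(Path repHome, String workspace)
            throws IOException {
        Path index = repHome.resolve("workspaces").resolve(workspace).resolve("index");
        if (!Files.exists(index)) {
            return false; // nothing to delete
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(index)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ForceReindex <repository home> <workspace>");
            return;
        }
        boolean deleted = deleteWorkspaceIndex(Paths.get(args[0]), args[1]);
        System.out.println(deleted ? "index directory deleted" : "no index directory found");
    }
}
```

With the configuration quoted below, the workspace argument would be
"mango" rather than "default". Only the index directory is removed; the
workspace content itself is left alone.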

Regards,
Alex

On Sun, Nov 23, 2008 at 2:22 PM, Thomas Kratz <thomas.kratz@eiswind.de> wrote:
> I ran into another problem now.
>
> I configured the index as follows:
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     <param name="path" value="${rep.home}/workspaces/mango/index"/>
>     <param name="useCompoundFile" value="true"/>
>     <param name="minMergeDocs" value="100"/>
>     <param name="volatileIdleTime" value="3"/>
>     <param name="maxMergeDocs" value="100000"/>
>     <param name="mergeFactor" value="10"/>
>     <param name="bufferSize" value="10"/>
>     <param name="cacheSize" value="1000"/>
>     <param name="forceConsistencyCheck" value="false"/>
>     <param name="autoRepair" value="true"/>
>     <param name="analyzer" value="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>     <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,org.apache.jackrabbit.extractor.MsExcelTextExtractor,org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,org.apache.jackrabbit.extractor.RTFTextExtractor,org.apache.jackrabbit.extractor.HTMLTextExtractor,org.apache.jackrabbit.extractor.XMLTextExtractor"/>
> </SearchIndex>
>
> Then I stored a document (nt:file) with MIME type application/pdf and tried
> to find it with
>
> //*[jcr:contains(jcr:content, '" + text + "')]
>
> and I get no results. Again, I don't understand what I'm getting wrong.
>
> It all seems difficult to me :(



-- 
Alexander Klimetschek
alexander.klimetschek@day.com
