jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Some problems with similarity search
Date Tue, 24 Nov 2009 14:45:38 GMT
Hi,

The configuration looks good to me.

Maybe this has something to do with changes in how text is extracted in
Jackrabbit. As of 2.0, Jackrabbit uses Apache Tika to extract text from
binary properties. Can you please try to increase the extractorTimeout
(currently 100 ms in your configuration)? It may be that the Tika
extractor needs more time than the Jackrabbit extractors we had in 1.6.
When the timeout is reached, fulltext indexing is deferred, and therefore
a similarity search that runs before the deferred indexing has finished
may yield no result.
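
A minimal sketch of that change, assuming the rest of the SearchIndex
configuration stays as posted; the value 10000 (milliseconds) is an
arbitrary illustration, not a tested recommendation:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- other params unchanged from the posted configuration -->
  <!-- extractorTimeout is in milliseconds; 10000 is an illustrative value only -->
  <param name="extractorTimeout" value="10000"/>
</SearchIndex>
```

The setting only affects text extraction at indexing time, so after
restarting the repository with the new value, re-adding the six test
documents is probably the simplest way to check whether the timeout was
the cause.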

regards
 marcel

On Mon, Nov 16, 2009 at 16:38, roberto.bentivoglio <roberto@exmachina.ch> wrote:
>
> Hi,
> I'm sorry for the duplicate post, but the configuration script in the
> previous one wasn't readable.
> We have some problems with the similarity search in Jackrabbit 2.0.
> We previously used version 1.6 without any problems.
> We defined some custom node types:
> exm:document --> exm:content --> jcr:content
> The following query no longer works after the upgrade to the new version
> (before running the query, we added six identical documents to the
> repository, each containing the same keywords, longer than four
> characters, repeated at least three times, as specified at
> http://wiki.apache.org/jackrabbit/SimilaritySearch):
>
> "/jcr:root/testsandbox/home//element(*,
> exm:document)[rep:similar(exm:content/jcr:content,
> '/testsandbox/home/test/documents/tes-file.txt/exm:content/jcr:content')]
> order by jcr:score() descending"
>
> Our configuration is shown below:
>
> <Workspace name="${carta.jackrabbit.workspace.name}">
>   ...
>   ...
>   <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     <param name="analyzer" value="org.apache.lucene.analysis.SimpleAnalyzer"/>
>     <param name="path" value="${carta.jackrabbit.repository.home}/${carta.jackrabbit.workspace.name}/index"/>
>     <param name="extractorPoolSize" value="0"/>
>     <param name="extractorTimeout" value="100"/>
>     <param name="volatileIdleTime" value="3"/>
>     <param name="maxVolatileIndexSize" value="10485760"/>
>     <param name="supportHighlighting" value="true"/>
>     <param name="excerptProviderClass" value="ch.exm.carta.search.excerpt.HTMLExcerpt"/>
>     <param name="indexingConfiguration" value="${carta.jackrabbit.repository.home}/IndexingConfiguration.xml"/>
>     <param name="spellCheckerClass" value="org.apache.jackrabbit.core.query.lucene.spell.LuceneSpellChecker$FiveSecondsRefreshInterval"/>
>   </SearchIndex>
>   ...
>   ...
> </Workspace>
>
> Is the configuration wrong, or is this a bug in the new version of Jackrabbit?
> Is there a workaround?
>
> Regards,
> Roberto Bentivoglio
> --
> View this message in context: http://n4.nabble.com/Some-problems-with-similarity-search-tp622186p622194.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
>
