jackrabbit-dev mailing list archives

From "Jaka Jaksic" <jaka.jak...@telemach.net>
Subject RE: Question about full text search together with normal search
Date Thu, 18 Jan 2007 15:31:25 GMT
There are several steps you need to take to ensure proper indexing of file
contents (you may have already done some of them):

1. Ensure that all nt:resource nodes have properly set jcr:mimeType
property, so that the indexing mechanism can recognize indexable content
types and use the appropriate text filter.

2. Configure the Workspace/SearchIndex/textFilterClasses property in
repository.xml, so that it includes text filters for all types you wish to
index, e.g.:
<param name="textFilterClasses"
       value="org.apache.jackrabbit.core.query.MsExcelTextFilter,org.apache.jackrabbit.core.query.MsPowerPointTextFilter,org.apache.jackrabbit.core.query.MsWordTextFilter,org.apache.jackrabbit.core.query.PdfTextFilter,org.apache.jackrabbit.core.query.HTMLTextFilter,org.apache.jackrabbit.core.query.XMLTextFilter,org.apache.jackrabbit.core.query.RTFTextFilter,org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>

(Note: This only sets the default configuration for new workspaces - it does
not enable indexing in existing workspaces!)

3. Configure the text filters as above in the workspace.xml file of each of
your existing workspace folders, to enable indexing in existing workspaces.

4. Make sure you have all the necessary jars in the classpath. Besides
jackrabbit-index-filters-*.jar, most text filters have their own
dependencies. For all of the above filters, you need the following jars:
nekohtml-0.9.5.jar, poi-2.5.1-final-20040804.jar, PDFBox-0.7.2.jar,
tm-extractors-0.4.jar. (I'm not sure about this, but if one dependency is
missing, the indexing process seems to fail for other file types too.)

5. Delete the index subfolder in each of your existing workspace folders, so
that the content will be reindexed.

This should do it. Now start the repository application and each workspace
should be reindexed the first time it is opened.
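For step 1, the jcr:mimeType value has to be one that a configured filter recognizes. A minimal sketch of a hypothetical MimeTypeGuesser helper (not part of Jackrabbit) that derives it from the file name; the extension-to-type table is an assumption, so check which MIME types each filter in your version actually accepts:

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Jackrabbit. The mapping below is an
// ASSUMPTION meant to match the filters listed in step 2; verify it
// against the MIME types each filter supports in your version.
class MimeTypeGuesser {
    private static final String FALLBACK = "application/octet-stream";

    private static final Map<String, String> BY_EXTENSION = Map.of(
        "xls", "application/vnd.ms-excel",
        "ppt", "application/vnd.ms-powerpoint",
        "doc", "application/msword",
        "pdf", "application/pdf",
        "htm", "text/html",
        "html", "text/html",
        "xml", "text/xml",
        "rtf", "application/rtf",
        "odt", "application/vnd.oasis.opendocument.text");

    // Returns the MIME type for a file name's extension (case-insensitive),
    // or a generic binary type that none of the filters above handle.
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(forFileName("Report.PDF"));
    }
}
```

In JCR code the result would then be stored on the jcr:content node, e.g. content.setProperty("jcr:mimeType", MimeTypeGuesser.forFileName(name)).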
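The textFilterClasses value in steps 2 and 3 is one comma-separated list of fully qualified class names, and a class name must not be broken across lines (which is easy to do when copying a long value out of an email). A sketch of a hypothetical TextFilterParam helper that generates the <param> line, so the same string can be pasted into repository.xml and into each workspace.xml:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Jackrabbit: builds the textFilterClasses
// <param> element for the SearchIndex configuration.
class TextFilterParam {
    // Package of the filter classes listed in step 2.
    static final String PACKAGE = "org.apache.jackrabbit.core.query.";

    // Joins the fully qualified names into one unbroken, comma-separated value.
    static String build(List<String> simpleNames) {
        String value = simpleNames.stream()
            .map(name -> PACKAGE + name.trim())
            .collect(Collectors.joining(","));
        return "<param name=\"textFilterClasses\" value=\"" + value + "\"/>";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of(
            "MsExcelTextFilter", "MsPowerPointTextFilter", "MsWordTextFilter",
            "PdfTextFilter", "HTMLTextFilter", "XMLTextFilter",
            "RTFTextFilter", "OpenOfficeTextFilter")));
    }
}
```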
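For step 4, one quick way to spot a missing jar is to try loading one class from each dependency. A sketch of a hypothetical ClasspathCheck class; the probe class names (one per jar listed in step 4) are assumptions about what each jar contains at those versions, so adjust them if a load fails unexpectedly:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classpath check, not a Jackrabbit tool.
class ClasspathCheck {
    // Returns the names of classes that cannot be loaded from this classpath.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // Load without running static initializers; we only check
                // that the class and its superclasses can be found.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ASSUMED representative classes, one per jar named in step 4.
        List<String> probes = List.of(
            "org.apache.jackrabbit.core.query.PdfTextFilter", // jackrabbit-index-filters
            "org.cyberneko.html.parsers.DOMParser",           // nekohtml
            "org.apache.poi.hssf.usermodel.HSSFWorkbook",     // poi
            "org.pdfbox.pdfparser.PDFParser",                 // PDFBox 0.7.x
            "org.textmining.text.extraction.WordExtractor");  // tm-extractors
        List<String> absent = missing(probes);
        System.out.println(absent.isEmpty() ? "All probe classes found" : "Missing: " + absent);
    }
}
```

Run it with the same classpath as the repository application.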
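Step 5 can also be scripted. A sketch of a hypothetical IndexCleaner class, assuming all workspace folders (each holding its workspace.xml) sit directly under one common directory, typically ${rep.home}/workspaces in the default configuration; it deletes only the index subfolder of each workspace. Stop the repository before running it:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jackrabbit. Assumes the layout
// <workspacesDir>/<workspace>/index.
class IndexCleaner {
    // Deletes <workspace>/index for every workspace folder; returns how many.
    static int deleteIndexes(Path workspacesDir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> workspaces =
                 Files.newDirectoryStream(workspacesDir, Files::isDirectory)) {
            for (Path workspace : workspaces) {
                Path index = workspace.resolve("index");
                if (Files.isDirectory(index)) {
                    deleteRecursively(index);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        int n = deleteIndexes(Path.of(args[0]));
        System.out.println(n + " index folder(s) deleted");
    }
}
```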


Regards,
Jaka


-----Original Message-----
From: Adamo Bozzetti [mailto:adamo.bozzetti@abfidee.it] 
Sent: Thursday, January 18, 2007 2:15 PM
To: dev@jackrabbit.apache.org
Subject: Question about full text search together with normal search

Hi list,
I'm using Jackrabbit for a custom document management system. I have defined
a node type that extends nt:file and has new properties such as author.
When I search by attribute or by text there are no problems, but I don't
understand how to combine the two searches.
I can't find a way to join a node with its child.

I also tried creating a node type that extends nt:resource directly, but the
binary content does not seem to be indexed.

Has anyone faced this problem?

Thanks in advance
Adamo


