jackrabbit-users mailing list archives

From Torsten Stolpmann <stolpm...@verit.de>
Subject Re: jackrabbit 2.6.0 Full Text Search
Date Fri, 07 Jun 2013 08:15:44 GMT
Hi Orlando,

Did you read Alexander Klimetschek's answer here?

http://mail-archives.apache.org/mod_mbox/jackrabbit-users/201207.mbox/%3C9ADA9A33-5AA7-49DB-B32A-AF41610E335D@adobe.com%3E

On 27.06.2012, at 17:19, Furst, Carl wrote:

 > > So here's the sql I use:
 > >
 > > select * from [nt:resource] where  contains([jcr:data], 'include');
 >
 > The full text index for binary properties is by default aggregated on
 > the node itself, not on the jcr:data property. You address that with
 > "*" and you need a selector (s in this case):
 >
 > select * from [nt:resource] as s where contains(s.*, 'include')
 >
 > (In the former SQL1 you could simply do CONTAINS(., 'include') to
 > address the node itself).
 >
 > See my recent mail (about xpath, but same index is used):
 > http://markmail.org/message/oc6uootrpxepso4d
 >
 > Cheers,
 > Alex

Hope this helps,

Torsten
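
To make the selector form from Alex's answer concrete, here is a minimal Java sketch that only builds the JCR-SQL2 statements as strings. The class and method names (FullTextQueries, quote, onNode, onFileContent) are hypothetical and not part of Jackrabbit. The joined variant moves the CONTAINS constraint from the nt:file selector to the nt:resource selector, on the assumption that the extracted text is indexed on the nt:resource node as described above; whether it returns rows still depends on the repository's index and Tika configuration.

```java
// Hypothetical helper, not part of Jackrabbit: builds JCR-SQL2 full-text
// statements that address the node-level index through a selector.
final class FullTextQueries {
    private FullTextQueries() {}

    // Wraps a term in a JCR-SQL2 string literal, doubling embedded single quotes.
    // The full-text search expression has its own syntax (phrases, OR, '-'),
    // which this sketch does not validate.
    public static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    // Full-text constraint on the node itself, addressed as "s.*".
    public static String onNode(String nodeType, String term) {
        return "SELECT * FROM [" + nodeType + "] AS s WHERE CONTAINS(s.*, "
                + quote(term) + ")";
    }

    // nt:file joined to its child nt:resource; the constraint targets the
    // resource selector (assumption: that is where the binary text is indexed).
    public static String onFileContent(String term) {
        return "SELECT file.*, resource.* FROM [nt:file] AS file "
                + "INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file) "
                + "WHERE CONTAINS(resource.*, " + quote(term) + ")";
    }

    public static void main(String[] args) {
        System.out.println(onNode("nt:resource", "include"));
        System.out.println(onFileContent("This"));
    }
}
```

In a repository session, a statement built this way would be passed to QueryManager.createQuery(statement, Query.JCR_SQL2) from the javax.jcr.query API.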


On 07.06.2013 02:58, Orlando Palis wrote:
> Hi Folks,
>
> I'm new to Jackrabbit and I'm trying out full-text search using Jackrabbit
> 2.6.0 (with Tika 1.3). I have a custom node type that allows me to store
> some custom properties and multiple HTML files (stored as binary). I have
> the following configuration:
>
> *workspace.xml:*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <Workspace name="default">
>          <!--
>              virtual file system of the workspace:
>              class: FQN of class implementing the FileSystem interface
>          -->
>          <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
>              <param name="dataSourceName" value="ds1"/>
>              <param name="schemaObjectPrefix" value="fs_${wsp.name}_"/>
>          </FileSystem>
>          <!--
>              persistence manager of the workspace:
>              class: FQN of class implementing the PersistenceManager interface
>          -->
>          <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager">
>              <param name="dataSourceName" value="ds1"/>
>              <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
>          </PersistenceManager>
>          <!--
>              Search index and the file system it uses.
>              class: FQN of class implementing the QueryHandler interface
>          -->
>          <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>              <param name="path" value="${wsp.home}/index"/>
>              <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>              <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
>              <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt"/>
>              <param name="supportHighlighting" value="true"/>
>              <param name="tikaConfigPath" value="${wsp.home}/tika-config.xml"/>
>          </SearchIndex>
> </Workspace>
>
>
> *tika-config.xml:*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>      <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
>      <parsers>
>             <parser name="parse-html" class="org.apache.tika.parser.html.HtmlParser">
>                 <mime>text/html</mime>
>                 <mime>application/xhtml+xml</mime>
>                 <mime>application/x-asp</mime>
>             </parser>
>      </parsers>
> </properties>
>
> *JCR-SQL2 queries tested:*
>
> 1) SELECT * FROM [nt:file] as file WHERE CONTAINS(file.*, 'This')
>
> 2) SELECT * FROM [nt:file] as file WHERE CONTAINS(file.*, 'This*')
>
> 3)
> SELECT file.*, resource.* FROM [nt:file] AS file
> INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)
> WHERE resource.[jcr:mimeType] = 'text/html'
> AND CONTAINS(file.*, 'This')
>
> 4)
> SELECT file.*, resource.* FROM [nt:file] AS file
> INNER JOIN [nt:resource] AS resource ON ISCHILDNODE(resource, file)
> WHERE resource.[jcr:mimeType] = 'text/html'
> AND CONTAINS(file.*, 'This*')
>
> *Result:*
> None of these queries returns any rows. If I remove the CONTAINS() clause
> from the queries, I get rows from all of them, and for queries #3 and #4 I
> can see that the field resource.[jcr:data] has the text ("This") I am
> searching for when I dump the result to the log file. I've also tried
> deleting the index folder so that the repository will be re-indexed, but I
> am still not able to do full-text search successfully.
>
> What am I missing?  In addition, is there any documentation on how to
> configure tika (tika-config.xml)?
>
>
> Thanks and Regards,
> Orlando
>


-- 
Torsten Stolpmann
Geschäftsführender Gesellschafter

verit Informationssysteme GmbH
Europaallee 10
67657 Kaiserslautern

E-Mail: stolpmann@verit.de
Telefon: +49 631 520 840 00
Fax: +49 631 520 840 01
Web: http://www.verit.de/

Registergericht: Amtsgericht Kaiserslautern
Registernummer: HRB 3751
Geschäftsleitung: Claudia Könnecke, Torsten Stolpmann

