jackrabbit-users mailing list archives

From Alexander Klimetschek <aklim...@day.com>
Subject Re: Lucene search should create index of Jackrabbit repository
Date Wed, 12 May 2010 12:46:07 GMT
On Wed, May 12, 2010 at 14:12, Jenni Pothu <Jennip@virtusa.com> wrote:
> Hi Alex,
>        Thanks for the reply and information. It is very useful. Using Jcr:contains
> I am able to search on the node content. But I need to search the file content also. It's
> not working with Jcr:contains. Thanks again for the needful.

Binary properties of nt:file nodes are full-text extracted with the
help of Apache Tika since 2.0 [1]; before that, Jackrabbit had its own
text extractors [2] [3]. Whether a file is supported depends on its
format and on whether an open source library is available that can
handle that format. Some formats, such as PDF, come in so many
varieties that extraction problems crop up every now and then.
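
For reference, a query that targets the extracted file text rather than
the node's own properties could be built roughly like this. It is only a
sketch: FileContentQuery and its methods are hypothetical names, not part
of Jackrabbit, and it assumes the usual nt:file layout where the binary
lives on the jcr:content child. It doubles single quotes for the XPath
string literal but does not handle the special characters of the
jcr:contains search expression itself.

```java
// Hypothetical helper, not part of the Jackrabbit API: it only builds query strings.
final class FileContentQuery {

    // Doubles single quotes so the term cannot end the XPath string literal early.
    static String escapeLiteral(String term) {
        return term.replace("'", "''");
    }

    // Matches nt:file nodes whose jcr:content child contains the term in its indexed text.
    static String xpath(String term) {
        return "//element(*, nt:file)[jcr:contains(jcr:content, '"
                + escapeLiteral(term) + "')]";
    }

    public static void main(String[] args) {
        System.out.println(xpath("jackrabbit"));
    }
}
```

The resulting string would then be passed to
QueryManager.createQuery(statement, Query.XPATH) on a live session.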

Also note that large text extractions are queued, so their results
might not be immediately visible to searches after the save.
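
If a test or import job needs to see the file in search results right
after saving, one option is to retry the query for a bounded time. The
following is a minimal sketch of that idea only: IndexWait and waitUntil
are hypothetical names, the timeout and pause values are arbitrary, and
the check passed in stands in for "run the query and see whether the
node comes back".

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: retries a check until it passes or a deadline expires.
final class IndexWait {

    // Returns true as soon as the check passes, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier check, long timeoutMillis, long pauseMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for a real query: succeeds on the third attempt.
        boolean found = waitUntil(() -> ++calls[0] >= 3, 2_000, 10);
        System.out.println("found=" + found + " after " + calls[0] + " checks");
    }
}
```

In real use the lambda would execute the full-text query and report
whether the expected node is in the result.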

[1] http://lucene.apache.org/tika/
[2] http://jackrabbit.apache.org/jackrabbit-text-extractors.html
[3] http://wiki.apache.org/jackrabbit/Search


Alexander Klimetschek
