Philip Q wrote:
> 1. There are two (major) custom nodetypes in the repository, one
> designed to store plain-text data, the other is designed to store binary
> data (just in properties of those nodetypes).
> Do I need to do anything special to convince Jackrabbit to index these
> nodes? What about processing the binary data with the textfilters?
The binary data must be stored in a node of type nt:resource or a
subtype thereof.
Then you need to configure the text filter classes that you want to use
in repository.xml (and in any existing workspace.xml files).
See also:
https://svn.apache.org/repos/asf/jackrabbit/trunk/textfilters/README.txt
> 2. How would I search the binary data? Would a standard XPath query work
> on it as-if it was plain-text (assuming it was an PDF/Word/parseable file)?
Once the resource nodes have been processed by the text filters, you can
search the binary content using the jcr:contains function:
//element(*, nt:resource)[jcr:contains(., 'foo')]
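If the search term comes from user input, the statement has to be assembled as a string first. Here is a minimal sketch of that step. The class name ContainsQuery and its build method are hypothetical helpers for illustration, not part of JCR or Jackrabbit. It only doubles apostrophes so the XPath string literal stays well-formed; it does not deal with the search syntax inside jcr:contains itself.

```java
// Hypothetical helper, not part of the JCR or Jackrabbit API.
public final class ContainsQuery {
    private ContainsQuery() {}

    /** Builds //element(*, nodeType)[jcr:contains(., 'searchText')]. */
    public static String build(String nodeType, String searchText) {
        // An apostrophe inside an XPath string literal is escaped by doubling it.
        String escaped = searchText.replace("'", "''");
        return "//element(*, " + nodeType + ")[jcr:contains(., '" + escaped + "')]";
    }

    public static void main(String[] args) {
        // Reproduces the query from this mail for the term 'foo'.
        System.out.println(build("nt:resource", "foo"));
    }
}
```

The resulting string would then be handed to the standard JCR query API, i.e. QueryManager.createQuery(statement, Query.XPATH) followed by Query.execute(), with the QueryManager obtained from session.getWorkspace().getQueryManager().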
> 3. Since I can't do a full-text query with the JCR/Jackrabbit interface,
> but I assume I can just use Lucene to open the index. If I did, what
> field(s) would I need to query, and what kind of path would I get back?
> (A pointer to the source code that does this would also be very helpful).
JCR *does* provide a way to execute a full-text query; see above. There
is no need to query the underlying index directly using plain Lucene.
regards
marcel