jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Searching inside binary contents and other queries
Date Tue, 10 Jun 2008 13:40:58 GMT
Sergio wrote:
> 1) As our database will be holding most of the data, I thought about the
> following schema: storing the documents inside BLOBs in the database (in
> case we need to access them using some other criteria) AND in Jackrabbit's
> repository. While storing those documents using Jackrabbit, I plan to keep
> the RDBMS' pointers (probably the document's record primary key) using
> properties. The question is: does this make sense? Is it a common practice?
> And if not, what is the standard approach?

well, the recommended approach is to replace your RDBMS with Jackrabbit.

> 2) Do I need to define node types for representing my documents? If not, is
> there some standard type I can use?

for files and folders there's nt:file and nt:folder. See:
http://wiki.apache.org/jackrabbit/NodeTypeRegistry and of course the JSR 170
specification.

> 3) I have read that Jackrabbit is able to read inside some document types,
> how do you accomplish that? Using TextExtractors?

correct. see: http://jackrabbit.apache.org/jackrabbit-text-extractors.html

> How? Could you point me
> to some examples? I failed to find any. Does it depend on the way I store
> those documents? If so, how do you do it?

the text extractors only work with nt:resource nodes. this means your content 
structure would look like this:

+ my.pdf (nt:file)
   - jcr:created=20080101 (DATE)
   + jcr:content (nt:resource)
     - jcr:mimeType=application/pdf (STRING)
     - jcr:lastModified=20080101 (DATE)
     - jcr:data=<pdf-binary> (BINARY)
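
once a file is stored in that layout and its text has been extracted into the
index, a full-text search is usually written with the jcr:contains() function
from JSR 170. the snippet below is only a minimal sketch of building such an
XPath statement: the class and method names are made up for illustration, and
it deliberately rejects terms containing quotes, hyphens or backslashes rather
than trying to escape them.

```java
class FullTextQuery {

    // Hypothetical helper (not part of Jackrabbit or the JCR API): builds a
    // JCR 1.0 XPath statement that matches nt:resource nodes whose indexed
    // text contains the given term.
    static String containsQuery(String term) {
        // Assumption for this sketch: accept only words made of letters and
        // digits, separated by single spaces, so that no escaping is needed.
        if (term == null || !term.matches("[\\p{L}\\p{N}]+( [\\p{L}\\p{N}]+)*")) {
            throw new IllegalArgumentException("unsupported search term: " + term);
        }
        return "//element(*, nt:resource)[jcr:contains(., '" + term + "')]";
    }

    public static void main(String[] args) {
        // Prints the statement for a single-word search.
        System.out.println(containsQuery("jackrabbit"));
    }
}
```

with a live session, the resulting string would be handed to
QueryManager.createQuery(statement, Query.XPATH). that part is left out here
because it needs a running repository.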

