jackrabbit-users mailing list archives

From "Stefan Guggisberg" <stefan.guggisb...@gmail.com>
Subject Re: BinaryValue does not get indexed
Date Thu, 19 Apr 2007 08:39:51 GMT
hi phillip,

On 4/18/07, Phillip Rhodes <spamsucks@rhoderunner.com> wrote:
>
> I am adding BinaryValue properties to my nodes.  It appears that jackrabbit is not indexing
> the values of the BinaryValue even if the contents represent a string.  If I add the String
> value as a StringValue, the value is indexed and picked up in a contains search.
>
> I have 2 issues with this:
>
> 1) String property values have a limit of around 16000 characters because the SimpleDBPersistence
> adapter will store the value in a BLOB field.  I get MySQL data truncation errors unless I
> chop the data down to 16000 characters.  In addition, I am doubling my space requirements.
> Not only do I have to store my binary content, but also its string representation in the node.

there's been a related jira issue:
https://issues.apache.org/jira/browse/JCR-760

the issue has been resolved and will be included in the upcoming 1.3 release.

for the time being you could either change the table definitions
directly on mysql
or change the mysql.ddl file and let jackrabbit recreate the tables.

>
> 2) I use a byte[] array throughout my application as a means to store pdf files, image
> files, text files, etc...  It is a "common denominator for all content": PDF files, image
> files, wiki entries, etc... can all be stored, passed around, and retrieved as a byte[] array.
> I would like to figure out how to get jackrabbit to index the byte[] array properly.

see below

>
> 3) Not an issue, but a question. How does jackrabbit know that a node is a pdf document?
> It must figure it out somehow because I see that there is support in the SearchIndex to configure
> pdf extractions.  Do I add "jcr:mimeType" property of application/pdf to my pdf node and that
> will do it?  Will this solve the first 2 issues??

in order to be indexed the binary data must be stored in the jcr:data
property of a node of type nt:resource (or a subtype thereof).

e.g.

    Node node = parent.addNode("jcr:content", "nt:resource");
    node.setProperty("jcr:mimeType", "application/pdf");
    node.setProperty("jcr:data", new ByteArrayInputStream(pdfBytes));
    node.setProperty("jcr:lastModified", Calendar.getInstance());
    session.save();
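
since the application passes everything around as byte[], here is a minimal sketch of how such content could be prepared for an nt:resource node. the class ResourceContent and its methods are hypothetical names made up for illustration, not part of jackrabbit or the JCR API. it assumes text is encoded as UTF-8 and that the optional jcr:encoding property of nt:resource is set for text content; whether text/plain actually gets extracted depends on the text filters configured for the SearchIndex. the JCR calls appear only in comments so that the sketch compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for the values an nt:resource node needs. */
public class ResourceContent {
    public final String mimeType;   // value for jcr:mimeType
    public final String encoding;   // value for jcr:encoding, null for non-text content
    public final byte[] bytes;      // value for jcr:data

    ResourceContent(String mimeType, String encoding, byte[] bytes) {
        this.mimeType = mimeType;
        this.encoding = encoding;
        this.bytes = bytes;
    }

    /** Plain text: encode with an explicit charset instead of the platform default. */
    public static ResourceContent ofText(String text) {
        return new ResourceContent("text/plain", "UTF-8",
                text.getBytes(StandardCharsets.UTF_8));
    }

    /** PDF bytes exactly as the application already holds them. */
    public static ResourceContent ofPdf(byte[] pdfBytes) {
        return new ResourceContent("application/pdf", null, pdfBytes);
    }

    /** Stream to hand to node.setProperty("jcr:data", ...). */
    public InputStream openStream() {
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) {
        ResourceContent c = ResourceContent.ofText("this is a unique piece of text");
        // With a JCR node of type nt:resource one would then call:
        //   node.setProperty("jcr:mimeType", c.mimeType);
        //   node.setProperty("jcr:encoding", c.encoding);   // only when not null
        //   node.setProperty("jcr:data", c.openStream());
        //   node.setProperty("jcr:lastModified", Calendar.getInstance());
        System.out.println(c.mimeType + " " + c.encoding + " " + c.bytes.length + " bytes");
    }
}
```

the point of the explicit charset is that a bare getBytes() uses the platform default encoding, so the stored bytes and the declared encoding could disagree from one machine to the next.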

once the resource nodes have gone through the text filters you can search
the binary content using the jcr:contains function:

//element(*, nt:resource)[jcr:contains(., 'foo')]
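
if the search word comes from application code, the statement above can be assembled with a small helper. this is a deliberately conservative sketch: ContainsQuery and forWord are hypothetical names, and the helper accepts only a single word made of letters and digits, because the escaping rules for jcr:contains search expressions are outside its scope. the calls needed to run the statement against a session are shown in comments only.

```java
public class ContainsQuery {

    /**
     * Builds an XPath full-text query for one plain word.
     * Rejects anything other than letters and digits rather than trying to escape it.
     */
    public static String forWord(String nodeType, String word) {
        if (word.isEmpty() || !word.chars().allMatch(Character::isLetterOrDigit)) {
            throw new IllegalArgumentException("only plain words are supported: " + word);
        }
        return "//element(*, " + nodeType + ")[jcr:contains(., '" + word + "')]";
    }

    public static void main(String[] args) {
        String stmt = forWord("nt:resource", "foo");
        System.out.println(stmt);
        // With a live JCR session the statement would be executed like this:
        //   Query q = session.getWorkspace().getQueryManager()
        //                    .createQuery(stmt, Query.XPATH);
        //   NodeIterator hits = q.execute().getNodes();
    }
}
```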

cheers
stefan

>
> I appreciate your thoughts on this!
>
>
> My Code:
>
> String contentText= "this is a unique piece of text";
> byte[] bytes = contentText.getBytes();
> node.setProperty("content", new BinaryValue(bytes));
> if (contentText.length() > 16000) {
>         contentText= contentText.substring(0, 16000);
> }
> node.setProperty("worksproperty", new StringValue(contentText));
>
>
>
> This is my xpath query:
> //*[jcr:contains(.,'unique')]
>
>
>
