jackrabbit-users mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject Indexing properties with InputStream as value [WAS: Re: Indexing of properties setProperty(String, InputStream)]
Date Mon, 26 Nov 2007 20:38:26 GMT
Marcel Reutegger wrote:

> Hi Michael,
>
> These questions really belong on the user list, but anyway...
>
> Michael Wechner wrote:
>
>> I am using setProperty(String, InputStream), i.e. 
>> setProperty("content", new InputStream(...)), to save XHTML 
>> and other "bigger" content.
>> I am also using the TransientRepository implementation.
>>
>> When I search with XPath, e.g. //*[@content], I don't get 
>> any results, whereas properties set with 
>> setProperty(String, String) are found.
>>
>> I am quite sure the "content" properties exist, because I can read 
>> and write them without any problem.
>>
>> So my guess is that properties set through setProperty(String, 
>> InputStream) are not indexed by default, because they could contain 
>> any kind of data, right?
>
>
> That's correct. The JCR specification says that binary properties are 
> not indexed, basically for the reason you mentioned: they can contain 
> anything.
>
>> But can I get them indexed?
>
>
> Yes, if you store the binary in an nt:resource node. This gives 
> Jackrabbit the information it needs to index the binary (MIME type 
> and encoding). Furthermore, you need to configure text extractors in 
> the repository configuration:
> http://jackrabbit.apache.org/doc/components/text-extractors.html
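Why the MIME type and encoding matter can be shown with a small stand-alone model. In JCR terms the binary goes into the jcr:data property of an nt:resource node, next to jcr:mimeType and jcr:encoding. The class below is a hypothetical toy, not Jackrabbit's extractor API (ExtractorRegistry and extract() are invented for illustration). It only sketches the idea: an extractor is picked by MIME type, the encoding decides how bytes become characters, and a binary whose type has no extractor simply yields no text to index.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical toy model, NOT Jackrabbit's API: it only illustrates why an
// indexer needs a MIME type and an encoding before it can index a binary.
public class ExtractorRegistry {

    // MIME type -> function turning decoded characters into indexable text.
    // Only plain text is "supported" here; a real setup registers one
    // extractor per content type it can handle.
    private static final Map<String, Function<String, String>> EXTRACTORS =
            Map.of("text/plain", text -> text);

    // Returns the text to index, or Optional.empty() when no extractor is
    // registered for the MIME type, i.e. the binary is simply not indexed.
    public static Optional<String> extract(byte[] data, String mimeType, String encoding) {
        Function<String, String> extractor = EXTRACTORS.get(mimeType);
        if (extractor == null) {
            return Optional.empty();
        }
        // The encoding determines how the raw bytes become characters.
        String decoded = new String(data, Charset.forName(encoding));
        return Optional.of(extractor.apply(decoded));
    }

    public static void main(String[] args) {
        String expected = "Z\u00fcrich";
        byte[] bytes = expected.getBytes(StandardCharsets.UTF_8);
        // Right MIME type and encoding: the original word is recovered.
        System.out.println(extract(bytes, "text/plain", "UTF-8").get().equals(expected));      // true
        // Wrong encoding: the same bytes decode to different characters.
        System.out.println(extract(bytes, "text/plain", "ISO-8859-1").get().equals(expected)); // false
        // Unknown MIME type: nothing is extracted, so nothing is indexed.
        System.out.println(extract(bytes, "image/png", "UTF-8").isPresent());                  // false
    }
}
```

The real extractor classes and how to register them are described on the text-extractors page linked above.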


Thanks for these pointers. Based on your hints I have also found

http://www.mail-archive.com/dev@jackrabbit.apache.org/msg04145.html

http://repo1.maven.org/maven2/org/apache/jackrabbit/jackrabbit-text-extractors/1.3.3/

http://www.nabble.com/Jackrabbit-performance-with-large-binaries-t2778091.html

and I think it would make sense to combine these into a 
"tutorial" that goes into more detail than the individual resources.

Would you be interested in something like this for the Jackrabbit 
documentation?

Cheers

Michael


>
>> Or should I rather use setProperty(String, Value, int), set the type 
>> to String and use Value.getStream()?
>
>
> That's an alternative, but then you will also get matches for tag 
> names, while you are probably only interested in the text between the 
> elements and in the attribute values.
>
> regards
>  marcel
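The tag-name problem Marcel mentions can be seen with a small stand-alone comparison. This is a hypothetical illustration, not Jackrabbit code: terms() is a naive tokenizer, and extractText() is a crude regex-based stand-in for a real HTML/XHTML extractor (which would use a proper parser). Following Marcel's description, it keeps the text between elements and the attribute values, and drops element and attribute names.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, NOT Jackrabbit code: compares the terms a naive
// indexer sees in raw XHTML with the terms left after a crude text extraction.
public class TagNameMatches {

    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern ATTR_VALUE = Pattern.compile("\"([^\"]*)\"");

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    public static Set<String> terms(String text) {
        Set<String> result = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Keeps the text between elements plus attribute values; drops element
    // and attribute names. Regex-based, so only suitable for this demo.
    public static String extractText(String xhtml) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(xhtml);
        int last = 0;
        while (tag.find()) {
            out.append(xhtml, last, tag.start()).append(' '); // text before the tag
            Matcher value = ATTR_VALUE.matcher(tag.group());
            while (value.find()) {
                out.append(value.group(1)).append(' ');       // attribute values
            }
            last = tag.end();
        }
        out.append(xhtml.substring(last));                    // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "<div class=\"note\"><p title=\"Greeting\">Hello world</p></div>";
        System.out.println(terms(doc));              // [class, div, greeting, hello, note, p, title, world]
        System.out.println(terms(extractText(doc))); // [greeting, hello, note, world]
    }
}
```

Indexed as a plain string, this document would match full-text queries for "div" or "title"; after extraction only the content words and attribute values remain.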



-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org
+41 44 272 91 61

