jackrabbit-users mailing list archives

From Samuel Cox <crankydi...@gmail.com>
Subject Re: jcr:like on long string properties (Jackrabbit 1.6.1)
Date Tue, 29 Mar 2011 19:01:42 GMT
OK Jukka, I'll give that a try.  I'm assuming what I'd need to do is
pass the entire JSON string to Lucene as a single token.

Assuming I have anywhere from 100 to 10,000 nodes (10,000 would be
rare) where each node can contain one of these big string properties
(anywhere from 1,000 to 1,000,000 chars), do you think it is feasible
to use Lucene in this fashion?  In other words, will each query
essentially take forever, or will the index space be huge?  I figure
asking can only make me look silly :)

Also, I'm pretty sure I understand a lot of why this works how it
does, but it seems like the implementation doesn't meet the spec for
jcr:like.  Am I misreading it?
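
For the sizing question above, a rough sketch of the raw text volume at the
two extremes mentioned (100 nodes of 1,000 chars, and 10,000 nodes of
1,000,000 chars). This counts characters of property text only; it says
nothing about the actual Lucene index size or query time, which would have
to be measured. The class and method names are made up for illustration.

```java
class RawTextEstimate {
    // Total characters of property text: node count times characters per node.
    // Uses long because the upper bound does not fit in an int.
    static long totalChars(long nodes, long charsPerNode) {
        return nodes * charsPerNode;
    }

    public static void main(String[] args) {
        // Low end from the message: 100 nodes, 1,000 chars each.
        System.out.println(totalChars(100L, 1_000L));
        // High end from the message: 10,000 nodes, 1,000,000 chars each.
        System.out.println(totalChars(10_000L, 1_000_000L));
    }
}
```

So the text to be indexed ranges from about 100 thousand to about 10 billion
characters, depending on where a given repository falls in that range.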

On 3/29/11, Paco Avila <pavila@openkm.com> wrote:
> OK, I understand. So, the only problem is "very big words". Nice to know :)
> On Tue, Mar 29, 2011 at 3:34 PM, Jukka Zitting <jzitting@adobe.com> wrote:
>> Hi,
>> Paco Avila asked:
>>> this means that we can't index string properties bigger than
>>> 255 characters, doesn't it?
>> No, just that a single token (word, number, etc.) won't be included in the
>> index if it's longer than that. Most normal string properties consist of
>> many smaller tokens.
>> If you do have such very long tokens and you need them to be searchable,
>> you can configure Jackrabbit to use a custom analyzer for such properties.
>> See the Index Analyzers section in [1] for more details.
>> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
>> --
>> Jukka Zitting
> --
> OpenKM
> http://www.openkm.com
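
To illustrate the distinction discussed in the quoted replies, here is a
minimal sketch in plain Java. It does not use the Lucene or Jackrabbit APIs:
wordTokens is a hypothetical stand-in for a word-splitting analyzer that
skips any token longer than the 255-character limit mentioned above, and
singleToken is a hypothetical stand-in for an analyzer that emits the whole
property value as one token. Real analyzers split on more than whitespace,
so treat this only as a picture of the behaviour, not of the implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TokenLengthSketch {
    // Assumed limit, taken from the 255-character figure quoted in this thread.
    static final int MAX_TOKEN_LENGTH = 255;

    // Split on whitespace and silently skip tokens that exceed the limit,
    // so a short-word property is fully indexed but one huge "word" is not.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty() && t.length() <= MAX_TOKEN_LENGTH) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Treat the entire value as a single token, whatever its length.
    static List<String> singleToken(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            run.append('x');
        }
        String value = "id " + run + " end";

        // The 300-character run is skipped; only the short tokens remain.
        System.out.println(wordTokens(value));
        // The single-token variant keeps the full value as one term.
        System.out.println(singleToken(value).size());
    }
}
```

In this picture, a long JSON string containing spaces would still be split
into many short tokens by the first method, which is why only unusually long
unbroken runs go missing from the index.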
