jackrabbit-dev mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (JCR-264) TextFilters get called three times within checkin() method
Date Fri, 07 Apr 2006 08:51:27 GMT
     [ http://issues.apache.org/jira/browse/JCR-264?page=all ]
Marcel Reutegger resolved JCR-264:

    Fix Version: 1.0.1
                     (was: 1.1)
     Resolution: Fixed

All text filter implementations now use a lazy reader, which is initialized when a Lucene document
is added to the index. Text filtering has been moved into this initialization step and is
therefore deferred until the text is actually needed.
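
The lazy reader idea can be sketched roughly as follows. This is only an illustration, not
the code committed in revision 392211: the class name LazyReader and the Callable-based
extractor are assumptions made up for this example. The expensive extraction runs on the
first read() call instead of at construction time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.Callable;

/** Hypothetical lazy reader; the name and shape are assumptions, not Jackrabbit's class. */
class LazyReader extends Reader {
    private final Callable<String> extractor; // expensive text extraction, e.g. PDF parsing
    private Reader delegate;                  // stays null until the first read()

    LazyReader(Callable<String> extractor) {
        this.extractor = extractor;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (delegate == null) {
            try {
                // The extraction happens here, only when a consumer really reads.
                delegate = new StringReader(extractor.call());
            } catch (Exception e) {
                throw new IOException("text extraction failed", e);
            }
        }
        return delegate.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        // Closing an unused reader must not trigger the extraction.
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

With a reader of this kind, a filter can return immediately, and the binary is parsed only if
something later reads from the returned reader.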

The following sequence no longer triggers text filtering:
- add nt:resource
- save
- remove nt:resource
- save

The following sequence will cause text filtering before the query can be executed:
- add nt:resource
- save
- execute query (-> will trigger text filtering on nt:resource)
- remove nt:resource
- save
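
The difference between the two sequences can be modelled with a toy index that buffers
pending work. This is a simplified sketch under stated assumptions: ToyIndex and its methods
are hypothetical names, a Supplier stands in for the lazily read text, and the real index
implementation is considerably more involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of an index with a pending-work buffer; not Jackrabbit code. */
class ToyIndex {
    private final Map<String, Supplier<String>> pending = new LinkedHashMap<>();
    private final Map<String, String> indexed = new LinkedHashMap<>();

    /** Save after adding a node: the work is only queued, no text is extracted. */
    void add(String nodeId, Supplier<String> lazyText) {
        pending.put(nodeId, lazyText);
    }

    /** Save after removing a node: queued work for it is dropped unread. */
    void remove(String nodeId) {
        pending.remove(nodeId);
        indexed.remove(nodeId);
    }

    /** A query has to see pending nodes, so the buffer is flushed first. */
    boolean contains(String term) {
        for (Map.Entry<String, Supplier<String>> e : pending.entrySet()) {
            indexed.put(e.getKey(), e.getValue().get()); // text filtering happens here
        }
        pending.clear();
        for (String text : indexed.values()) {
            if (text.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, add followed by remove never invokes the supplier, while a query between the
two invokes it once.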

Please note that this fix is backward compatible with the Jackrabbit 1.0 release. You may
use a build of jackrabbit-index-filters at revision 392211 with jackrabbit-core-1.0.

Fixed in revision: 392211

> TextFilters get called three times within checkin() method
> ----------------------------------------------------------
>          Key: JCR-264
>          URL: http://issues.apache.org/jira/browse/JCR-264
>      Project: Jackrabbit
>         Type: Improvement

>   Components: indexing
>  Environment: all
>     Reporter: Martin Perez
>      Fix For: 1.0.1

> If you want to add a PDF document to a repository using a PdfTextFilter, and you do the
> following steps:
> session.save()
> node.checkin();
> The method PdfTextFilter.doFilter() gets called 4 times!
> The session's save() method calls doFilter() once. This is normal.
> But the checkin() method calls doFilter() three times. Is this normal? I do not see the point.
> ------------------
> Marcel Reutegger 	
> <marcel.reutegger@gmx.net> to jackrabbit-dev
> Hi Martin,
> this is unfortunate and should be improved. the reason why this happens
> is the following:
> the search index implementation always indexes a node as a whole to
> improve query performance. that means even if a single property changes
> the parent node with all its properties is re-indexed.
> unfortunately the checkin method sets properties in three separate
> 'transactions', causing the search to re-index the corresponding node three
> times.
> usually this is not an issue, because the index implementation keeps a
> buffer for pending index work. that is, if you change the same property
> several times and save after each setProperty() call, it won't actually
> get re-indexed several times. but text filters behave differently here,
> because they extract the text even though the text will never be used.
> eventually this will improve without any change to the search index
> implementation, because as soon as versioning participates properly in
> transactions there will only be one call to index a node on checkin().
> as a quick fix we could improve the text filter classes to only parse
> the binary when the returned reader is actually used.

This message is automatically generated by JIRA.