jackrabbit-dev mailing list archives

From Brian Moseley <...@osafoundation.org>
Subject Re: handling indexing failure
Date Wed, 11 Jan 2006 17:34:36 GMT
On 1/11/06, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:

> I think it shouldn't be the responsibility of the index to check for
> validity of data. this should be somewhere further up. either in the
> application that is using the JCR API or the repository may support a
> constraint on a binary property.
>
> e.g. jackrabbit could provide some kind of constraint for binary
> properties that says: this property must contain utf-8 text and conform
> to a certain DTD or XML Schema. well, this is certainly not available at
> the moment, but IMO this is where such a check belongs.

ah. okay. perhaps you're focusing too hard on the specific example i
gave of my text filter failing. i thought that might be an issue when
i wrote the original message, heh.

yeah, i actually do parse and validate the content at the application
level. what's really happening inside my text filter is that i'm
re-parsing the content, converting it into a different format, and
then using that different format as input for indexing. it's the
second step, the conversion, that failed in the specific case that
prompted this email, not the parsing.

so the issue isn't whether the text filter should be validating
content but rather what do i do when in the course of normal filtering
i come upon a situation that is so horribly wrong (for whatever
reason) that the entire storage operation should be vetoed.
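
a minimal sketch of what "vetoing the entire storage operation" could look like, assuming a filter is allowed to throw a checked exception that the save path does not swallow. every name here (FilterVetoException, VetoableTextFilter, ConvertingFilter, VetoingStore) is hypothetical and made up for illustration; none of it is jackrabbit's actual text filter api.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical exception a filter throws to veto the whole save.
class FilterVetoException extends Exception {
    FilterVetoException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for a text filter; NOT Jackrabbit's real interface.
interface VetoableTextFilter {
    Map<String, String> filter(String content) throws FilterVetoException;
}

// Re-parses the content, converts it, and hands the converted form to the index.
class ConvertingFilter implements VetoableTextFilter {
    @Override
    public Map<String, String> filter(String content) throws FilterVetoException {
        try {
            Map<String, String> fields = new HashMap<>();
            fields.put("converted", convert(content));
            return fields;
        } catch (IllegalStateException e) {
            // The conversion step failed: escalate instead of logging and moving on.
            throw new FilterVetoException("conversion failed", e);
        }
    }

    // Placeholder conversion; empty input stands in for "horribly wrong" content.
    private String convert(String content) {
        if (content.isEmpty()) {
            throw new IllegalStateException("nothing to convert");
        }
        return content.toUpperCase(Locale.ROOT);
    }
}

// Hypothetical store that runs the filter before anything is persisted.
class VetoingStore {
    final Map<String, String> stored = new HashMap<>();
    final Map<String, Map<String, String>> index = new HashMap<>();
    private final VetoableTextFilter filter;

    VetoingStore(VetoableTextFilter filter) {
        this.filter = filter;
    }

    void save(String path, String content) throws FilterVetoException {
        // If the filter throws, neither the content nor the index entry is written.
        Map<String, String> fields = filter.filter(content);
        stored.put(path, content);
        index.put(path, fields);
    }
}
```

the design choice in this sketch is that the filter runs before the write, so a veto leaves both the stored content and the index untouched rather than out of sync.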

let me give you a second example. that ThreadDeath we're talking about
in another list thread - that occurred in my custom text filter as
well. it didn't cause the server to crash or even cause the webdav
request to fail. in fact, the PUT succeeded, but there was no
indication to the user that the text filter had failed and that the
event he just uploaded was not in fact indexed and would not be
included in subsequent queries. so even though we had the ThreadDeath
stack trace in the error log, we spent most of a day putting two and
two together to figure out what was going on. it would have been more
correct and informative for the filter to have vetoed the request or
otherwise informed the jcr client layer that something went really
wrong down below, so that i could have returned an error for the PUT,
and it would have saved me time diagnosing the issue. does that make
sense?
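
to make the difference concrete, here is a rough sketch contrasting "log and swallow" with "report to the client layer" for a PUT. Indexer and PutHandler are hypothetical names, the status codes are only illustrative, and a RuntimeException stands in for whatever actually went wrong inside the filter; this is not how jackrabbit or any particular webdav layer is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical indexing step; it may fail for reasons the caller cannot foresee.
interface Indexer {
    void index(String path, String content);
}

// Hypothetical PUT handler that either swallows or reports indexing failures.
class PutHandler {
    private final Indexer indexer;
    private final boolean propagateFailures;
    final List<String> errorLog = new ArrayList<>();

    PutHandler(Indexer indexer, boolean propagateFailures) {
        this.indexer = indexer;
        this.propagateFailures = propagateFailures;
    }

    // Returns an HTTP-style status code for the request.
    int put(String path, String content) {
        try {
            indexer.index(path, content);
            return 201;
        } catch (RuntimeException e) {
            // Both modes log the failure.
            errorLog.add(path + ": " + e.getMessage());
            // Swallowing mode still reports success, so the client never learns
            // that the resource is missing from the index.
            return propagateFailures ? 500 : 201;
        }
    }
}
```

in swallowing mode the only trace of the failure is the log entry, which matches the situation described above: the upload appears to succeed and the missing index entry only shows up later in query results.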

and yes, making filters able to veto a save opens up the potential for
abuse (at least as seen through some eyes), but really, that's the
filter developer's decision to make, is it not? ;)

> not at all. you are using the text filter exactly how I envisioned it.
> using 'virtual properties' ;) that are only available in the index. but
> to me it just doesn't feel right that the index should be responsible
> for checking the validity of content.

in general i agree, and validity checking is certainly intended as
part of my particular filter.

> ah, and also miscommunication from my side... I wrote custom filter, but
> what I was trying to say is custom constraint. JCR allows constraints to
> be defined on properties. currently only basic constraints are
> specified by jsr-170: value ranges for long properties, etc.

right. custom constraints would definitely be cool. not what i'm
looking for here, but i can certainly see how they'd be useful.

> I agree that we need vetoable listeners and will finally implement them
> for jsr-283 but this will look very different from what we currently have
> with the text filters. text filters are not meant to do checks there!
> sorry...

so you're saying that even when you implement vetoable listeners, text
filters won't be vetoable? you'll continue to just log and swallow
potentially very important exceptions that might arise within filters
due to circumstances you can't foresee?
