cocoon-dev mailing list archives

From Vadim Gritsenko <va...@reverycodes.com>
Subject Re: LuceneIndexContentHandler
Date Thu, 11 Mar 2004 12:23:46 GMT
Joerg Heinicke wrote:

> On 09.03.2004 13:43, Vadim Gritsenko wrote:
>
>>> Yes, I see your objection - and asked for them already in the bug 
>>> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25934 ;)
>>>
>>> So what are the practical use cases where this might occur? Maybe it's 
>>> only a theoretical problem, depending on the "thing" the index is 
>>> created from? Which SAX stream does the LuceneIndexHandler operate on?
>>
>>
>> I remember there were already issues in other components with text 
>> being split up across multiple character events. So think of this 
>> as preventive maintenance.
>
>
> Yes, for example this bug: 
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26219. Two character 
> events following each other come out of an XSLT process using 
> <xsl:value-of> twice in a row. And the AbstractDOMTransformer 
> or, more probably, one of the components it uses drops the second and 
> following text events.
>
>>> I also don't get your implications for 
>>> "had_start_or_end_element_in_between_char_events". But I had a look 
>>> at endElement(). It gets the elements from a stack and already 
>>> tests for text:
>>>     if (text != null && text.length() > 0) {
>>> Would it make sense to add the space in endElement, if the element 
>>> contains text, i.e. the above is true?
>>
>>
>> This was my first thought... But then, multiple closing tags will 
>> cause multiple spaces...
>
>
> OK, if that is a problem.
>
>> So, I thought, this should work:
>>
>> startElement:
>>    flag = true;
>>
>> endElement:
>>    flag = true;
>>
>> characters:
>>    if (flag)
>>        x.append(' ');
>>        flag = false;
>>
>> Does it solve the problem?
>
>
> Unfortunately not:
>
> startElement event
> character event 'key'
> character event 'word'
> character event 'test'
> endElement event
>
> So you would have 'key wordtest'.
>
> What about
>
> characters:
>     flag = true;
>
> endElement:
>     if (flag)
>         x.append(' ');
>         flag = false;
>
> This is similar to the endElement text check mentioned above, but 
> would prevent multiple spaces in the output, wouldn't it?


startElement a
character 'key'
startElement b
character 'word'

This will become "keyword" instead of "key word". No, this won't work either :-)
Adding

startElement:
    if (flag)
        x.append(' ');
        flag = false;

Should fix it, shouldn't it?
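
A minimal Java sketch of the combined rules from this thread: characters sets the flag, and both startElement and endElement emit a single space when it is set. The class name SpacingTextCollector, the buffer field and getText() are hypothetical, made up for illustration and not taken from LuceneIndexContentHandler:

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only (not the Cocoon class).
class SpacingTextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    // True when text has been appended since the last separator.
    private boolean flag = false;

    @Override
    public void characters(char[] ch, int start, int length) {
        // Consecutive character events are joined without a space.
        text.append(ch, start, length);
        flag = true;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        separate();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        separate();
    }

    // Emit at most one space per run of text, so nested opening or
    // closing tags do not produce multiple spaces.
    private void separate() {
        if (flag) {
            text.append(' ');
            flag = false;
        }
    }

    public String getText() {
        return text.toString();
    }
}
```

With this sketch, the events for <a>key<b>word</b></a> give "key word " (with one trailing space), and three character events 'key', 'word', 'test' inside one element stay joined as "keywordtest ".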

Vadim


