cocoon-dev mailing list archives

From Joerg Heinicke <joerg.heini...@gmx.de>
Subject LuceneIndexContentHandler (was: cvs commit: cocoon-2.1 status.xml)
Date Wed, 10 Mar 2004 01:27:24 GMT
On 09.03.2004 13:43, Vadim Gritsenko wrote:

>> Yes, I see your objection - and asked for them already in the bug 
>> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25934 ;)
>>
>> So what are the practical use cases in which this might occur? Maybe
>> it's only a theoretical problem depending on the "thing" the index is
>> created from? On which SAX stream does the LuceneIndexHandler operate?
> 
> I remember there were issues already in other components with text being 
> split up across multiple character events. So, think of this as 
> preventive maintenance.

Yes, for example this bug: 
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26219. Two character 
events following each other come out of an XSLT process that uses 
<xsl:value-of> twice in a row. And the AbstractDOMTransformer, or more 
probably one of the components it uses, drops the second and any 
following text events.
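
To illustrate the general point (this is only a sketch, not the code of 
AbstractDOMTransformer or of any Cocoon component): a SAX handler can 
collect consecutive character events in a buffer and only treat the text 
as complete at the next element boundary. The class name 
BufferedTextHandler and its demo() method are made up for this example; 
DefaultHandler and AttributesImpl are the standard org.xml.sax.helpers 
classes.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: joins adjacent character events into one text chunk.
class BufferedTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Never assume one text node arrives as a single event.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // The text is only complete once an element boundary is reached.
    private void flush() {
        if (buffer.length() > 0) {
            texts.add(buffer.toString());
            buffer.setLength(0);
        }
    }

    // Drives the handler by hand with two adjacent character events.
    static List<String> demo() {
        BufferedTextHandler h = new BufferedTextHandler();
        h.startElement("", "p", "p", new AttributesImpl());
        h.characters("foo".toCharArray(), 0, 3);
        h.characters("bar".toCharArray(), 0, 3);
        h.endElement("", "p", "p");
        return h.texts;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [foobar]
    }
}
```

With this kind of buffering the second event is appended to the first 
instead of being lost.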

>> I also don't get your implications for 
>> "had_start_or_end_element_in_between_char_events". But I had a look at 
>> the endElement(). It gets the elements from a stack and already tests 
>> for text:
>>     if (text != null && text.length() > 0) {
>> Would it make sense to add the space in endElement, if the element 
>> contains text, i.e. the above is true?
> 
> This was my first thought... But then, multiple closing tags will cause 
> multiple spaces...

OK, if that is a problem.

> So, I thought, this should work:
> 
> startElement:
>    flag = true;
> 
> endElement:
>    flag = true;
> 
> characters:
>    if (flag) {
>        x.append(' ');
>        flag = false;
>    }
> 
> Does it solve the problem?

Unfortunately not:

startElement event
character event 'key'
character event 'word'
character event 'test'
endElement event

So you would have 'key wordtest'.
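
The event sequence above can be replayed in a small stand-alone sketch. 
It is not the real handler: the class FlagOnElementSketch is made up, 
the callbacks are simplified to plain methods, and it assumes that the 
space is appended after the text of the character event, which is the 
reading that gives the result quoted above.

```java
// Hypothetical stand-in for the quoted proposal: the flag is set on element events.
class FlagOnElementSketch {
    private final StringBuilder x = new StringBuilder();
    private boolean flag = false;

    void startElement() { flag = true; }

    void endElement() { flag = true; }

    void characters(String chunk) {
        x.append(chunk);          // assumption: the text is appended first
        if (flag) {
            x.append(' ');        // then the space, once per element event
            flag = false;
        }
    }

    String text() { return x.toString(); }

    // Replays: startElement, 'key', 'word', 'test', endElement.
    static String splitTextCase() {
        FlagOnElementSketch h = new FlagOnElementSketch();
        h.startElement();
        h.characters("key");
        h.characters("word");
        h.characters("test");
        h.endElement();
        return h.text();
    }

    public static void main(String[] args) {
        System.out.println("[" + splitTextCase() + "]"); // prints [key wordtest]
    }
}
```

Under that assumption the space lands in the middle of what was one 
piece of text.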

What about

characters:
     flag = true;

endElement:
     if (flag) {
         x.append(' ');
         flag = false;
     }

This is similar to the endElement text check mentioned above, but it 
would prevent multiple spaces in the output, wouldn't it?
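
A minimal sketch of this variant, again with simplified callbacks and a 
made-up class name (EndElementSpaceSketch), not the actual 
LuceneIndexHandler code. It replays text split over three character 
events and a document with several closing tags in a row.

```java
// Hypothetical stand-in: the flag is set by characters, the space is added in endElement.
class EndElementSpaceSketch {
    private final StringBuilder x = new StringBuilder();
    private boolean flag = false;

    void startElement() { /* nothing to do in this variant */ }

    void characters(String chunk) {
        x.append(chunk);
        flag = true;              // remember that text was seen
    }

    void endElement() {
        if (flag) {
            x.append(' ');        // at most one space per run of text
            flag = false;
        }
    }

    String text() { return x.toString(); }

    // One element whose text arrives as three character events.
    static String splitTextCase() {
        EndElementSpaceSketch h = new EndElementSpaceSketch();
        h.startElement();
        h.characters("key");
        h.characters("word");
        h.characters("test");
        h.endElement();
        return h.text();
    }

    // <doc><a>key</a><b><c>word</c></b></doc>: three closing tags in a row at the end.
    static String nestedCloseCase() {
        EndElementSpaceSketch h = new EndElementSpaceSketch();
        h.startElement();         // doc
        h.startElement();         // a
        h.characters("key");
        h.endElement();           // a
        h.startElement();         // b
        h.startElement();         // c
        h.characters("word");
        h.endElement();           // c
        h.endElement();           // b
        h.endElement();           // doc
        return h.text();
    }

    public static void main(String[] args) {
        System.out.println("[" + splitTextCase() + "]");   // prints [keywordtest ]
        System.out.println("[" + nestedCloseCase() + "]"); // prints [key word ]
    }
}
```

In this sketch the split text stays in one piece and the three closing 
tags add only a single space.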

Joerg
