cocoon-dev mailing list archives

From Joerg Heinicke <joerg.heini...@gmx.de>
Subject Re: cvs commit: cocoon-2.1 status.xml
Date Tue, 09 Mar 2004 11:01:50 GMT
On 09.03.2004 02:39, Vadim Gritsenko wrote:

>>>       public void characters(char[] ch, int start, int length) {
>>>             if (ch.length > 0 && start >= 0 && length > 1) {
>>>  -            String text = new String(ch, start, length);
>>>               if (elementStack.size() > 0) {
>>>               IndexHelperField tos = (IndexHelperField) elementStack.peek();
>>>  -                tos.appendText(text);
>>>  +                tos.appendText(ch, start, length);
>>>               }
>>>  -            bodyText.append(text);
>>>  +            bodyText.append(' ');
>>>  +            bodyText.append(ch, start, length);
>>>           }
>>>       }
>>>
>>
>> What will happen when "keyword" text is streamed as two characters()
>> events, "key" and "word"? I think it will become "key word", and
>> indexing will break.
>>
>> IIUC, the idea was to add a space in between tags, i.e. so that
>> <p>some</p><p>text</p> is not indexed as "sometext". If that's
>> correct, then a better fix would be to add the space only if a boolean
>> flag had_start_or_end_element_in_between_char_events is set.
> 
> Joerg?
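A minimal, self-contained sketch of the two behaviours discussed above, assuming plain org.xml.sax handlers rather than Cocoon's actual LuceneIndexHandler. The class names and the boundarySeen field are hypothetical; boundarySeen stands in for Vadim's had_start_or_end_element_in_between_char_events flag, and AlwaysSpace is a simplified version of the committed patch (a space before every characters() chunk). Both replay the SAX events for <p>keyword</p><p>text</p> with "keyword" delivered as two characters() calls:

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersSplitDemo {

    // Common base (hypothetical): both collectors build a body text string.
    abstract static class TextCollector extends DefaultHandler {
        protected final StringBuilder bodyText = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) { }

        @Override
        public void endElement(String uri, String localName, String qName) { }

        @Override
        public abstract void characters(char[] ch, int start, int length);

        String getBodyText() { return bodyText.toString(); }
    }

    // Simplified version of the committed patch: a space before every chunk.
    static class AlwaysSpace extends TextCollector {
        @Override
        public void characters(char[] ch, int start, int length) {
            bodyText.append(' ');
            bodyText.append(ch, start, length);
        }
    }

    // Flag-based variant: a space only if a start/end tag occurred since the last chunk.
    static class SpaceAtBoundary extends TextCollector {
        private boolean boundarySeen = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            boundarySeen = true;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            boundarySeen = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (boundarySeen && bodyText.length() > 0) {
                bodyText.append(' ');
            }
            bodyText.append(ch, start, length);
            boundarySeen = false;
        }
    }

    // Replays <p>keyword</p><p>text</p>, with "keyword" split into "key" + "word".
    static String replay(TextCollector h) {
        AttributesImpl none = new AttributesImpl();
        h.startElement("", "p", "p", none);
        h.characters("key".toCharArray(), 0, 3);
        h.characters("word".toCharArray(), 0, 4);
        h.endElement("", "p", "p");
        h.startElement("", "p", "p", none);
        h.characters("text".toCharArray(), 0, 4);
        h.endElement("", "p", "p");
        return h.getBodyText();
    }

    public static void main(String[] args) {
        System.out.println("[" + replay(new AlwaysSpace()) + "]");     // [ key word text]
        System.out.println("[" + replay(new SpaceAtBoundary()) + "]"); // [keyword text]
    }
}
```

The always-space variant turns the split word into "key word", while the flag-based one keeps "keyword" intact and still separates the two paragraphs.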

Your mail was neither ignored nor accidentally deleted - I just didn't
really know what to write, but I marked it as important in a nice red
color in Mozilla :)

Yes, I see your objection - and I already asked about it in the bug
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25934 ;)

So what are the practical use cases in which this might occur? Maybe it's
only a theoretical problem, depending on the "thing" the index is created
from? Which SAX stream does the LuceneIndexHandler operate on?

I also don't understand what you are implying with
"had_start_or_end_element_in_between_char_events". But I had a look at
endElement(). It gets the element from a stack and already tests
for text:
     if (text != null && text.length() > 0) {
Would it make sense to add the space in endElement() if the element
contains text, i.e. if the above condition is true?
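A rough sketch of the endElement() idea, again with hypothetical class names and a plain Deque<StringBuilder> standing in for the element stack of IndexHelperField objects (an assumption, not Cocoon's code). characters() appends chunks without any separator, and endElement() appends a single space when the closed element collected text, mirroring the text.length() > 0 check:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class EndElementSpaceDemo {

    // Hypothetical collector: characters() appends raw chunks, endElement()
    // appends one space to the body text if the closed element held text.
    static class EndElementSpacer extends DefaultHandler {
        // Stands in for the element stack of IndexHelperField objects.
        private final Deque<StringBuilder> elementStack = new ArrayDeque<>();
        private final StringBuilder bodyText = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementStack.push(new StringBuilder());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!elementStack.isEmpty()) {
                elementStack.peek().append(ch, start, length);
            }
            // No separator here, so "key" + "word" stays "keyword".
            bodyText.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            StringBuilder text = elementStack.pop();
            // Only elements that collected their own text add a separator.
            if (text.length() > 0) {
                bodyText.append(' ');
            }
        }

        String getBodyText() {
            return bodyText.toString();
        }
    }

    // Replays a simple event list: "<x>" = start tag, "</x>" = end tag,
    // anything else = one characters() call.
    static String run(String... events) {
        EndElementSpacer h = new EndElementSpacer();
        AttributesImpl none = new AttributesImpl();
        for (String e : events) {
            if (e.startsWith("</")) {
                String name = e.substring(2, e.length() - 1);
                h.endElement("", name, name);
            } else if (e.startsWith("<")) {
                String name = e.substring(1, e.length() - 1);
                h.startElement("", name, name, none);
            } else {
                h.characters(e.toCharArray(), 0, e.length());
            }
        }
        return h.getBodyText();
    }

    public static void main(String[] args) {
        System.out.println("[" + run("<p>", "key", "word", "</p>", "<p>", "text", "</p>") + "]");
        // [keyword text ]
        System.out.println("[" + run("<p>", "some", "<b>", "bold", "</b>", "text", "</p>") + "]");
        // [somebold text ]
    }
}
```

With this variant "key" + "word" stays "keyword", but since only end tags add a separator, text directly followed by a child start tag is still joined: <p>some<b>bold</b>text</p> yields "somebold text ".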

Joerg
