subversion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bhuvaneswaran A <bhu...@collab.net>
Subject Re: [PATCH] Handle special characters in Junit report
Date Fri, 04 Dec 2009 12:19:34 GMT
Please find attached the revised patch. I incorporated following
feedback:
  a) Fix the array slicing part
  b) Escape using ord() instead of removing those characters
  c) Handle "]]>" in CDATA section
  d) Define the ascii table globally (once) and re-use

I also verified this fix by generating the junit files for tests having
special characters and simulating a test that has "]]>" in failure text.
With this patch, it generates valid junit file.

[[
Fix for issue 3541. When generating junit reports, escape special
characters, if any, from test failure messages.

* tools/dev/gen_junit_report.py
  (escape_special_characters): New method to escape special characters.
  (junit_testcase_fail, junit_testcase_xfail): Use 
  escape_special_characters() method to escape special characters in 
  test failure messages.
]]
On Fri, 2009-12-04 at 11:19 +0000, Julian Foad wrote:
> Bhuvaneswaran A wrote:
> > On Fri, 2009-12-04 at 10:22 +0000, Julian Foad wrote:
> > > Branko Čibej wrote:
> > > > Bhuvaneswaran A wrote:
> > > > > The failure message for few tests contain special characters, ex:
> > > 
> > > What do you mean by "special" characters? Unprintable characters?
> > > Non-UTF8 characters? Invalid XML characters? Characters that are XML
> > > syntax characters such as "<"?
> > 
> > When I mean special characters, I mean control characters, ex "^H".
> > Refer to the attachment in issue 3541 for a sample character.
> > 
> > > > > prop_tests.py. As a result, it creates an invalid xml file and not
being
> > > > > displayed in Hudson.
> > > > >
> > > > > This commit fixes this issue, also tracked in issue 3541. With this
fix,
> > > > > the test results are displayed in Hudson, especially the results
> > > > > specific to 1.6.x solaris build.
> > > > >   http://subversion.tigris.org/issues/show_bug.cgi?id=3541
> > > > >   
> > > > > Index: tools/dev/gen_junit_report.py
> > > > > =======================================
> > > > > --- tools/dev/gen_junit_report.py	(revision 886204)
> > > > > +++ tools/dev/gen_junit_report.py	(working copy)
> > > > > @@ -46,6 +46,16 @@
> > > > > data = data.replace(char, encode[char])
> > > > > return data
> > > > > +def remove_special_characters(data):
> > > > > + """remove special characters in test failure reasons"""
> > > > > + if not data:
> > > > > + return data
> > > > > + chars_table = "".join([chr(n) for n in xrange(256)])
> > > > > + # remove all special characters upto ascii value 31, except line
> > > > > feed (10)
> > > > > + # and carriage return (13)
> > > > > + chars_to_remove = chars_table[0:9] + chars_table[11:12] +
> > > > > chars_table[14:31]
> > > 
> > > Isn't the indexing off by one? Should be [0:10] ... [11:13] ... [14:32].
> > 
> > As per the comment, I wanted to preserve LF (10) and CR (13). [0:9] ...
> > [11:12] ... [14:31] works for me.
> 
> In array slicing, the range start is inclusive but the end is exclusive,
> so [11:12] just gives the single character [11], whereas [11:13] would
> give [11] and [12] which is what you want.
> 
> > > > Also, wouldn't it be more proper to find out why the tests put control
> > > > characters in the failure description than to just blindly throw them
away?
> > > 
> > > Or just escape the "special" characters.
> > 
> > Good point. I used to encode using utf-8, but it doesn't seem to
> > detect/encode these characters, resulting in unchanged behaviour. I used
> > something like:
> > 
> > reason = u'%s'.encode('utf-8') % reason
> > reason = unicode(reason, 'utf-8')
> 
> Those statements are just converting from UTF-8 to internal Unicode
> representation, not doing any escaping. Control characters (and all
> characters) are valid in both UTF-8 and internal Unicode
> representations, so no escaping is needed for the conversion.
> 
> I searched on the web and didn't find a really really simple way to
> escape a set of characters. I think something like
> 
>   for c in chars_to_remove:
>     data = data.replace(c, '%%%0x' % ord(c))
> 
> would do it.
> 
> - Julian
> 
> 


-- 
Bhuvaneswaran A    
CollabNet Software P Ltd.  |  www.collab.net

Mime
View raw message