xml-xalan-j-users mailing list archives

From "Timothy Jones" <Timothy.Jo...@syniverse.com>
Subject RE: Ignoring errors
Date Tue, 21 Aug 2007 14:58:56 GMT
Hi, Michael - 

Sorry to be contrary, but I don't see it on SF.net.
   http://sourceforge.net/search/?type_of_search=soft&words=neko 

The page I found was at
http://people.apache.org/~andyc/neko/doc/index.html.  It is a personal
page, but on the apache.org site.  Official or not, NEKO did the trick
for me!




tlj
-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com] 
Sent: Tuesday, August 21, 2007 10:47 AM
To: xalan-j-users@xml.apache.org
Cc: Michael Bauer; Dave Brosius; Timothy Jones
Subject: RE: Ignoring errors

NekoHTML is built on top of Xerces-J 2.x, but it's not an Apache project. I
think Andy Clark (the creator) maintains it on SourceForge these days.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Timothy Jones" <Timothy.Jones@syniverse.com> wrote on 08/21/2007 10:24:44 AM:

> I once had pretty good success parsing some sloppy HTML right off the 
> web through an HTTP proxy server with a parser called neko.  I can 
> provide code samples off-list if you need them.
>
> It is also an apache offering.
>
>
> Timothy Jones
> Sr. Systems Engineer
> Syniverse Technologies
> Development, Tampa, FL
> Work: (813) 637-5366
> Cell: (813) 857-7650
>
> From: Dave Brosius [mailto:DBrosius@Primavera.com]
> Sent: Tuesday, August 21, 2007 9:37 AM
> To: Michael Bauer
> Cc: xalan-j-users@xml.apache.org
> Subject: Re: Ignoring errors
>
> No, but there are various HTML 'tidying' tools that you could use to 
> pre-parse the HTML before passing it to the transformer.
>
> From: Michael Bauer <codechimp@gmail.com>
> Sent: 08/21/2007 09:33 AM
> To: xalan-j-users@xml.apache.org
> Subject: Ignoring errors
>
> I am using Xalan/Xerces to parse out some data from a web page.  The 
> problem is that the web page is not well-formed, and running the 
> Transformer on it produces:
> ERROR:  'Open quote is expected for attribute "href|".'
> ERROR:  'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Open quote is expected for attribute "href|".'
> Is there any way to instruct the Parser/Transformer to ignore such
> errors?
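
For context on why the answer in this thread is "No": an unquoted attribute value is a well-formedness error, and an XML parser reports those as fatal errors rather than warnings. The sketch below is not from the thread. It uses only the JAXP classes bundled with the JDK, and the class name `WellFormedCheck`, the method `firstFatalError`, and the sample markup are hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks whether a string parses as well-formed XML.
public class WellFormedCheck {

    // Returns null if the markup parses, otherwise the parser's fatal error message.
    static String firstFatalError(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Replace the default handler, which prints fatal errors to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // well-formedness errors arrive here
            }
        });

        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Quoted attribute value: parses, so the result is null.
        System.out.println(firstFatalError("<a href=\"page.html\">ok</a>"));
        // Unquoted attribute value: the parser stops with a fatal error.
        System.out.println(firstFatalError("<a href=page.html>broken</a>"));
    }
}
```

Because the parse fails before a document is built, the Transformer never receives usable input, which is why the replies above suggest cleaning the HTML with a separate tool first.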

