Date: Mon, 30 Oct 2000 03:34:42 -0800 (PST)
From: Davanum Srinivas
Reply-To: dims@yahoo.com
Subject: RE: [C2] Generator which can do HTML->XHTML
To: cocoon-dev@xml.apache.org

#1: If you want to build HTMLGenerator, you need to add tidy.jar to xml-cocoon\lib and issue a build request. ANT will check for the presence of the org.w3c.tidy.Tidy class before it compiles HTMLGenerator.jar.

#2: When I tried to use the DOM document returned by parseDOM with DOMStreamer, it did not work :( Hence the need for ByteArrayStream; I did not try to dig deeper. I will revisit this sometime later when they update Tidy.

#3: Yes, we should buffer the input stream. I will clean it up and post a patch.

Thanks,
dims

--- Per Kreipke wrote:
> Davanum,
>
> Nicely done.
>
> Couple of questions (mostly for learning):
>
> - What will happen with the Tidy.jar file needed to use this generator? Will
> it be included in the C2 install?
>
> - Is the stream returned from Url.openStream() buffered? If so, ok. If not,
> why not use a buffered stream since it's a network request:
>
> Document newdoc = tidy.parseDOM(new BufferedInputStream(url.openStream()),
> ostream);
>
> Only asking because this is what I used under C1. Too bad JTidy can't emit
> SAX events (understandably).
>
> - Why use a ByteArrayStream as the intermediate stream?
>
> Per.
>
> > Thanks, applied
> >
> > Giacomo
> >
> > Davanum Srinivas wrote:
> > >
> > > HTMLGenerator can be used to collect content from external web
> > > sites. It uses JTidy from
> > > http://lempinen.net/sami/jtidy/.
> > >
> > > This was inspired by the following thread
> > > http://marc.theaimsgroup.com/?t=97240792100003&w=2&r=1
> > >
> > > Here's the generator entry.
> > >
> > > src="org.apache.cocoon.generation.HTMLGenerator"/>
> > >
> > > Here's the pipeline entry.
> > >
> > > ....
> > > ....
> > >
> > > If you are within a firewall, make sure you set the following
> > > params in the java.exe command line.
> > >
> > > java.exe -DproxySet=true -DproxyHost=caproxy.cai.com
> > > -DproxyPort=80 ....
> > >

=====
Davanum Srinivas, JNI-FAQ Manager
http://www.jGuru.com/faq/JNI
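
For reference, a minimal sketch of the two stream points discussed in this thread (#2 and #3). It is not the actual HTMLGenerator code. It wraps the incoming stream in a BufferedInputStream and stages the bytes in an in-memory byte array that can be reopened as a stream. The names StreamStagingSketch, stageInMemory and reopen are made up for illustration, and a plain byte copy stands in for the tidy.parseDOM(in, ostream) call so that the example needs nothing outside the JDK.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Cocoon or JTidy.
final class StreamStagingSketch {
    private StreamStagingSketch() {}

    // Reads the whole stream through a buffer and returns its bytes.
    // In the generator, tidy.parseDOM(in, ostream) would fill the
    // intermediate stream instead of this plain copy loop (assumption).
    static byte[] stageInMemory(InputStream raw) {
        ByteArrayOutputStream ostream = new ByteArrayOutputStream();
        // Buffer the source: for a network stream this avoids many small reads.
        try (InputStream in = new BufferedInputStream(raw)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                ostream.write(chunk, 0, n);
            }
        } catch (IOException e) {
            // Wrapped so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return ostream.toByteArray();
    }

    // Reopens the staged bytes as a stream, e.g. to hand them to an XML parser.
    static InputStream reopen(byte[] staged) {
        return new ByteArrayInputStream(staged);
    }
}
```

With a java.net.URL named url, the call would look like StreamStagingSketch.reopen(StreamStagingSketch.stageInMemory(url.openStream())). The staging step costs one full copy of the page in memory, which is the price of not being able to stream SAX events straight out of JTidy.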