lenya-user mailing list archives

From Florent André <florent.andre-...@4sengines.com>
Subject Re: [Feedmodule] How to declare an entity in a Java transformer ?
Date Fri, 13 Mar 2009 14:24:04 GMT
Simpler is better!

After wearing out a keyboard or so, I found Cocoon's HTMLTransformer... and
it felt like "I saw an angel"! :)

If you want to download and transform a wide range of web pages (URLs
containing ? or &, pages with framesets, or pages with unclosed <img> tags
(!)), you can do this:

--- a sources.xml:
<escaped-html>
<i:include parse="text" src="http://www.adress" />
</escaped-html>
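
For reference, here is a self-contained sketch of that source document. The
"i" prefix has to be bound to a namespace; the one below is what I believe
the Cocoon 2.1 IncludeTransformer expects, so please check it against your
Cocoon version. The URL is only a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Wrapper element: the "html" transformer is later told to parse its text content -->
<escaped-html xmlns:i="http://apache.org/cocoon/include/1.0">
  <!-- parse="text" pulls the remote page in as plain (escaped) text, not as XML -->
  <!-- Placeholder URL; a literal & in a query string must be written as &amp; in XML -->
  <i:include parse="text" src="http://www.example.org/page?a=1&amp;b=2"/>
</escaped-html>
```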

--- in sitemap.xmap

* in :
 <map:components>
    <map:transformers default="xslt">

ADD :
    <map:transformer
      name="html"
      logger="sitemap.transformer.html"
      src="org.apache.cocoon.transformation.HTMLTransformer">
      <!-- Tidy configuration file -->
      <jtidy-config>fallback://lenya/modules/fckeditor/config/jtidy.properties</jtidy-config>
    </map:transformer>
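
The pipeline also uses a transformer named "include". If your sitemap does
not declare one yet, a minimal declaration might look like the sketch below.
The class name is the one from the IncludeTransformer javadoc linked in this
thread; the logger name is only an assumption modeled on the "html" entry.

```xml
<!-- Goes inside <map:transformers>, next to the "html" entry -->
<!-- Only needed if no transformer named "include" is declared yet -->
<map:transformer
  name="include"
  logger="sitemap.transformer.include"
  src="org.apache.cocoon.transformation.IncludeTransformer"/>
```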

* in : 
 <map:pipelines>

    <map:pipeline type="noncaching">

      <map:match pattern="XXXXXX">

ADD
   
       <map:generate src="test/sources.xml"/>

       <map:transform type="include"/>

       <map:transform type="html">
           <map:parameter name="tags" value="escaped-html"/>
       </map:transform>
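
Put together, the whole matcher might look like the sketch below. The XML
serializer at the end is an assumption on my part (it is what the older
"retrive/**" pipeline quoted in this thread uses), and XXXXXX stays a
placeholder for your own pattern.

```xml
<map:pipeline type="noncaching">
  <map:match pattern="XXXXXX">
    <!-- 1. Read the small wrapper document containing the i:include -->
    <map:generate src="test/sources.xml"/>
    <!-- 2. Fetch the remote page as text into <escaped-html> -->
    <map:transform type="include"/>
    <!-- 3. Let JTidy turn that text into well-formed markup -->
    <map:transform type="html">
      <map:parameter name="tags" value="escaped-html"/>
    </map:transform>
    <!-- 4. Assumed final step: serialize the result as XML -->
    <map:serialize type="xml"/>
  </map:match>
</map:pipeline>
```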




And now... back to work for my boss! :p

Have a good weekend

On Fri, 13 Mar 2009 11:32:12 +0100, Florent André
<florent.andre-dev@4sengines.com> wrote:
> Hi Lenya friends,
> 
> On Thu, 20 Nov 2008 22:10:05 +0100, Andreas Hartmann <andreas@apache.org>
> wrote:
>> Hi André,
>> 
>> Florent André schrieb:
>>> thanks for this pointer !
>>> 
>>> HtmlGenerator works like a charm !
>>> 
>>> But when I try to call this HTMLGenerator from an XInclude... it doesn't
>>> work! :(
>> 
>> does it work with the IncludeTransformer?
>> 
>> http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/transformation/IncludeTransformer.html
>> 
>> -- Andreas
>> 
> 
> Thanks Andreas, it works with include... but only for "simple" web
> addresses (without ? and &).
> 
> I solved the ? problem with a "bidouille" (~= hack):
> -------- prepareinclude.xsl:
> * replace the ? with /post--parameter/ using a regex
> * create <include src="cocoon://module/webagent/retrivepipe/www/without/http/post--parameter/parameters
> 
> --------- webagent's sitemap.xmap
> * <map:match pattern="retrivepipe/**/post-parameter/**/">
> *    <map:generate src="http://{1}/post-parameter/{2}" type="html"/> // call to HTMLGenerator
> * ...
> * </map:match>
> 
> 
> But I haven't found any solution for the &:
> - this character is translated into &amp; in my XSLT, and HTMLGenerator
> doesn't do the &amp; ==> & transformation...
> 
> Do you have a suggestion ? 
> 
> 
> Have a good day
> 
> 
> 
>>> 
>>> I tried:
>>> <xi:include href="cocoon:/retrive/web/adress/without/http://"
>>> and
>>> <xi:include href="cocoon://retrive/web/adress/without/http://"
>>> 
>>> But neither of these works.
>>> 
>>> The log4j says : 
>>> * java.io.FileNotFoundException: 
>>> * xIncluded resource not found: file:///
>>> 
>>> The XInclude seems to look for a file rather than a pipeline...
>>> 
>>> Thank you for any ideas.
>>> 
>>> Notes : 
>>> -- this XInclude is built by an XSL called from the module's sitemap
>>> 
>>> -- in the module's sitemap, I have one pipeline with this match, but it
>>> isn't called:
>>> <!-- pattern = retrive/adress/web/without/http -->
>>>         <map:match pattern="retrive/**">
>>>                 <map:generate src="http://{1}" type="html"/>
>>>                 <map:serialize type="xml"/>
>>>         </map:match>
>>> 
>>> 
>>> 
>>> On Thu, 20 Nov 2008 11:30:05 +0100, Andreas Hartmann <andreas@apache.org>
>>> wrote:
>>>> Hi André,
>>>>
>>>> Florent André schrieb:
>>>>> I would like to parse locally downloaded (via <xi:include parse="text">)
>>>>> HTML pages.
>>>> I'm afraid this approach will only cause a lot of headache. I'd rather
>>>> recommend to use the HTMLGenerator [1] to parse the files. In your
>>>> XInclude statement you can just call the HTMLGenerator pipeline using
>>>> the cocoon:/ protocol.
>>>>
>>>> [1] http://cocoon.apache.org/2.1/userdocs/html-generator.html
>>>>
>>>> HTH,
>>>>
>>>> -- Andreas
>>>>
>>>>> After download, <xi:include> gives me an "escaped" HTML file.
>>>>>
>>>>> I strip <!DOCTYPE ... > with a regex, but now the unescape transformer
>>>>> throws this error:
>>>>> "Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was
>>>>> referenced, but not declared."
>>>>>
>>>>> I found this on the internet: "To allow the use of &nbsp; in your
>>>>> stylesheet, you have to declare it first: <!DOCTYPE xsl:stylesheet
>>>>> [<!ENTITY nbsp "&#160;">]>"
>>>>>
>>>>> How can I add this declaration in the Java unescape transformer?
>>>>>
>>>>> I think I could remove every &nbsp; with a regex, but I would like to
>>>>> better understand how Java transformers work.
>>>>>
>>>>> Thanks and have a good day.
>>>>>
>>>>> Florent
>>>>
>>>> --
>>>> Andreas Hartmann, CTO
>>>> BeCompany GmbH
>>>> http://www.becompany.ch
>>>> Tel.: +41 (0) 43 818 57 01
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@lenya.apache.org
>>>> For additional commands, e-mail: user-help@lenya.apache.org
> 

