openoffice-dev mailing list archives

From Howard Cary Morris <howard_cary_mor...@hotmail.com>
Subject RE: html code generated from Open Office
Date Sat, 13 Oct 2018 22:57:26 GMT
I created a new project on SourceForge, https://sourceforge.net/projects/open-office-html-4-to-html5/ .
It contains my latest tested conversion. If you cannot read the files, I’ll add more files
to make it easier. My latest version, at http://www.americasfreedompressalliance.us/Howard/Open/ ,
is still in testing.

To see the source code there, go to http://www.americasfreedompressalliance.us/Howard/

and, at the bottom of the page, enter one of:

                Open/index.html

                Open/Gen2.php

                Open/ReadMe.txt

After looking at the text, you can use the browser’s ‘Save Page As’ to get a copy of the source.

I could put a shorter version of those programs in the SourceForge project.



I have looked at the code generated by both filters. The .html filter tries to add the page header
and page footer to the generated code, but does a bad job of it; I have cleaned it up some. The
.xhtml filter doesn’t even try. On the other hand, the .xhtml code is true to the page width of the
Open Office document, and here it is the .html filter that doesn’t even try. One reason I am trying
to make the output as print compatible as possible is to offer an alternative to PDF. The output is
much smaller. In fact, if you compress the output and add the image files, the result is much
smaller than the .odt file. Also, some users may want the output to look like a (typed) document.



The ReadMe file tells more of what has been done than what needs to be done.

If I ever get to the table of contents, it will be really amazing.



I was able to get to the source of the .html filter. However, with so many includes, it was
impossible to wade through. As I asked before, compiled versions with all the includes expanded
would help a lot. There seem to be 4 programs in the filter.



Howard






________________________________
From: Andrea Pescetti <pescetti@apache.org>
Sent: Saturday, October 13, 2018 3:24:52 PM
To: dev@openoffice.apache.org
Subject: Re: html code generated from Open Office

Howard Cary Morris wrote:
> I want the HTML5 look identical to printed code. I will have additional references to
> understand the code.

I see many mixed ideas in this conversation. Let me give you some
pointers, and sorry for being late at this.

Start here:
https://archive.fosdem.org/2014/schedule/event/improving_the_xhtml_export_filter/
The slides you find there will give you all pointers (source code
modules, issues, patches, history) for the XHTML export filter and the
idea to repurpose it as an HTML5 export filter. The presentation is old
(and looks very old indeed!) but it's still accurate: we didn't change
that export in recent years.

As someone already told you, we have two filters, the HTML one and the
XHTML one. They are in different code modules.

The work has to be done in the source code, so whatever you have done in
PHP and HTML (?) will have to be rewritten. But I (and many others) will
be able to read your current work, assuming you are post-processing the
HTML or XHTML output, and we can give feedback if you make it available
somewhere.

There is a fundamental error in the idea of print fidelity: HTML, and
especially HTML5, are not designed with print fidelity in mind. I mean,
the idea to have the printed HTML5 identical to the OpenOffice (say) PDF
export is unfeasible since HTML rendering is done by the user-agent
(browser) and this is by design subject to what the browser decides to
do. If you constrain the browser too much by enforcing specific CSS, all
advantages of an HTML export will be gone. So the idea should be to have
a proper HTML5 export as a start, ignoring the printed output for the
time being. Priority should be on getting the semantic level (tags)
right, and some basic CSS transformations to get the styles right. Our
export is currently using bad HTML style, but the XHTML one is a bit
better than the HTML one.
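
To illustrate what "getting the tags right plus some basic CSS" could
mean at the post-processing level, here is a minimal Python sketch. It
is not the export filter and not Howard's PHP code; the function name
to_html5 and the sample markup are hypothetical, and it only handles
two cases (the doctype and the presentational <center> element).

```python
import re

HTML5_DOCTYPE = "<!DOCTYPE html>"


def to_html5(html: str) -> str:
    """Hypothetical clean-up pass over exported HTML 4 markup."""
    # Swap whatever doctype the export wrote for the HTML5 one.
    html = re.sub(r"<!DOCTYPE[^>]*>", HTML5_DOCTYPE, html,
                  count=1, flags=re.IGNORECASE)
    # <center> is presentational and obsolete in HTML5; express the
    # same intent with CSS on a neutral container instead.
    html = re.sub(r"<center\s*>", '<div style="text-align: center">',
                  html, flags=re.IGNORECASE)
    html = re.sub(r"</center\s*>", "</div>", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    # Assumed sample input, not actual OpenOffice output.
    sample = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">\n'
              "<HTML><BODY><CENTER>Title</CENTER></BODY></HTML>")
    print(to_html5(sample))
```

A regular-expression pass like this is only good for experiments; the
real work would still have to happen inside the filter source, where
the document structure is known.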

For print fidelity (but this comes much later)
https://www.w3.org/TR/css3-page/ would be the place to start. It is
wonderful, but support from tools is still quite incomplete. And anyway
implementation will need the ground work above to be completed beforehand.
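
As a small sketch of what that specification offers: the @page rule
with its size and margin properties lets a stylesheet state the page
geometry. The Python helper below just builds such a rule as text; the
name page_rule and the A4 values in the example are assumptions for
illustration, not something OpenOffice produces today.

```python
def page_rule(width_mm: float, height_mm: float, margin_mm: float) -> str:
    """Build a CSS Paged Media @page rule for the given page geometry.

    Hypothetical helper: the values would come from the document's
    page style, which this sketch does not read.
    """
    return (
        "@page {\n"
        f"  size: {width_mm:g}mm {height_mm:g}mm;\n"
        f"  margin: {margin_mm:g}mm;\n"
        "}\n"
    )


if __name__ == "__main__":
    # A4 portrait with 20 mm margins, chosen only as an example.
    print(page_rule(210, 297, 20))
```

How faithfully a browser honours such a rule when printing still
depends on the browser, which is the limitation described above.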

The way is long, but we are here to help, even though we are all
volunteers and are often less responsive than we would like to be.

The first step is building OpenOffice on your system. There is no other
way, unfortunately. Does
https://wiki.openoffice.org/wiki/Documentation/Building_Guide_AOO make
any sense to you? If you are lost, we may be able to help if you
describe your system configuration. Linux is probably the easiest
platform for building.

Regards,
   Andrea.
