poi-dev mailing list archives

From David Fisher <dfis...@jmlafferty.com>
Subject Re: a 'lite' version of ooxml-schemas jar
Date Mon, 16 Nov 2009 23:34:32 GMT
Hi Yegor,

I've been digging deeper into the dependencies in maven. I think that  
"lite" should become the usual way to build.

(1) maven/poi-ooxml.pom is missing two dependencies:

xmlbeans-2.3.0.jar
   <property name="ooxml.xmlbeans.jar" location="${ooxml.lib}/xmlbeans-2.3.0.jar"/>
   <property name="ooxml.xmlbeans.url" value="${repository.m2}/maven2/org/apache/xmlbeans/xmlbeans/2.3.0/xmlbeans-2.3.0.jar"/>

geronimo-stax-api_1.0_spec-1.0.jar
   <property name="ooxml.jsr173.jar" location="${ooxml.lib}/geronimo-stax-api_1.0_spec-1.0.jar"/>
   <property name="ooxml.jsr173.url" value="${repository.m2}/maven2/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/1.0/geronimo-stax-api_1.0_spec-1.0.jar"/>

Is there a reason we left these out of the pom?
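
For reference, here is a sketch of what the two missing entries might look like in maven/poi-ooxml.pom. The groupId, artifactId and version values are read off the repository URLs above. Whether a scope or any other setting is needed has not been checked, so treat this as an assumption rather than a tested pom.

```xml
<!-- Sketch only: coordinates are derived from the repository URLs in build.xml -->
<dependencies>
  <dependency>
    <groupId>org.apache.xmlbeans</groupId>
    <artifactId>xmlbeans</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.geronimo.specs</groupId>
    <artifactId>geronimo-stax-api_1.0_spec</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
```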

>> I propose to include ooxml-schemas-lite in the release cycle. The
>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>> Interested projects (first of all I mean Apache Tika) can set up
>> their Maven poms to use <artifactId>poi-ooxml-lite</artifactId>
>> instead of <artifactId>poi-ooxml</artifactId>. This will reduce the
>> distribution size by approximately 10 MB.

(2) You propose a new artifact-id of ooxml-schemas-lite. I think a  
name like ooxml-poi, poi-ooxml-schemas, or poi-opc would be better.

There are a few points to make here:

- ooxml-schemas has a different versioning - it is version 1.0. It  
should not change much. We should have a documented build target for  
this.

- ooxml lite - should follow the POI versioning scheme, since newer
versions of POI will cover more of the schema. So it is not really
a subset of ooxml-schemas so much as a cross reference between
ooxml-schemas and poi-ooxml.

Which version should poi-ooxml use: "lite" or ooxml-schemas? I think we
should always use "lite" and distribute "lite". We can put the "lite"
classes in one of two places:

(a) In the poi-ooxml jar as part of that build.
(b) In its own jar under a new maven artifact-id. I like ooxml-poi.

I think (b) is better, but if a user is working on ooxml support in
poi-ooxml then it is likely that they will be covering parts of
the schema not yet covered by "lite".

Users who still want to work with the full schemas will need to make
a choice when they build - either with a special target or by copying
the big jar in ooxml-lib/.

In general users will want to use the "lite" jar. We can provide  
access to the full ooxml-schema as a replacement. Is it possible to  
have "selective" targets in a maven pom? Can we make poi-ooxml  
dependent on either "ooxml-poi" or "ooxml-schema"?
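
On that last question, one mechanism that might work is to have poi-ooxml depend on the "lite" jar by default and let a consuming project swap in the full schemas with a dependency exclusion. The snippet below is an untested sketch of the consumer's side. "ooxml-poi" is only the name suggested above, and the org.apache.poi groupId and the ${poi.version} property are assumptions for illustration.

```xml
<!-- Sketch only: artifact names and groupIds are hypothetical -->
<dependencies>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>${poi.version}</version>
    <exclusions>
      <!-- drop the "lite" schema classes that poi-ooxml would pull in -->
      <exclusion>
        <groupId>org.apache.poi</groupId>
        <artifactId>ooxml-poi</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- and depend on the full schemas jar instead -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
```

Users who are happy with "lite" would simply declare poi-ooxml and nothing else.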

For the build I think that an explicit target should be used called  
"ooxml" - this will perform your full task and make sure that the  
build environment is using "lite" and not "full". I suspect that this  
target may move some files around. We'll need to explain that adding
support for parts of the schema means adding unit tests. These unit
tests should help us with documentation on the OOXML formats.

Regards,
Dave

On Nov 16, 2009, at 9:26 AM, David Fisher wrote:

> Hi Yegor,
>
> +1
>
> This will have effects on the website re-write.
>
> (1) The "How to Build" page has a list of common targets. Here is  
> what I have currently:
>
> clean -- Erase all build work products (i.e. everything in the build
> directory)
> compile -- Compiles all files from main, contrib and scratchpad
> test -- Run all unit tests from main, contrib and scratchpad (JUnit)
> jar -- Produce jar files
> docs -- Generate all documentation for the system (Apache Forrest)
> dist -- Create a distribution (JUnit and Apache Forrest)
>
> The "lite" build should always be part of the dist target. Should we
> add a target for building a "lite" ooxml, or should this always be
> part of jar and test?
>
> I think we should have a "lite" target separate from jar and test.
>
> (2) I am reworking the home page. There is a table of components  
> that appear there.
>
> Document -- Component -- JAR -- Maven artifactId
> OLE2 Filesystem -- POIFS -- poi-version-yyyymmdd.jar -- poi
> OLE2 Property Sets -- HPSF -- poi-version-yyyymmdd.jar -- poi
> Excel XLS -- HSSF -- poi-version-yyyymmdd.jar -- poi
> Excel XLSX -- XSSF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
> PowerPoint PPT -- HSLF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> PowerPoint PPTX -- XSLF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
> Word DOC -- HWPF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> Word DOCX -- XWPF -- poi-ooxml-version-yyyymmdd.jar -- poi-ooxml
> Visio VSD -- HDGF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> Publisher PUB -- HPBF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
> Outlook MSG -- HSMF -- poi-scratchpad-version-yyyymmdd.jar -- poi-scratchpad
>
> I am missing the OOXML schemas in my list. With this new lite  
> version I need two rows.
>
> OOXML Schemas -- OpenXML4J -- ooxml-schemas-yyyymmdd.jar -- poi-ooxml
> OOXML Lite -- OpenXML4J -- ooxml-schemas-lite-yyyymmdd.jar -- poi-ooxml-lite
>
> We will need to include poi-ooxml-version-yyyymmdd.jar in the
> poi-ooxml-lite target as well. I'll mark the XSSF, XWPF, and XSLF
> rows appropriately.
>
> Correct?
>
> (3) I'll rewrite your description as a new page within the
> currently very sparse OOXML documentation.
>
> BTW - the www.openxml4j.org domain has gone away and I am going to  
> need help from you in deciding additional documentation and OPC  
> examples that we should include for the OOXML sub-project.
>
> Regards,
> Dave
>
> On Nov 16, 2009, at 8:53 AM, Yegor Kozlov wrote:
>
>> Hi All,
>>
>> As we discussed at Apachecon, one way to optimize the size of POI  
>> distributions is to create a 'lite' version of the ooxml-schemas jar.
>> The idea is simple: remove all unused classes and resources from  
>> the jar generated by XMLBeans. Rough estimations made at the  
>> Barcamp showed that POI uses less than 30% of the OOXML schemas,  
>> hence the optimized jar should be significantly smaller.
>>
>> With this in mind I created a simple utility called OOXMLLite, see http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/util/OOXMLLite.java
>>
>> The process includes four simple steps:
>>
>> - run all ooxml unit tests
>> - see what classes from the ooxml-schemas.jar are loaded in the JVM
>> - copy the loaded classes into some directory.
>> - copy the binary resources (.xsb)
>>
>> A good acceptance test is to run the ooxml unit tests against the  
>> 'lite' classes - all should pass. There is an accompanying Ant task  
>> ooxml-xsds-lite for that, see build.xml.
>>
>> The resulting 'lite' jar is much smaller:
>> ooxml-schemas-lite-3.6-beta1.jar is only 3.5 MB while the 'big'
>> ooxml-schemas-1.0.jar is 14.5 MB. In theory, the size can be trimmed
>> down below 3 MB - my utility copies all .xsb files and does not yet
>> track resource dependencies.
>>
>> I propose to include ooxml-schemas-lite in the release cycle. The  
>> artifact name is ooxml-schemas-lite-${version.id}.jar.
>> Interested projects (first of all I mean Apache Tika) can set up
>> their Maven poms to use <artifactId>poi-ooxml-lite</artifactId>
>> instead of <artifactId>poi-ooxml</artifactId>. This will reduce the
>> distribution size by approximately 10 MB.
>>
>> Yegor
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
>> For additional commands, e-mail: dev-help@poi.apache.org
>>



