incubator-odf-dev mailing list archives

From "Rob Weir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ODFTOOLKIT-308) GSoC: ODF Command Line Tools
Date Wed, 07 Mar 2012 13:02:58 GMT

    [ https://issues.apache.org/jira/browse/ODFTOOLKIT-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224264#comment-13224264 ]

Rob Weir commented on ODFTOOLKIT-308:
-------------------------------------

Good thoughts.  The other part is the glue between the command line tools.  That was always
the real power of the Unix tools: they could easily be combined.  For example, I recently
did this to search for all openoffice.org email addresses in a downloaded copy of the openoffice
website, deduping and sorting by how many times each address appeared:


grep -o -r -i --no-filename --include="*.html" "[[:alnum:]+\.\_\-]*@openoffice.org" . |
  sort | uniq -c | sort -n -r

So we want powerful command line tools that each do one thing well, and then a way to pipe the
output of one into the input of another.  The trick will be that an ODF document is a ZIP file
containing multiple XML files, and possibly other resources, like JPG images.  If we pipe the
binary ZIP, then we're forcing each tool in the chain to do the uncompress/compress, which
is bad for performance.  There is also the issue of repeated parsing/serialization of the
XML.  So perhaps we don't use the OS's command line but create our own command line processor,
running entirely in a single JVM instance.  Or there might be other clever ways of making this efficient.
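
To make the single-JVM idea concrete, here is a minimal sketch using only the JDK
(java.util.zip and the built-in DOM parser), not the ODF Toolkit API.  The class name
OdfPipelineSketch and the helpers readContent, replaceText, run and samplePackage are
hypothetical names made up for illustration, and the sample package is a stand-in ZIP
holding only a content.xml, not a valid ODF file.  The package is unzipped and parsed
once, and every stage then works on the same in-memory DOM.  Writing the result back
into a ZIP is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OdfPipelineSketch {

    /** Unzips the package once and parses its content.xml once. */
    static Document readContent(byte[] odfPackage) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(odfPackage))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("content.xml")) {
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    // Copy the entry first so the parser cannot close the ZIP stream.
                    byte[] xml = zip.readAllBytes();
                    return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                }
            }
        }
        throw new IOException("no content.xml in package");
    }

    /** A sed-like stage (hypothetical): replaces a literal string in every text node. */
    static UnaryOperator<Document> replaceText(String from, String to) {
        return doc -> {
            rewrite(doc.getDocumentElement(), from, to);
            return doc;
        };
    }

    private static void rewrite(Node node, String from, String to) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(from, to));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            rewrite(children.item(i), from, to);
        }
    }

    /** The "pipe": each stage receives the DOM the previous stage returned. */
    static Document run(Document doc, List<UnaryOperator<Document>> stages) {
        for (UnaryOperator<Document> stage : stages) {
            doc = stage.apply(doc);
        }
        return doc;
    }

    /** Builds a tiny stand-in package for the demo (not a valid ODF document). */
    static byte[] samplePackage(String contentXml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(contentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] pkg = samplePackage("<doc><p>I like sausages</p><p>More sausages</p></doc>");
        Document doc = run(readContent(pkg), List.of(
                replaceText("sausages", "bangers"),
                replaceText("like", "love")));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The point of the design is that the stages exchange a parsed Document rather than bytes,
so adding a third or fourth stage costs no extra unzip or XML parse.  A real processor
would presumably pass an ODF Toolkit document object instead of a raw DOM.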
                
> GSoC:  ODF Command Line Tools
> -----------------------------
>
>                 Key: ODFTOOLKIT-308
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-308
>             Project: ODF Toolkit
>          Issue Type: New Feature
>            Reporter: Rob Weir
>            Assignee: Rob Weir
>              Labels: gsoc2012, mentor
>
> GNU/Linux, and UNIX before it, have shown the great power of text processing via simple
command line tools, combined with operating system facilities for piping and redirection. This filter-based
text processing is what makes shell programming so powerful.  But it only works well for text
documents.  What about more complex WYSIWYG documents, such as spreadsheets and word processor
documents, with more complex formats that are often not text based at all?  There the tool set becomes far weaker.
> The Apache ODF Toolkit is a Java API that gives a high level view of a document, and
enables programmatic manipulation of a document.  We have functions for doing things like
search & replace.  There is a lot you can do using the ODF Toolkit.  But it still requires
Java programming, and that limits its reach to professional programmers.
> What if we could write, using the ODF Toolkit, a set of command line utilities that made
it easy to do both simple and complex text manipulation tasks from a command line, things
like:
> 1) Concatenate documents
> 2) Replace slide 3 in presentation A with slide 3 from presentation B
> 3) Apply the styles of document A to all documents in the current directory
> 4) Find all occurrences of "sausages" in the given document and add a hyperlink to sausages.com
> and so on.
> Analogs of cat, grep, diff and sed are the obvious ones. Maybe something awk-like
that works with spreadsheets?  No need to be slavish to the original tools, but create something
of similar power that operates on ODF documents.  For example, an alternative solution
might be to write a new shell processor that has native commands for ODF document manipulation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
