poi-user mailing list archives

From MSB <markbrd...@tiscali.co.uk>
Subject RE: Modify word document
Date Tue, 24 Nov 2009 12:53:59 GMT

I am surprised at that because I was able to create a search/replace routine
that worked as long as the search term was longer than the replacement text.
That limitation aside, and as far as I am aware, it has not yet been
possible to create a fully working search/replace routine using HWPF as the
API is still very immature.

One problem you are likely to face when replacing the CharacterRun(s) is
that HWPF will not allow you to make more than one modification to the
formatting of the text in the 'new' CharacterRun - at least, it did not the
last time I tried it. I was able, for example, to set the font OR to set the
colour of the font, but if I tried to do both HWPF threw an exception. If I
have the time later today when I am back at 'my' PC, I will try to dig out
the test code I was putting together to demonstrate how to use the API to
create Word documents and run the tests again to see if I get the same
problems.

The other problem you are likely to face is locating the CharacterRun that
has to be replaced and then actually substituting one run for another. I
have never tried to do this. I imagine that the runs are maintained within
the Paragraph object as a list, but I do not know if it is possible to get
at the index number of the CharacterRun you intend to replace so that you
can place the new CharacterRun into the correct location.

Further, this may well corrupt at least one of the pointers that the
Document maintains. Word .doc files are composed of one or more streams of
data and each stream can be thought of as a linked list. The File
Information Block (FIB) stores important information about the location of
the streams in the file - just as an example, it records where the
document's text starts and where it ends. Now, imagine what might happen if
we change the amount of text in the file and do not update the value stored
in the FIB to match. This is what I fear could happen if there is no
existing mechanism allowing us to swap out CharacterRun(s) from
Paragraph(s). What knock-on effects could this have for the other linked
lists? It may well render all of the pointers used to establish those links
inaccurate.
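
To make the concern about stale offsets concrete, here is a minimal sketch
in plain Java. It is a toy model, not HWPF code: the Span class and the
replace helper are hypothetical names, and a real .doc file keeps far more
bookkeeping than this. It only shows that replacing text with a string of a
different length means every offset at or after the edit has to be shifted
by the difference. It assumes the edit falls entirely within one span.

```java
import java.util.List;

// Toy model only: these types are hypothetical and are NOT part of HWPF.
class OffsetShiftSketch {

    /** A piece of text identified by character offsets, [start, end). */
    static final class Span {
        int start;
        int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Replaces text[from, to) with the replacement and adjusts the spans so
     * that each one still covers the same logical piece of text.
     * Assumption: the edited range lies entirely within a single span.
     */
    static String replace(StringBuilder text, List<Span> spans,
                          int from, int to, String replacement) {
        int delta = replacement.length() - (to - from);
        text.replace(from, to, replacement);
        for (Span s : spans) {
            if (s.start >= to) {
                // Span lies entirely after the edit: shift both ends.
                s.start += delta;
                s.end += delta;
            } else if (s.end >= to) {
                // Span contains the edit: only its end moves.
                s.end += delta;
            }
            // Spans that end before the edit are left untouched.
        }
        return text.toString();
    }
}
```

If the loop over the spans is skipped, the text changes but the recorded
offsets still describe the old layout, which is the kind of mismatch
described above.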

Of course, I could very well be wrong as I am typing this without access to
the javadoc and do not know if there is a method defined on the Paragraph
class to allow us to insert/delete a CharacterRun.
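
On locating the run to replace: a minimal sketch, in plain Java with the
runs modelled as Strings rather than HWPF CharacterRun objects, of mapping a
match in the paragraph text back to the indices of the runs that contain it.
The class and method names are hypothetical, and whether HWPF offers a way
to substitute a run at a given index is exactly the open question above.

```java
import java.util.List;

// Plain-Java sketch: runs are modelled as Strings, not HWPF CharacterRuns.
class RunLocatorSketch {

    /**
     * Finds the first occurrence of term in the concatenated run texts and
     * returns {firstRunIndex, lastRunIndex}, or null when the term is
     * empty or absent.
     */
    static int[] locate(List<String> runs, String term) {
        if (term.isEmpty()) {
            return null;
        }
        String paragraphText = String.join("", runs);
        int matchStart = paragraphText.indexOf(term);
        if (matchStart < 0) {
            return null;
        }
        int matchEnd = matchStart + term.length(); // exclusive

        int first = -1;
        int last = -1;
        int runStart = 0;
        for (int i = 0; i < runs.size(); i++) {
            int runEnd = runStart + runs.get(i).length();
            // A run is involved when it overlaps [matchStart, matchEnd).
            if (runStart < matchEnd && runEnd > matchStart) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
            runStart = runEnd;
        }
        return new int[] { first, last };
    }
}
```

For the runs "Please enter a ", "search", " term", " here." and the term
"search term", locate returns {1, 2}: the match starts in the second run
and ends in the third, so a replacement made on either run alone would miss
it.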

Yours

Mark B


Fabián Avilés Martínez wrote:
> 
> Hi, as I told you, I have tried it, but with the same result: the
> resulting file is corrupted, or so MSWord says. My next approach is to
> create a copy of the file and make the modifications within that copy. My
> problem is that I do not know how to save the modifications made to the
> charRuns of the paragraphs. What I mean is that I want to persist the
> modifications in the resulting file without having to copy it, calling
> document.write(outputStream)
> 
> My code is:
> 
> public File processFile(final InputStream is, final Map<String, String>
> replacementText) throws IOException {
>         Set<String> keys = replacementText.keySet();
>         try {
>             // Makes a copy of the file.
>             File res = copyfile(is);
>             InputStream auxIs = new FileInputStream(res);
>             POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>             HWPFDocument document = new HWPFDocument(poifs);
>             Range range = document.getRange();
> 
>             for (int i = 0; i < range.numParagraphs(); i++) {
>                 Paragraph paragraph = range.getParagraph(i);
>                 int numCharRuns = paragraph.numCharacterRuns();
>                 for (int j = 0; j < numCharRuns; j++) {
>                     CharacterRun charRun = paragraph.getCharacterRun(j);
>                     for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>                         String key = it.next();
>                         if (charRun.text().contains(key)) {
>                             String value = replacementText.get(key);
>                             charRun.replaceText(key, value);
>                             range = document.getRange();
>                             paragraph = range.getParagraph(i);
>                             charRun = paragraph.getCharacterRun(j);
>                         }
>                     }
>                 }
>             }
>             is.close();
>             return res;
>         } catch (IOException e) {
>             logger.error("Error processing the WORD file: " + e);
>             throw new IOException("Error processing the WORD file");
>         } finally {
>             if (is != null) {
>                 is.close();
>             }
>         }
>     }
> 
> 
> Thanks in advance, Fabi.
> 
> -----Original Message-----
> From: MSB [mailto:markbrdsly@tiscali.co.uk] 
> Sent: Tuesday, 24 November 2009 8:43
> To: user@poi.apache.org
> Subject: Re: Modify word document
> 
> 
> You have not dug down far enough into the structure of the document yet, I
> am afraid - all of the formatting information is stored (encapsulated)
> within the CharacterRun class and you need to perform the replacements at
> that level.
> 
> I do not have any suitable code at hand as I type this, so what follows
> will need to be converted into Java and tested:
> 
> Open the Word document.
> Get the overall Range for the document.
> Get the number of Paragraph objects the Range contains.
> Iterate through the Paragraphs and for each Paragraph
>     Get the CharacterRun(s) the Paragraph contains.
>     Call the method to replace the search term with the replacement text
>     on the CharacterRun
> Save the modified document away again.
> 
> You do however face a couple of problems with this. It has been a long
> time since I tried to write a search and replace routine using HWPF and I
> could not get it to work if the replacement text was longer than the
> search term. In that case, HWPF threw an exception and would not allow me
> to complete the process; but that problem could well have been addressed
> by now as it was well known and caused by faulty bounds checking within
> the Range class. Only testing will prove or disprove this for you, I am
> afraid.
> 
> Secondly, the CharacterRun class encapsulates a piece of text with common
> properties. So, imagine that we are searching for the phrase 'search term'
> and that the word 'search' has been emboldened whilst the word 'term' has
> been left as normal text; then my suggested approach will not work. That
> is because the words 'search' and 'term' will be held in different
> CharacterRun(s). If you do hit this problem, then I am afraid you will
> have to write code that searches for the term at the Paragraph level,
> identifies where the search terms can be found and recovers the
> CharacterRun(s) that encapsulate them. Once you have these, you can modify
> the runs or create and substitute new ones, but I have to admit that I
> have never tried to do this myself. Instead I chose to automate Word using
> OLE and to explore the possibilities offered by OpenOffice's UNO
> interface. Both options did work but threw up other problems that proved
> more limiting (in terms of architecture and platform). If you can get it
> to work, HWPF offers the better solution IMO.
> 
> Yours
> 
> Mark B
> 
> 
> Fabián Avilés Martínez wrote:
>> 
>> Hi all,
>> 	I have a Word document, as a template: In this template there are some
>> tokenized words, which have to be modified and the result has to be saved
>> into another file. The original file has some properties, like header and
>> footer, images, etc. The resulting file has to be the same, but with the
>> modified words. I am trying it with the code below, but it does not work.
>> 
>> public ByteArrayOutputStream processFile(final InputStream is, final
>> Map<String, String> replacementText)
>>         throws IOException {
>>         Set<String> keys = replacementText.keySet();
>>         try {
>>             POIFSFileSystem poifs = new POIFSFileSystem(is);
>>             HWPFDocument document = new HWPFDocument(poifs);
>>             Range range = document.getRange();
>> 
>>             for (int i = 0; i < range.numParagraphs(); i++) {
>>                 String newTxt = range.getParagraph(i).text();
>>                 String oldTxt = range.getParagraph(i).text();
>>                 for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>                     String key = it.next();
>>                     if (newTxt.contains(key)) {
>>                         newTxt = replacePlaceholders(key,
>> replacementText.get(key), newTxt);
>>                     }
>>                 }
>>                 if (!oldTxt.equals(newTxt)) {
>>                     range.getParagraph(i).replaceText(oldTxt, newTxt);
>>                 }
>>             }
>> 
>>             // Save the document away.
>>             ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>             document.write(bos);
>>             bos.flush();
>>             bos.close();
>>             return bos;
>>         } catch (IOException e) {
>>             logger.error("Error processing the WORD file: " + e);
>>             throw new IOException("Error processing the WORD file");
>>         } finally {
>>             if (is != null) {
>>                 is.close();
>>             }
>>         }
>>     }
>> 
>> Any help, please?
>> 
>> Thanks in advance, Fabi.
>> 
>> 
>> 
>> ______________________
>> This message including any attachments may contain confidential 
>> information, according to our Information Security Management System,
>>  and intended solely for a specific individual to whom they are
>> addressed.
>>  Any unauthorised copy, disclosure or distribution of this message
>>  is strictly forbidden. If you have received this transmission in error,
>>  please notify the sender immediately and delete it.
>> 
>> ______________________
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>> 
>> 
>> 
> 
> -- 
> View this message in context:
> http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
> Sent from the POI - User mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: http://old.nabble.com/Modify-word-document-tp26480450p26495362.html
Sent from the POI - User mailing list archive at Nabble.com.

