poi-user mailing list archives

From "Polk, Scott W" <Scott.P...@Pearson.com>
Subject RE: [HWPF] Get Style Name for Paragraph/Character from Word Doc (.doc)
Date Fri, 12 Oct 2012 15:25:02 GMT
I retrieved the start and end offset of each paragraph, and the start
and end offset of each character run in each paragraph.  Here are the
results:

Paragraph: "Test Test2
"
  Start: 0
  End: 11
CharacterRun: "Test Test2"
  Start: 0
  End: 10
CharacterRun: "
"
  Start: 10
  End: 11

Paragraph: "Test3
"
  Start: 11
  End: 17
CharacterRun: "Test3
"
  Start: 11
  End: 17
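
As a quick sanity check on the numbers above, the sketch below tests whether the character runs of each paragraph are contiguous and together cover the paragraph exactly. It is plain Java with no POI dependency. The class name RunOffsets and the method tiles are made up for illustration, and it assumes the end offsets are exclusive, which is what the reported values suggest (the first paragraph has 11 characters and spans 0 to 11).

```java
// Illustrative only: RunOffsets and tiles are hypothetical names, not POI API.
final class RunOffsets {

    // Returns true when the runs, taken in order, are non-empty, contiguous,
    // and cover exactly the span [paraStart, paraEnd). Each run is {start, end}.
    static boolean tiles(int paraStart, int paraEnd, int[][] runs) {
        int cursor = paraStart;
        for (int[] run : runs) {
            int start = run[0];
            int end = run[1];
            if (start != cursor || end <= start) {
                return false; // gap, overlap, or empty run
            }
            cursor = end;
        }
        return cursor == paraEnd;
    }

    public static void main(String[] args) {
        // Offsets copied from the output reported in this message.
        int[][] firstRuns = { {0, 10}, {10, 11} };
        int[][] secondRuns = { {11, 17} };
        System.out.println("Paragraph 1 tiled by runs: " + tiles(0, 11, firstRuns));
        System.out.println("Paragraph 2 tiled by runs: " + tiles(11, 17, secondRuns));
    }
}
```

Both paragraphs pass this check with the offsets reported above, so the runs account for every character, including the paragraph mark.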

How would I get the character style info rather than the paragraph style
info?  I see that Paragraph has getStyleIndex, which I can use to look
up the style name in the StyleSheet, but I don't see anything like this
for CharacterRun.

This is the code I am using.  Maybe I am doing something incorrectly?

import java.io.FileInputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.StyleDescription;
import org.apache.poi.hwpf.model.StyleSheet;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// open the Word document (.doc) located at path
POIFSFileSystem poifs = new POIFSFileSystem(new FileInputStream(path));
HWPFDocument wdDoc = new HWPFDocument(poifs);
StyleSheet styles = wdDoc.getStyleSheet();

// set range for entire document
Range range = wdDoc.getRange();

// loop through all paragraphs in range
for (int i = 0; i < range.numParagraphs(); i++) {
	Paragraph p = range.getParagraph(i);
	System.out.println("Paragraph: \"" + p.text() + "\"");
	System.out.println("  Start: " + p.getStartOffset());
	System.out.println("  End: " + p.getEndOffset());

	// loop through all character runs in the paragraph
	for (int j = 0; j < p.numCharacterRuns(); j++) {
		CharacterRun cr = p.getCharacterRun(j);
		System.out.println("CharacterRun: \"" + cr.text() + "\"");
		System.out.println("  Start: " + cr.getStartOffset());
		System.out.println("  End: " + cr.getEndOffset());
	}

	// only look up the style if its index is within the style sheet
	int styleIndex = p.getStyleIndex();
	if (styleIndex < styles.numStyles()) {
		System.out.println("Returned Style Index -> " + styleIndex);
		StyleDescription style = styles.getStyleDescription(styleIndex);
		String styleName = style.getName();
		// write style name and associated text
		System.out.println(styleName + " -> " + p.text());
	} else {
		// index out of range: print the style count and the index
		System.out.println("\n" + styles.numStyles() + " ----> " + styleIndex);
	}
}

Scott


-----Original Message-----
From: Nick Burch [mailto:nick@apache.org] 
Sent: Friday, October 12, 2012 4:47 AM
To: user@poi.apache.org
Subject: Re: [HWPF] Get Style Name for Paragraph/Character from Word Doc (.doc)

On 11/10/12 19:30, Polk, Scott W wrote:
> The style of the first line is set to Quote, while the style of the
> second line is set to Strong.

Is that the style of the paragraph, or just of some text? IIRC, you can 
style either a paragraph or some text in it (possibly all of it!), and 
they end up differently in the file.

It might be worth checking the start and end of the character runs 
within the paragraphs, to check what's happening. You might find you 
need to get character style info rather than paragraph style info.

Nick
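
To illustrate the distinction between paragraph-level and text-level styling described in the reply above, here is a small self-contained model in plain Java. It does not use HWPF: the classes Para and Run, the style list, and its index values are all hypothetical stand-ins, and the scenario (Quote applied as a paragraph style, Strong applied as a character style) is only one possible reading of the document being discussed. It sketches why a lookup that uses only the paragraph's style index could miss a style applied to the text itself.

```java
import java.util.Arrays;
import java.util.List;

// Self-contained model; none of these types are part of Apache POI.
final class StyleLookupSketch {

    // Hypothetical style sheet: the position in the list is the style index.
    static final List<String> STYLE_NAMES =
            Arrays.asList("Normal", "Default Paragraph Font", "Quote", "Strong");

    // Hypothetical character run: some text plus its own style index.
    static final class Run {
        final String text;
        final int styleIndex;

        Run(String text, int styleIndex) {
            this.text = text;
            this.styleIndex = styleIndex;
        }
    }

    // Hypothetical paragraph: a paragraph-level style index plus its runs.
    static final class Para {
        final int styleIndex;
        final List<Run> runs;

        Para(int styleIndex, Run... runs) {
            this.styleIndex = styleIndex;
            this.runs = Arrays.asList(runs);
        }
    }

    // Resolve an index to a name, guarding against out-of-range values.
    static String styleName(int index) {
        if (index < 0 || index >= STYLE_NAMES.size()) {
            return "<no style at index " + index + ">";
        }
        return STYLE_NAMES.get(index);
    }

    // Report the paragraph style and, separately, the style of every run.
    static String describe(Para p) {
        StringBuilder sb = new StringBuilder();
        sb.append("paragraph style: ").append(styleName(p.styleIndex));
        for (Run r : p.runs) {
            sb.append("\n  run \"").append(r.text).append("\" style: ")
              .append(styleName(r.styleIndex));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed scenario: "Quote" set on the paragraph, "Strong" set on the text.
        Para first = new Para(2, new Run("Test Test2", 1));
        Para second = new Para(0, new Run("Test3", 3));
        System.out.println(describe(first));
        System.out.println(describe(second));
    }
}
```

In this model the second paragraph reports Normal at the paragraph level and Strong only on its run, which is the situation where paragraph style info alone would not be enough.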

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

