poi-user mailing list archives

From "Zachary Mitchell" <zac....@internode.on.net>
Subject Re: Find start and finish point in HWPFDocument bytes.
Date Sat, 25 Sep 2010 05:00:17 GMT

----- Original Message ----- 
From: "Nick Burch" <nick.burch@alfresco.com>
To: "POI Users List" <user@poi.apache.org>; "Zachary Mitchell" 
<zac.j.m@internode.on.net>
Sent: Friday, September 24, 2010 8:42 PM
Subject: Re: Find start and finish point in HWPFDocument bytes.


> On Fri, 24 Sep 2010, Zachary Mitchell wrote:
>> I wish to create the document, and
>> based on a Picture file, as an array of type
>> primitive byte [],
>> insert these bytes, in the right way,
>> into the document byte [] bytes
>
> That won't work - Word doesn't store the raw picture data at the offset. 
> Instead, at that offset you'll find a series of headers, and if you're 
> lucky, the picture data somewhere after that...
>
> See my earlier reply for more information on what you'd need to do if you 
> wanted to add pictures
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>

I'm still confused. With my "array search" algorithm, I try to search my
Word document byte [] for occurrences of my picture byte [].

I am also trying to reverse engineer from Picture and HWPFDocument.

I suspect that I am confusing myself.

Which byte [] should I compare: the one read through a DataInputStream from
the picture file or the Word file, or the one obtained from HWPFDocument
or Picture?

Any help at all would be appreciated (I have had a look at the POI source code).
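
For comparison, a contiguous search for one byte [] inside another might be sketched as below. This is only an illustration of the "array search" idea: the class and method names (ByteArraySearch, indexOf) are made up here and are not part of POI, and the sample arrays are invented. As the quoted reply points out, Word does not necessarily store the raw picture bytes contiguously, so a result of -1 on a real document would not be surprising.

```java
import java.util.Arrays;

public class ByteArraySearch {

    // Hypothetical helper (not a POI method): returns the index of the first
    // contiguous occurrence of needle inside haystack, or -1 if there is none.
    public static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return 0; // an empty pattern matches at the start
        }
        // Try every start position that leaves room for the whole needle.
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i; // every byte of needle matched starting at i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the document and picture arrays.
        byte[] haystack = {10, 20, 30, 40, 50, 60};
        byte[] needle = {30, 40, 50};

        int start = indexOf(haystack, needle);
        System.out.println("start: " + start);
        if (start >= 0) {
            int end = start + needle.length; // exclusive end offset
            System.out.println("end (exclusive): " + end);
            System.out.println(Arrays.toString(Arrays.copyOfRange(haystack, start, end)));
        }
    }
}
```

The start and exclusive end offsets returned here play the role of the a and b values in the program that follows, under the assumption that the picture bytes appear unmodified and in one piece.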

//------------------------------------------- 
import java.io.*;
import java.nio.*;
import java.util.*;
import java.util.concurrent.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.poifs.storage.*;
//-------------------------------------------
import org.apache.poi.hwpf.*;
//import org.apache.poi.hwpf.model.*;
//import org.apache.poi.hwpf.model.io.*;
import org.apache.poi.hwpf.usermodel.*;
//------------------------------------------- 
//import org.apache.poi.hssf.usermodel .*;
//------------------------------------------- 
import java.lang.reflect.*;
//------------------------------------------- 
//import javax.management.openmbean.*;
//------------------------------------------- 
import javax.imageio.stream.*;
//------------------------------------------- 
public class MSImageEmbedAttempt {


public static void main (String [] args)

{
try{
////////////////////////////////////////////////////////////////////////////
FileInputStream input = new FileInputStream(new File("demo.doc"));
POIFSFileSystem fileSystem = HWPFDocument.verifyAndBuildPOIFS(input);
HWPFDocument document = new HWPFDocument(fileSystem);
input.close();
Field dataStream = document.getClass().getDeclaredField("_mainStream");
dataStream.setAccessible(true);
byte [] fileArray = (byte [])dataStream.get(document);
////////////////////////////////////////////////////////////////////////////

//7828 bytes read. how big does File API say it is?
File flanders = new File("flanders.gif");
System.out.println("flanders.gif: " + flanders.length());
System.out.println("---------------------------------------------------------------------");
//Indeed, works for demonstration single file.
//How to write image file bytes out to file?

// Read the whole GIF file into a byte [] in one call.
byte [] pictureArray = new byte[(int) flanders.length()];
DataInputStream inputTwo = new DataInputStream(new FileInputStream(flanders));
inputTwo.readFully(pictureArray);
inputTwo.close();

Picture picture = new Picture(pictureArray);
pictureArray = picture.getContent();


//??????????????????????????????????????????????????????????????????????????????????????????
//picture << file => one is an aggregate of the other.
//THIS SECTION NEEDS DEBUGGING AND FURTHER WORK for multiple images in word file.

byte [] resultArray = new byte[pictureArray.length];
boolean first = false;
boolean last = false;
int a = 0;
int b = 0;
int k = 0;

for (int i=0; i<fileArray.length; i++)
{
for (int j=0; j<pictureArray.length; j++)
{

if (fileArray[i] == pictureArray[j])
{
first = true;

resultArray[k] = fileArray[i];
k++;
a = i;
}
else
{
if(first == true)
{
last = true;
first = false;
b = i;

if(k != pictureArray.length)
{
Arrays.fill(resultArray, (byte) 0);
}
break;
}
}
}

if (last == true)
{
last = false;
break;

}

}
//??????????????????????????????????????????????????????????????????????????????????????????
//What about when the picture ends, with more file?



System.out.println("Search completed.");
System.out.println("a: " + a);
System.out.println("b: " + b);
System.out.println("Picture array, read by binary from GIF file:");
System.out.println(Arrays.toString(pictureArray)); //A
System.out.println("Word File array, read from HWPFDocument Word document file.");
System.out.println(Arrays.toString(fileArray)); //B
System.out.println("Image result array, extracted by binary from Word document:");
System.out.println(Arrays.toString(resultArray)); //C
System.out.println("---------------------------------------------------------------------");
System.out.println("Number of bytes: " + pictureArray.length);





//because of this, one knows that file data is being reinterpreted.
for (int i=0;i<fileArray.length;i++)
{
if(fileArray[i] == -119)
{
System.out.println("Found start.");
//if((i<fileArray.length) && (fileArray[i+1] == 80))
//{System.out.println("Found start.");}
}


}


FileImageOutputStream output = new FileImageOutputStream(new File("destination.gif"));
output.write(pictureArray,0,pictureArray.length);
output.close();


}

catch (Exception e)
{e.printStackTrace();}
}
}
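
A note on the final loop above: -119 is 0x89 as a signed byte, which is how a PNG signature begins (0x89 followed by "PNG"), whereas a GIF file begins with the ASCII text "GIF87a" or "GIF89a". A scan for a GIF signature could be sketched as below. The class and method names (SignatureScan, matchesAt, findGifStart) are hypothetical and not part of POI, the sample bytes are invented, and nothing here assumes that the signature is actually present in the array taken from the document.

```java
import java.nio.charset.StandardCharsets;

public class SignatureScan {

    // GIF files begin with one of these two ASCII signatures.
    static final byte[] GIF87A = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    static final byte[] GIF89A = "GIF89a".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper: true if signature appears in data starting at offset.
    public static boolean matchesAt(byte[] data, int offset, byte[] signature) {
        if (offset < 0 || offset + signature.length > data.length) {
            return false; // not enough room left for the signature
        }
        for (int i = 0; i < signature.length; i++) {
            if (data[offset + i] != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: offset of the first GIF signature in data, or -1.
    public static int findGifStart(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (matchesAt(data, i, GIF87A) || matchesAt(data, i, GIF89A)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Invented sample: three leading bytes, then a GIF89a signature.
        byte[] data = {0, 1, 2, 'G', 'I', 'F', '8', '9', 'a', 7, 8};
        System.out.println("GIF signature offset: " + findGifStart(data));
    }
}
```

Checking the whole six-byte signature, rather than a single byte value, avoids reporting a "start" every time that one value happens to occur in unrelated data.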




