lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Natarajan.T" <nataraj...@crimsonlogic.co.in>
Subject RE: worddoucments search
Date Tue, 24 Aug 2004 13:11:08 GMT
Hi Santhosh,

Try out the below attached code.....(POI.jar should be in your class
path)


public String getContent(InputStream reader) throws IOException {
		    ArrayList text = new ArrayList();
		    POIFSFileSystem fsys = new POIFSFileSystem(reader);

		    DocumentEntry headerProps =
(DocumentEntry)fsys.getRoot().getEntry("WordDocument");
		    DocumentInputStream din =
fsys.createDocumentInputStream("WordDocument");
		    byte[] header = new byte[headerProps.getSize()];

		    din.read(header);
		    din.close();

		    //Get the information we need from the header
		    int info = LittleEndian.getShort(header, 0xa);
		    boolean useTable1 = (info & 0x200) != 0;

		    //get the location of the piece table
		    int complexOffset = LittleEndian.getInt(header,
0x1a2);

		    String tableName = null;
		    if (useTable1) {
		      tableName = "1Table";
		    }
		    else{
		      tableName = "0Table";
		    }

		    DocumentEntry table =
(DocumentEntry)fsys.getRoot().getEntry(tableName);
		    byte[] tableStream = new byte[table.getSize()];
		    din = fsys.createDocumentInputStream(tableName);
		    din.read(tableStream);
			din.close();

		    din = null;
		    fsys = null;
		    table = null;
		    headerProps = null;

		    int multiple = findText(tableStream, complexOffset,
text);

		    StringBuffer sb = new StringBuffer();
		    int size = text.size();
		    tableStream = null;

			WordTextPiece nextPiece = null;
			int start ;
			int length;
			String toStr = "";
			for (int x = 0; x < size; x++) {
				nextPiece = (WordTextPiece)text.get(x);
				start = nextPiece.getStart();
				length = nextPiece.getLength();

				boolean unicode =
nextPiece.usesUnicode();
				if (unicode) {
					toStr = new String(header,
start, length * multiple, "UTF-16LE"); 
				}
				else{ 
					toStr = new String(header,
start, length , "ISO-8859-1"); 
				} 
				
			}

			reader.close();
			return toStr;
		}


Regards,
Natarajan.



-----Original Message-----
From: Santosh [mailto:santosh.s@softprosys.com] 
Sent: Tuesday, August 24, 2004 5:46 PM
To: Lucene Users List
Subject: worddoucments search

Can lucene be able to search word documents? if so please give me
information about it

regards
Santosh kumar


-----------------------SOFTPRO DISCLAIMER------------------------------

Information contained in this E-MAIL and any attachments are
confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'
and 'confidential'.

If you are not an intended or authorised recipient of this E-MAIL or
have received it in error, You are notified that any use, copying or
dissemination  of the information contained in this E-MAIL in any
manner whatsoever is strictly prohibited. Please delete it immediately
and notify the sender by E-MAIL.

In such a case reading, reproducing, printing or further dissemination
of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment
hereto is free from computer viruses or other defects. 

The opinions expressed in this E-MAIL and any ATTACHEMENTS may be
those of the author and are not necessarily those of SOFTPRO SYSTEMS.
------------------------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message