From: "Leimbach, Johannes"
To: "POI Users List"
Subject: Re: Extract Text From Excel
Date: Tue, 1 Aug 2006 08:26:45 +0200
Message-ID: <8963E18186202146AD4AF866A1B0CAA4826967@sheex366.corp.conet.local>

Hello,

first of all: I do have a first name. It's "Johannes", and I prefer being
called "Johannes" over "Leimbach". Thanks ;)

To your problem:
Can you provide more information about it? What's in the cells, and where
do these errors come from?
As far as I know, HSSF is not able to read formulas or macros from Excel.

Bye,
Johannes

-----Original Message-----
From: Feris Thia [mailto:feris.apache@gmail.com]
Sent: Monday, 31 July 2006 18:01
To: POI Users List
Subject: Re: Extract Text From Excel

Hello Suba, Michael and Leimbach,

Thanks for the responses... they help me greatly. Especially to Leimbach: I
have used your wrapper and tested it with my application. It works great :)

But I get some warnings (attached below). Is it a limitation of HSSF that it
cannot read some Excel formats?

[java] [WARNING] Unknown Ptg 14 (20) at cell (5,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (16,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (5,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (24,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (24,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (25,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (25,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (26,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (26,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (27,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (27,4)

And one more thing... so HSSF does not read the value of formulas?

Regards,

Feris

On 7/31/06, Leimbach, Johannes wrote:
>
> Hello,
>
> last week I wrote a wrapper class to facilitate text extraction from Excel
> files; please see the source code below.
> Maybe this (or another example, I don't mind which) should be posted on the
> POI homepage - I see very little beginner's documentation there.
>
> Anyway, here's the class; it should be self-explanatory:
>
> package fulltext.common.processing.helpers.poi;
>
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.util.Iterator;
>
> import org.apache.poi.hssf.usermodel.HSSFCell;
> import org.apache.poi.hssf.usermodel.HSSFRow;
> import org.apache.poi.hssf.usermodel.HSSFSheet;
> import org.apache.poi.hssf.usermodel.HSSFWorkbook;
> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
>
> /**
>  * Wraps around the POI stuff to read an Excel (XLS) file from disk
>  */
> public class ExcelFileWrapper
> {
>     private POIFSFileSystem _fileSystem;
>     private HSSFWorkbook _workbook;
>
>     /**
>      * Initialize the object by loading the workbook from the stream;
>      * cell contents are not extracted yet
>      * @throws IOException
>      */
>     public ExcelFileWrapper(FileInputStream stream) throws IOException
>     {
>         if (stream == null)
>             throw new NullPointerException("in ExcelFileWrapper: ctor parameter 'stream' is null.");
>
>         _fileSystem = new POIFSFileSystem(stream);
>         _workbook = new HSSFWorkbook(_fileSystem);
>     }
>
>     /**
>      * Return the contents of all sheets as a string.
>      * Every textual cell's content is added here.
>      */
>     public String readContents()
>     {
>         // return this
>         StringBuilder builder = new StringBuilder();
>
>         // for each sheet
>         for (int numSheets = 0; numSheets < _workbook.getNumberOfSheets(); numSheets++)
>         {
>             HSSFSheet sheet = _workbook.getSheetAt(numSheets);
>
>             // Iterate over each row in the sheet
>             Iterator rows = sheet.rowIterator();
>             while (rows.hasNext())
>             {
>                 HSSFRow row = (HSSFRow) rows.next();
>
>                 // Iterate over each cell in the row and add the cell's content
>                 Iterator cells = row.cellIterator();
>                 while (cells.hasNext())
>                 {
>                     // get cell..
>                     HSSFCell cell = (HSSFCell) cells.next();
>                     // .. add to stringbuilder
>                     processCell(cell, builder);
>                 }
>             }
>         } // for numSheets ..
>
>         return builder.toString();
>     }
>
>     /**
>      * Add the cell's content to the StringBuilder (only if the content is
>      * appropriate, i.e. text, not numbers)
>      */
>     private void processCell(HSSFCell cell, StringBuilder builder)
>     {
>         switch (cell.getCellType())
>         {
>             /*
>             case HSSFCell.CELL_TYPE_NUMERIC:
>                 System.out.println( cell.getNumericCellValue() );
>                 break;
>             */
>             case HSSFCell.CELL_TYPE_STRING:
>                 builder.append(cell.getStringCellValue());
>                 builder.append(" ");
>                 break;
>
>             default:
>                 break;
>         }
>     }
> }
>
>
> - Johannes
>
>
> -----Original Message-----
> From: Michael J. Prichard [mailto:michael_prichard@mac.com]
> Sent: Monday, 31 July 2006 15:36
> To: POI Users List
> Subject: Re: Extract Text From Excel
>
> Hey Feris,
>
> That [HSSF] is what I use as well and it works pretty well.
>
> -Michael
>
> Suba Suresh wrote:
>
> > You can use the HSSF libraries for Excel text extraction. I used it
> > for Lucene indexing.
> >
> > suba suresh.
> >
> > Feris Thia wrote:
> >
> >> Hi All,
> >>
> >> I'm new to this user group. Is there any way to extract all the text
> >> from Excel documents?
> >> Want to perform indexing using POI + Lucene :)
> >>
> >> Thanks,
> >>
> >> Feris
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> > Mailing List: http://jakarta.apache.org/site/mail2.html#poi
> > The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
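
On the formula question raised above: as far as I know, an .xls file stores the last calculated result next to each formula, so a reader can pick up that cached value without evaluating the formula itself. The sketch below shows how the processCell switch from the quoted wrapper might be extended to include numeric cells and cached formula results. To stay runnable without the POI jar it uses a hypothetical SketchCell stand-in instead of HSSFCell; the names SketchCell, Kind and CellTextSketch are assumptions for illustration and are not part of POI. With POI, the corresponding branch would presumably be a case HSSFCell.CELL_TYPE_FORMULA that calls getNumericCellValue(), but that mapping is an assumption and is not confirmed anywhere in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a spreadsheet cell (NOT a POI class), so that the
// sketch compiles with the JDK alone.
final class SketchCell {
    enum Kind { STRING, NUMERIC, FORMULA, BLANK }

    final Kind kind;
    final String text;    // meaningful for STRING cells only
    final double number;  // NUMERIC value, or the cached result of a FORMULA cell

    private SketchCell(Kind kind, String text, double number) {
        this.kind = kind;
        this.text = text;
        this.number = number;
    }

    static SketchCell string(String s)             { return new SketchCell(Kind.STRING, s, 0.0); }
    static SketchCell numeric(double d)            { return new SketchCell(Kind.NUMERIC, null, d); }
    static SketchCell formula(double cachedResult) { return new SketchCell(Kind.FORMULA, null, cachedResult); }
    static SketchCell blank()                      { return new SketchCell(Kind.BLANK, null, 0.0); }
}

final class CellTextSketch {
    // Same shape as processCell in the quoted wrapper: append the cell's
    // content followed by a space, and skip cell kinds that carry no text.
    static String extractText(List<SketchCell> cells) {
        StringBuilder builder = new StringBuilder();
        for (SketchCell cell : cells) {
            switch (cell.kind) {
                case STRING:
                    builder.append(cell.text).append(' ');
                    break;
                case NUMERIC:
                case FORMULA:
                    // For FORMULA this is the stored (cached) result; nothing is evaluated.
                    builder.append(cell.number).append(' ');
                    break;
                default:
                    break; // blank and other cells contribute nothing
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<SketchCell> row = Arrays.asList(
                SketchCell.string("Total"),
                SketchCell.numeric(3.5),
                SketchCell.formula(42.0),
                SketchCell.blank());
        System.out.println(extractText(row));
    }
}
```

For full-text indexing it may be preferable to leave numbers out, as the original wrapper does; whether they are useful search terms depends on the documents being indexed.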