poi-user mailing list archives

From "Leimbach, Johannes" <JLeimb...@CONET.DE>
Subject Re: Extract Text From Excel
Date Mon, 31 Jul 2006 14:25:45 GMT
Hello,

last week I wrote a wrapper class to make text extraction from Excel files easier; the
source code is below.
Maybe this (or some other example, I don't mind which) should be posted on the POI homepage.
I see very little beginner documentation there.

Anyway, here's the class. It should be self-explanatory:

package fulltext.common.processing.helpers.poi;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

/**
 * Wraps around the POI stuff to read an Excel (XLS) file from disk
 */
public class ExcelFileWrapper 
{
	private POIFSFileSystem _fileSystem;
	private HSSFWorkbook _workbook;
	
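	/**
	 * Open the workbook from the given stream. The stream is read and parsed
	 * here; no cell text is extracted until readContents() is called.
	 * @throws IOException if the stream cannot be read
	 */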
	public ExcelFileWrapper(FileInputStream stream) throws IOException 
	{
		if (stream == null)
			throw new NullPointerException ("in ExcelFileWrapper: ctor parameter 'stream' is null.");

		_fileSystem = new POIFSFileSystem(stream);
		_workbook = new HSSFWorkbook (_fileSystem);
	}
	
	/**
	 * Return the contents of all sheets as string.
	 * Every textual cell's content is added here.
	 */
	public String readContents ()
	{	
		// collects the text of all sheets; returned at the end
		StringBuilder builder = new StringBuilder();
		
		// for each sheet
		for (int numSheets = 0; numSheets < _workbook.getNumberOfSheets(); numSheets++)
		{
	        HSSFSheet sheet = _workbook.getSheetAt(numSheets);
	        
	        // Iterate over each row in the sheet
	        Iterator rows = sheet.rowIterator();
	        while( rows.hasNext() ) 
	        {          
	            HSSFRow row = (HSSFRow) rows.next();
	
	            // Iterate over each cell in the row and add the cell's content
	            Iterator cells = row.cellIterator();
	            while( cells.hasNext() ) 
	            {
	            	// get cell..
	                HSSFCell cell = (HSSFCell) cells.next();
	                // .. add to stringbuilder
	                processCell (cell, builder);
	            }

	        }
	        
        } // for numSheets ..
		
		//
		return builder.toString();
	}

	/**
	 * Add the cell's content to the StringBuilder (only if it is appropriate content, i.e. text - no numbers)
	 */
	private void processCell (HSSFCell cell, StringBuilder builder) 
	{
        switch ( cell.getCellType() ) 
        {
        /*
            case HSSFCell.CELL_TYPE_NUMERIC:
                System.out.println( cell.getNumericCellValue() );
                break;
        */
            case HSSFCell.CELL_TYPE_STRING:
                builder.append (cell.getStringCellValue());
                builder.append (" ");
                break;
                
            default:
                break;
        }
	}

}
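
To make the behaviour of readContents() and processCell() easier to see without an .xls file at hand, here is a small POI-free sketch of the same traversal. It is only an illustration under assumptions: the class name TextOnlyJoinSketch and the Object[][][] grid (sheets, then rows, then cells) are hypothetical stand-ins for the workbook and are not part of POI or of the class above. A String plays the role of a text cell; anything else (numbers, booleans, empty cells) is skipped, just like the default branch of the switch.

```java
// POI-free illustration: Object[][][] stands in for workbook -> rows -> cells.
// The class name and the grid layout are hypothetical, chosen for this sketch only.
final class TextOnlyJoinSketch {

    // Mirrors readContents(): visit every sheet, row and cell in order.
    static String readContents(Object[][][] sheets) {
        StringBuilder builder = new StringBuilder();
        for (Object[][] sheet : sheets) {
            for (Object[] row : sheet) {
                for (Object cell : row) {
                    processCell(cell, builder);
                }
            }
        }
        return builder.toString();
    }

    // Mirrors processCell(): only text is appended, each value followed by a space.
    static void processCell(Object cell, StringBuilder builder) {
        if (cell instanceof String) {
            builder.append((String) cell).append(" ");
        }
        // numbers, booleans and empty (null) cells are ignored
    }

    public static void main(String[] args) {
        Object[][][] workbook = {
            { { "Name", "Amount" }, { "Widget", 12.5 } },  // first sheet
            { { "Notes", null, true } }                    // second sheet
        };
        // prints "[Name Amount Widget Notes ]" (note the trailing space)
        System.out.println("[" + readContents(workbook) + "]");
    }
}
```

The result is one flat, space-separated string with a trailing space and no row or sheet boundaries, which should be fine for full-text indexing but is worth knowing about. With the real class the call is the same idea: new ExcelFileWrapper(new FileInputStream(path)).readContents(), where path is whatever file you want to index, and the caller closes the stream afterwards.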


- Johannes
 

-----Original Message-----
From: Michael J. Prichard [mailto:michael_prichard@mac.com] 
Sent: Monday, 31 July 2006 15:36
To: POI Users List
Subject: Re: Extract Text From Excel

Hey Feris,

That [HSSF] is what I use as well and it works pretty good. 

-Michael

Suba Suresh wrote:

> You can use the hssf libraries for excel text extraction. I used it 
> for lucene indexing.
>
> suba suresh.
>
> Feris Thia wrote:
>
>> Hi All,
>>
>> I'm new to this user group. Is there any way to extract all the text 
>> from Excel documents? Want to perform indexing using POI + Lucene :)
>>
>> Thanks,
>>
>> Feris
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: poi-user-unsubscribe@jakarta.apache.org
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>



