lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shoba Ramachandran <shoba_duru...@yahoo.com>
Subject Re: AW: Using lucene with HSSF from Apache
Date Fri, 02 May 2003 15:21:58 GMT
Michael,

Thanks for your reply.
This is what I finally did. Now my concern is if the
file is huge (say 15MB) and has many sheets and cells,
would the performance be bad by doing like this.

Thanks again very much
Shoba


--- "Borkenhagen, Michael (ofd-ko zdfin)"
<Michael.Borkenhagen@ofd-ko.fin-rlp.de> wrote:
> You should have read the HSSF javadoc more
> thoroughly; I think that´s a
> Question for POI users, but I´d like to help you
> anywy.
> I´d extract the text form an Excel Sheet like this :
> 
> 
> public Reader getText(File f) throws IOException {
>  StringBuffer contentBuffer = new StringBuffer();
>     HSSFWorkbook wb = new HSSFWorkbook(new
> FileInputStream(f));
>     int numberOfSheets = wb.getNumberOfSheets();
>     for (int i = 0; i < numberOfSheets; i++) {
>       HSSFSheet sheet = wb.getSheetAt(i);
>       int numberOfRows = sheet.getLastRowNum();
>       for (int j = 0; j < numberOfRows; j++) {
>         HSSFRow row = sheet.getRow(j);
>         if (row != null) {      // empty lines :
> null :(
>           Iterator it = row.cellIterator();
>           while (it.hasNext()) {
>             HSSFCell cell = (HSSFCell)it.next();
>             int type = cell.getCellType();
>             if (type == HSSFCell.CELL_TYPE_STRING) {
>              
> contentBuffer.append(cell.getStringCellValue());
>              
> contentBuffer.append(Konstanten.BLANK);
>             }
>           }
>         }
>       }
>     }
>     String contentAsStr = contentBuffer.toString();
>     // create a tmp output stream with the size of
> the content.
>     ByteArrayOutputStream out = new
> ByteArrayOutputStream();
>     ivContents = contentAsStr.getBytes();
>     return new InputStreamReader(new
> ByteArrayInputStream(ivContents));
> }
> 
> Michael
> 
> 
> -----Ursprüngliche Nachricht-----
> Von: Shoba Ramachandran
> [mailto:shoba_duruvan@yahoo.com]
> Gesendet: Mittwoch, 30. April 2003 18:10
> An: lucene-user@jakarta.apache.org
> Betreff: Using lucene with HSSF from Apache
> 
> 
> Hi,
> 
> Has anyone tried to index xls and doc files?
> I'm trying to do with HSSF from apache and using
> lucene1.2
> 
> This code returns me binary and printing it out
> gives
> junk chracters. File indexed like this returns
> nothing
> upon search. 
> 
> public static byte[] parse(File file) throws
> Exception
>   {
>     POIFSFileSystem fs = new POIFSFileSystem(new
> FileInputStream(file));
> HSSFWorkbook wb = new HSSFWorkbook(fs);
> byte[] xlsInfo = wb.getBytes();
>     System.out.println("xls content :  "+
> xlsInfo.toString());
> return xlsInfo;
>   }
> 
> Thanks in advance for your help
> Shoba
> 
> 
> __________________________________
> Do you Yahoo!?
> The New Yahoo! Search - Faster. Easier. Bingo.
> http://search.yahoo.com
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> 
> 
>
---------------------------------------------------------------------
> To unsubscribe, e-mail:
> lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> 


__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message