lucene-java-user mailing list archives

From Shoba Ramachandran <shoba_duru...@yahoo.com>
Subject RE: Using lucene with HSSF from Apache
Date Tue, 06 May 2003 14:43:53 GMT
Thanks Michel,


--- MMachado@LEVI.com wrote:
> Hi Shoba,
> About your question: yes, I am able to index doc and
> Excel files, and I can then retrieve them with my
> search engine, which gives me a link to the file.
> Michel
> 
> -----Original Message-----
> From: Shoba Ramachandran
> [mailto:shoba_duruvan@yahoo.com] 
> Sent: Monday, May 05, 2003 4:38 PM
> To: Lucene Users List
> Subject: RE: Using lucene with HSSF from Apache
> 
> Thanks very much HTH,
> What would be a good value for the maximum number of
> terms for a very large document, say a 20MB PDF file?
> 
> Thanks
> Shoba
> 
> --- kelvin-lists@relevanz.com wrote:
> > It's doable, because if you open any MS Office
> > document in your text
> > editor, you'll see that text is all there,
> > surrounded with binary
> > characters and other kinds of mumbo-jumbo. The
> > biggest minus is that
> > you'll exceed the max number of terms in a hurry,
> > even if you set it
> > to like a million, once you hit a reasonably large
> > file (>5MB).
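
On the term limit: there is no single good number, since it depends on how many tokens the extracted text actually yields. A rough way to size it is to count the tokens first and set the limit (IndexWriter's maxFieldLength in the 1.x releases, if memory serves) somewhat above that count. The sketch below assumes a token is any run of letters, which approximates how SimpleAnalyzer splits text. TermCountEstimate and countLetterTokens are made-up names for illustration, not Lucene API.

```java
// Hypothetical helper for sizing the term limit; not part of Lucene.
final class TermCountEstimate {

    // Counts maximal runs of letters in the text. This mirrors a
    // letter-based tokenizer only approximately (assumption).
    static int countLetterTokens(CharSequence text) {
        int count = 0;
        boolean inToken = false;
        for (int i = 0; i < text.length(); i++) {
            boolean letter = Character.isLetter(text.charAt(i));
            if (letter && !inToken) {
                count++; // a new run of letters starts here
            }
            inToken = letter;
        }
        return count;
    }
}
```

Running this over the already filtered text of a large file shows quickly whether a limit like one million is realistic or whether the binary junk is what inflates the count.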
> > 
> > What I do is filter out all the unreadable stuff
> > using some regex filters. The only minus is that it's
> > somewhat slower because of the line-by-line
> > processing, but I'm sure it's _much_ faster than
> > attempting to add all that nonsense data.
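
The regex filtering described above might look something like the following. This is only a sketch under stated assumptions: the original filters were not posted, so the pattern (runs of three or more ASCII letters or digits) and the names ReadableTextFilter, cleanLine and clean are invented here for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Lucene or POI.
final class ReadableTextFilter {

    // Assumption: a "readable" word is 3+ ASCII letters or digits.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]{3,}");

    // Keeps only the word-like runs of one line, joined by single spaces.
    static String cleanLine(String line) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(m.group());
        }
        return out.toString();
    }

    // Processes the input line by line and drops lines with nothing readable.
    static String clean(Reader source) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String cleaned = cleanLine(line);
                if (cleaned.isEmpty()) {
                    continue;
                }
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(cleaned);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The cleaned string is what would go into the content field instead of the raw file bytes. The three-character minimum is a guess that trades a few lost short words for far fewer junk terms; adjust it to your data.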
> > 
> > HTH
> > 
> > On Fri, 2 May 2003 08:30:15 -0700 (PDT), Shoba
> > Ramachandran wrote:
> > >Hi Michel,
> > >
> > >Are you able to index and search xls and doc files
> > >with just Lucene using SimpleAnalyzer?
> > >There is no need for POI?
> > >With Lucene, you are able to extract the xls content
> > >as text?
> > >
> > >Let me try as you explained.
> > >Thanks very much for your reply.
> > >Shoba
> > >
> > >--- MMachado@LEVI.com wrote:
> > >>Hi,
> > >>I did it, but I use only Lucene. You need to create
> > >>an IndexWriter with SimpleAnalyzer, open an
> > >>InputStream as a new FileInputStream, and create a
> > >>Document with two Fields: one contains the file path
> > >>and one contains the file's content.
> > >>That's all.
> > >>Michel
> > >>
> > >>-----Original Message-----
> > >>From: Shoba Ramachandran
> > >>[mailto:shoba_duruvan@yahoo.com]
> > >>Sent: Wednesday, April 30, 2003 6:10 PM
> > >>To: lucene-user@jakarta.apache.org
> > >>Subject: Using lucene with HSSF from Apache
> > >>
> > >>Hi,
> > >>
> > >>Has anyone tried to index xls and doc files?
> > >>I'm trying to do it with HSSF from Apache, using
> > >>Lucene 1.2.
> > >>
> > >>This code returns binary data, and printing it out
> > >>gives junk characters. A file indexed like this
> > >>returns nothing upon search.
> > >>
> > >>public static byte[] parse(File file) throws Exception
> > >>{
> > >>    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(file));
> > >>    HSSFWorkbook wb = new HSSFWorkbook(fs);
> > >>    byte[] xlsInfo = wb.getBytes();
> > >>    System.out.println("xls content :  " + xlsInfo.toString());
> > >>    return xlsInfo;
> > >>}
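
Two things seem to be going on in that snippet. First, getBytes() hands back the workbook in its binary form rather than the cell text, so there is little in it for an analyzer to tokenize; the text has to be collected by walking the sheets, rows and cells through HSSF. Second, calling toString() on a byte[] never prints the content at all, only a type tag and a hash code. The stdlib-only sketch below illustrates the second point; ByteArrayPrinting and its methods are made-up names, and ISO-8859-1 is just an assumed encoding for the demonstration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; not part of POI or Lucene.
final class ByteArrayPrinting {

    // What concatenating xlsInfo.toString() effectively produces:
    // Object.toString() on an array, i.e. "[B@" followed by a hash code.
    static String viaToString(byte[] data) {
        return data.toString();
    }

    // Decodes the bytes as characters. This is only meaningful when the
    // bytes really are encoded text, which a binary workbook is not.
    static String viaDecoding(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }
}
```

Decoding a real xls file this way would still yield mostly junk, which is why extracting the cell values as strings before indexing is the more promising route.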
> > >>
> > >>Thanks in advance for your help
> > >>Shoba
> > >>
> > >>
> > >>__________________________________
> > >>Do you Yahoo!?
> > >>The New Yahoo! Search - Faster. Easier. Bingo.
> > >>http://search.yahoo.com
> > >>
> > >>
> > >>---------------------------------------------------------------------
> > >>To unsubscribe, e-mail:
> > >>lucene-user-unsubscribe@jakarta.apache.org
> > >>For additional commands, e-mail:
> > >>lucene-user-help@jakarta.apache.org

