lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@broadvision.com>
Subject RE: worddoucments search
Date Mon, 30 Aug 2004 22:31:37 GMT
Hi Otis,

I looked at the textmining site. It seems to me that textmining
is a wrapper on top of POI, so its basic features
should be the same as POI's. Is this true?

I have tested POI with Lucene. In general it works fine,
but I found that it sometimes cannot process MS Word DOC files
created by an old version of Word. If I just re-save the old
DOC file with the newer Word on XP, everything is fine.
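
One way to cope with the old-format failures described above is to catch the
extraction error per file and set those files aside for re-saving, rather than
aborting the whole indexing run. The following is only a minimal sketch using
the JDK alone: TextExtractor, Result and extractAll are hypothetical names, not
part of POI, textmining.org or Lucene, and it assumes the real extractor signals
an unreadable file by throwing an exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractionFallback {

    /** Hypothetical stand-in for a Word text extractor (POI or textmining.org in practice). */
    interface TextExtractor {
        String extract(byte[] docBytes) throws Exception;
    }

    /** Outcome of one pass: extracted text keyed by file name, plus the files that failed. */
    static final class Result {
        final Map<String, String> texts = new LinkedHashMap<>();
        final List<String> failed = new ArrayList<>();
    }

    /** Extracts text from every file; a failure on one file does not stop the others. */
    static Result extractAll(Map<String, byte[]> files, TextExtractor extractor) {
        Result result = new Result();
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            try {
                result.texts.put(entry.getKey(), extractor.extract(entry.getValue()));
            } catch (Exception ex) {
                // Assumed failure mode: the extractor throws on a format it cannot parse.
                result.failed.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy extractor: treats a leading zero byte as an "old format" file it cannot read.
        TextExtractor toy = bytes -> {
            if (bytes.length == 0 || bytes[0] == 0) {
                throw new IllegalArgumentException("unsupported format");
            }
            return new String(bytes, StandardCharsets.US_ASCII);
        };

        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("new.doc", "hello".getBytes(StandardCharsets.US_ASCII));
        files.put("old.doc", new byte[] {0, 1, 2});

        Result r = extractAll(files, toy);
        System.out.println("indexable: " + r.texts.keySet());
        System.out.println("re-save and retry: " + r.failed);
    }
}
```

The texts map would feed the indexer, while the failed list names the files to
open and re-save in a newer Word before trying again.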

Thanks very much for the help,

Lisheng

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Tuesday, August 24, 2004 10:24 AM
To: Lucene Users List
Subject: Re: worddoucments search


As I just answered in a separate email to Ryan, we used the textmining.org
library, too, as an example of something that is easier to use than
POI.  It's been a while since I wrote that chapter, so it slipped my
mind when I replied.  Yes, use textmining.org first; you'll be able to
include it in your code in 2 minutes.  Good stuff.

Otis



--- Ryan Ackley <sackley@cfl.rr.com> wrote:

> Otis,
> 
> Why didn't you use the textmining.org library? You even asked me to fix a
> bug for the book, which I did. Also, the code would have been about three
> lines.
> 
> -Ryan
> 
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Tuesday, August 24, 2004 7:41 AM
> Subject: Re: worddoucments search
> 
> 
> > For Lucene in Action Erik and I wrote a little extensible framework for
> > indexing various documents, including MS Word.  We used POI, so the
> > solution works on Winblows, UNIX/Linux, OSX....  I think the code is a
> > bit too big for the list, but the book will be out soon.  Erik and I
> > are going through copy and tech editing right now.  POI:
> > http://jakarta.apache.org/poi .
> >
> > Otis
> >
> >
> > --- Don Vaillancourt <donv@webimpact.com> wrote:
> >
> > > > I could be wrong, but I don't think that there is an indexer for
> > > > Word documents.
> > > >
> > > > There's a Python version of Lucene called Lupy with a Python indexer
> > > > for all sorts of document types
> > > > (http://www.methods.co.nz/docindexer/).
> > > > Would anyone be willing to port those over?  Although the MSWord
> > > > indexer only works on MSWindows and you may need MSWord for it to
> > > > work.  Man, that's no good.
> > > >
> > > > I think that we'd need to ask the OpenOffice people for help on this.
> > >
> > >
> > > Santosh wrote:
> > >
> > > > >Can Lucene search Word documents? If so, please give me
> > > > >information about it.
> > > >
> > > >regards
> > > >Santosh kumar
> > > >
> > > >
> > > >-----------------------SOFTPRO
> > > DISCLAIMER------------------------------
> > > >
> > > >Information contained in this E-MAIL and any attachments are
> > > >confidential being  proprietary to SOFTPRO SYSTEMS  is
> 'privileged'
> > > >and 'confidential'.
> > > >
> > > >If you are not an intended or authorised recipient of this
> E-MAIL or
> > > >have received it in error, You are notified that any use,
> copying or
> > > >dissemination  of the information contained in this E-MAIL in
> any
> > > >manner whatsoever is strictly prohibited. Please delete it
> > > immediately
> > > >and notify the sender by E-MAIL.
> > > >
> > > >In such a case reading, reproducing, printing or further
> > > dissemination
> > > >of this E-MAIL is strictly prohibited and may be unlawful.
> > > >
> > > >SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an
> attachment
> > > >hereto is free from computer viruses or other defects.
> > > >
> > > >The opinions expressed in this E-MAIL and any ATTACHEMENTS may
> be
> > > >those of the author and are not necessarily those of SOFTPRO
> > > SYSTEMS.
> > >
> >
>
>------------------------------------------------------------------------
> > > >
> > > >
> > > >
> > >
> > >
> > > -- 
> > > *Don Vaillancourt
> > > Director of Software Development
> > > *
> > > *WEB IMPACT INC.*
> > > phone: 416-815-2000 ext. 245
> > > fax: 416-815-2001
> > > email: donv@web-impact.com <mailto:donv@webimpact.com>
> > > web: http://www.web-impact.com
> > >
> > >
> > >
> > > / This email message is intended only for the addressee(s)
> > > and contains information that may be confidential and/or
> > > copyright. If you are not the intended recipient please
> > > notify the sender by reply email and immediately delete
> > > this email. Use, disclosure or reproduction of this email
> > > by anyone other than the intended recipient(s) is strictly
> > > prohibited. No representation is made that this email or
> > > any attachments are free of viruses. Virus scanning is
> > > recommended and is the responsibility of the recipient.
> > > /
> > > >
> >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> >
> >
> >
> >
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


