lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ankur Goel" <ank...@brickred.com>
Subject RE: Searching Microsoft Word , Excel and PPT files for Japanese
Date Fri, 21 May 2004 10:32:25 GMT
Thanks chandan .. 
I am tried  using POI for text extraction . I used The
WordDocument.writeAllText method but it didn't worked for Japanese. 
Is there any other way also for extracting the Japanese text?
Regards,
Ankur 

-----Original Message-----
From: Chandan Tamrakar [mailto:chandan@ccnep.com.np] 
Sent: Friday, May 21, 2004 3:51 PM
To: Lucene Users List; ankurg@brickred.com
Subject: Re: Searching Microsoft Word , Excel and PPT files for Japanese

    for miscrosoft word documents and excel use POI API's  from jakarta
apache.
   First you need to extract the test and convert inot suitable encoding
before you put into lucene for index.
   It worked for me.


----- Original Message -----
From: "Ankur Goel" <ankurg@brickred.com>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
Sent: Thursday, May 20, 2004 10:55 PM
Subject: Searching Microsoft Word , Excel and PPT files for Japanese


> Hi,
>
> I am using CJK Tokenzier for searching the Japanese documents.  I am able
to
> search japanese documents which are text files. But I am not able to
search
> from Microsoft word, excel files with content in Japanese.
>
> Can you tell me how can search on Japanese content for Microsoft word,
excel
> and ppt files.
>
> Thanks,
> Ankur
>
> -----Original Message-----
> From: Ankur Goel [mailto:ankurg@brickred.com]
> Sent: Sunday, April 04, 2004 1:36 AM
> To: 'Lucene Users List'
> Subject: RE: Boolean Phrase Query question
>
> Thanks Eric for the solution. I have to filename field as I have to give
the
> end user facility to search on File Name also. That's   why I am using
TEXT
> for file Name also.
>
> "By using true on the finalQuery.add calls, you have said that both fields
> must have the word "temp" in them.  Is that what you meant?  Or did you
mean
> an OR type of query?"
>
> I need an OR type of query. I mean the word can be in the filename or in
the
> contents of the filename. But i am not able to do this. Can you tell me
how
> to do it?
>
> Regards,
> Ankur
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Sunday, April 04, 2004 1:27 AM
> To: Lucene Users List
> Subject: Re: Boolean Phrase Query question
>
> On Apr 3, 2004, at 12:13 PM, Ankur Goel wrote:
> >
> > Hi,
> > I have to provide a functionality which provides search on both file
> > name and contents of the file.
> >
> > For indexing I use the following code:
> >
> >
> > org.apache.lucene.document.Document doc = new org.apache.
> > lucene.document.Document();
> > doc.add(Field.Keyword("fileId","" + document.getFileId()));
> > doc.add(Field.Text("fileName",fileName);
> > doc.add(Field.Text("contents", new FileReader(new File(fileName)));
>
> I'm not sure what you plan on doing with the fileName field, but you
> probably want to use a Keyword field for it.
>
> And you may want to glue the file name and contents together into a single
> field to facilitate searches to span both.  (be sure to put a space in
> between if you do this)
>
> > For searching a text say  "temp" I use the following code to look both
> > in file Name and contents of the file:
> >
> > BooleanQuery finalQuery = new BooleanQuery(); Query titleQuery =
> > QueryParser.parse("temp","fileName",analyzer);
> > Query mainQuery = QueryParser.parse("temp","contents",analyzer);
> >
> > finalQuery.add(titleQuery, true, false); finalQuery.add(mainQuery,
> > true, false);
> >
> > Hits hits = is.search(finalQuery);
>
> By using true on the finalQuery.add calls, you have said that both fields
> must have the word "temp" in them.  Is that what you meant?  Or did you
mean
> an OR type of query?
>
> Erik
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message