lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ankur Goel" <ank...@brickred.com>
Subject Searching Microsoft Word , Excel and PPT files for Japanese
Date Thu, 20 May 2004 17:10:01 GMT
Hi,

I am using CJK Tokenzier for searching the Japanese documents.  I am able to
search japanese documents which are text files. But I am not able to search
from Microsoft word, excel files with content in Japanese. 

Can you tell me how can search on Japanese content for Microsoft word, excel
and ppt files.

Thanks,
Ankur  

-----Original Message-----
From: Ankur Goel [mailto:ankurg@brickred.com] 
Sent: Sunday, April 04, 2004 1:36 AM
To: 'Lucene Users List'
Subject: RE: Boolean Phrase Query question

Thanks Eric for the solution. I have to filename field as I have to give the
end user facility to search on File Name also. That's   why I am using TEXT
for file Name also.

"By using true on the finalQuery.add calls, you have said that both fields
must have the word "temp" in them.  Is that what you meant?  Or did you mean
an OR type of query?"

I need an OR type of query. I mean the word can be in the filename or in the
contents of the filename. But i am not able to do this. Can you tell me how
to do it?

Regards,
Ankur 

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Sunday, April 04, 2004 1:27 AM
To: Lucene Users List
Subject: Re: Boolean Phrase Query question

On Apr 3, 2004, at 12:13 PM, Ankur Goel wrote:
>
> Hi,
> I have to provide a functionality which provides search on both file 
> name and contents of the file.
>
> For indexing I use the following code:
>
>
> org.apache.lucene.document.Document doc = new org.apache.
> lucene.document.Document();
> doc.add(Field.Keyword("fileId","" + document.getFileId())); 
> doc.add(Field.Text("fileName",fileName);
> doc.add(Field.Text("contents", new FileReader(new File(fileName)));

I'm not sure what you plan on doing with the fileName field, but you
probably want to use a Keyword field for it.

And you may want to glue the file name and contents together into a single
field to facilitate searches to span both.  (be sure to put a space in
between if you do this)

> For searching a text say  "temp" I use the following code to look both 
> in file Name and contents of the file:
>
> BooleanQuery finalQuery = new BooleanQuery(); Query titleQuery = 
> QueryParser.parse("temp","fileName",analyzer);
> Query mainQuery = QueryParser.parse("temp","contents",analyzer);
>
> finalQuery.add(titleQuery, true, false); finalQuery.add(mainQuery, 
> true, false);
>
> Hits hits = is.search(finalQuery);

By using true on the finalQuery.add calls, you have said that both fields
must have the word "temp" in them.  Is that what you meant?  Or did you mean
an OR type of query?

	Erik



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message