lucene-java-user mailing list archives

From "Goel, Nikhil" <nikhil.g...@verizon.com>
Subject Lucene Search Capabilities.
Date Thu, 12 May 2005 14:24:31 GMT
Hi, 

I have two questions regarding the search capability of Lucene. 

1) Lucene builds an inverted index, which as I understand it records how
many times each token is used. Is there a way to get a list of the most
frequently used words, in descending order of frequency?

For example: suppose I have two docs in my index. One doc has "Lucene"
6 times in it (and that's the maximum out of all). The second doc has
"Lucene" once and "index" 6 times.

So that means the most frequently used word is "Lucene" - used 7 times -
and "index" is next, used 6 times.

Is there a way to find out this information? 
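
To make the result I am after concrete, here is a minimal plain-Java sketch of the counting described above. It does not use any Lucene API; the class and method names (TermCounts, countTerms, mostFrequent) are hypothetical, and the tokenizing (lowercase, split on non-word characters) is only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: shows the desired output only.
class TermCounts {
    // Count how often each lowercased token occurs across all documents.
    static Map<String, Integer> countTerms(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Return the terms ordered by total count, highest first;
    // ties are broken alphabetically so the order is deterministic.
    static List<Map.Entry<String, Integer>> mostFrequent(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("Lucene Lucene Lucene Lucene Lucene Lucene");      // doc 1
        docs.add("Lucene index index index index index index");     // doc 2
        for (Map.Entry<String, Integer> e : mostFrequent(countTerms(docs))) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the two example docs this lists "lucene" with 7 and then "index" with 6, which is the ordering I would like to get from the index itself.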

2) I have a number of documents that contain a BTN (a 10-digit numeric
string) in their content. I want to do the following:
a) What query can I write to find the documents that contain a BTN? I
think a wildcard search will help, but I have not been able to work out
the exact query.
b) More importantly, will the search tell us exactly which BTN is in
each document? For example, say I search with java* and 2 documents
match. One of the documents has "javaspace" in it and the other has
"javaworld".
Is it possible to get these matched words through some API?
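
To show what I mean by "the matched words", here is a small plain-Java sketch run over raw document text. It is not a Lucene query and uses no Lucene API; the names (MatchedTerms, findBtns, termsWithPrefix) are hypothetical, and it assumes a BTN is any run of exactly 10 digits that is not part of a longer number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: shows the desired output only.
class MatchedTerms {
    // Assumption: a BTN is exactly 10 digits with no digit on either side.
    private static final Pattern BTN = Pattern.compile("(?<!\\d)\\d{10}(?!\\d)");

    // Return every 10-digit number found in the text, in order of appearance.
    static List<String> findBtns(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BTN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Return the distinct lowercased words that start with the given prefix,
    // sorted alphabetically (what a search for prefix* actually matched).
    static List<String> termsWithPrefix(String text, String prefix) {
        TreeSet<String> matched = new TreeSet<>();
        String wanted = prefix.toLowerCase();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty() && token.startsWith(wanted)) {
                matched.add(token);
            }
        }
        return new ArrayList<>(matched);
    }

    public static void main(String[] args) {
        System.out.println(findBtns("Account BTN 2125550100, ref 12345678901"));
        System.out.println(termsWithPrefix("javaspace and javaworld", "java"));
    }
}
```

What I am looking for is the equivalent of these two results, but obtained from the index rather than by rescanning each document.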


Thanks a lot.
Nikhil

