lucene-java-user mailing list archives

From Marcel Bruch <br...@cs.tu-darmstadt.de>
Subject Using Lucene with a rather simplistic scoring system?
Date Fri, 11 Jun 2010 13:35:07 GMT
Hi!

We are working on an experimental code-search engine that helps users 
find example code snippets based on what a developer has already typed 
in her editor. Our "homemade search engine" produces some cool 
results, but its performance is somewhat limited :-) Thus, we are 
evaluating whether Lucene can solve our performance issues. However, we 
are not familiar with Lucene, and I wonder if some of you could help me 
find out whether Lucene fits our problem well. Thanks in advance for 
your comments.

The situation is as follows. For each source code file we extract some 
code properties like which types are used inside the code, which methods 
are overridden or which methods are called inside a method body etc. For 
each source code file we get a JSON structure similar to this:
{
     "class": "my.ExampleClass",
     "extends": "the.SuperClass",
     "overrides": [
         "the.SuperClass.method1()",
         "the.SuperClass.method2()"
     ],
     "used types": [
         "a.Type1",
         "a.Type2",
         ...
     ],
     "used methods": [
         "a.Type1.method32()",
         "a.Type1.method23()",
         ...
     ],
     <few more things>
}
The scoring function we use is rather simplistic. Given a query (which 
looks much like the document above), we compute for each 
feature (i.e. "used methods", "used types", "overrides" etc.) a simple 
match score: the percentage of overlap between the query-document 
feature and the db-document feature. Then we multiply each 
feature score f_i by an individual feature weight w_i and sum the 
products into one overall score.
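
To make the scheme concrete, here is a minimal sketch in plain Java, with no Lucene involved. The class name FeatureOverlapScorer and its methods are hypothetical, and it assumes that "percentage of overlap" means the fraction of the query's values for a feature that also occur in the document; other definitions (e.g. relative to the union of both sets) would work the same way.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: weighted sum of per-feature overlap.
class FeatureOverlapScorer {

    // f_i: fraction of the query's values for one feature that also occur in the document.
    // Assumption: the overlap is measured relative to the size of the query's set.
    static double overlap(Set<String> query, Set<String> doc) {
        if (query.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(query);
        shared.retainAll(doc);
        return (double) shared.size() / query.size();
    }

    // Overall score: sum over all query features i of w_i * f_i.
    // Features without a configured weight contribute nothing.
    static double score(Map<String, Set<String>> query,
                        Map<String, Set<String>> doc,
                        Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Set<String>> feature : query.entrySet()) {
            double weight = weights.getOrDefault(feature.getKey(), 0.0);
            Set<String> docValues = doc.getOrDefault(feature.getKey(), Set.of());
            total += weight * overlap(feature.getValue(), docValues);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> query = Map.of(
            "used types", Set.of("a.Type1", "a.Type2"),
            "overrides", Set.of("the.SuperClass.method1()"));
        Map<String, Set<String>> doc = Map.of(
            "used types", Set.of("a.Type1"),
            "overrides", Set.of("the.SuperClass.method1()", "the.SuperClass.method2()"));
        // Illustrative weights only.
        Map<String, Double> weights = Map.of("used types", 2.0, "overrides", 1.0);
        System.out.println(score(query, doc, weights));
    }
}
```

In the example in main, "used types" overlaps by one half and "overrides" overlaps completely, so the weighted sum is 2.0 * 0.5 + 1.0 * 1.0.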

My questions are: Is it meaningful to use Lucene in this setup, or, 
put differently, can I implement that scoring scheme easily with Lucene? 
What would such a solution look like? Just by subclassing Scorer?

Many thanks in advance for any advice

All the best,
Marcel

