lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From joe_coder <>
Subject Personalized Search
Date Fri, 14 Aug 2009 08:11:23 GMT

Lets say we have 4 users : U1, U2, U3 and U4. Each user has a title and set
of documents created by him/her. Using this info, we can come up with a term
vector ( interest vector ) which would contain a set of top terms ( that
appeared in his/her docs ) along with frequency. So conceptually, we get
some thing like this: 
U1 => T1:1, T2:2, T3:1  ( Title: "something ")
U2 => T2:3, T4:1  ( Title: "Something else")
U3 => T1:5, T2:10, T7:1  ( Title: "Java Beans")
U4 => T1:7, T2:7, T8:1  ( Title: "Java World")

Now supposing user U1 searches for "Java", I need to search on all the
titles of all the users to find the best match ( i.e result should contain
U3 and U4 ) BUT also considering the interest of the target user w.r.t to
the user who is searching. I.e in the result U3 may get higher priority over
U4 as U3 is more closer to U1 ( as both U1 and U3 has similar interest based
on the term vectors... T1:T2 => 1:2 for U1, 1:2 for U3 and 1:1 for U4)

What would be the best way to implement this using lucene starting from
step1 by generating term vectors to using the term vector for ranking


View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message