lucene-java-user mailing list archives

From TJ Kolev <tjko...@gmail.com>
Subject Problem: Indexing and searching repeating groups of fields
Date Wed, 13 Jan 2010 20:59:52 GMT
Greetings,

Let's assume I have to index and search "resume" documents. Two fields are
defined: Language and Years. The fields are associated together in a group
called Experience. A resume document may have 0 or more Experience groups:

Ra{ E1(Java,5); E2(C,2); E3(PHP,3);}
Rb{ E1(Java,2); E2(C,5); E3(VB,1);}

How do I index such documents, and how do I search them, so that I can
formulate a query like "Resumes which have (Java,5) and (C,2)" and get back
only Ra? I know I can index multiple fields of the same name and search for
"(Language:Java AND Years:5) AND (Language:C AND Years:2)", but in addition
to Ra that would also return Rb, which I don't want. The problem is that the
"grouping" is lost: Rb contains Java, C, 5 and 2, just not in the required
pairs. I can create fields with compound names (E1Language, E1Years,
E2Language, E2Years, etc.), but that does not help, as I don't know in
advance which group to search. I'd also like to be able to query for
"(Language:Java AND Years:5) OR (Language:C AND Years:2)".
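
To make the false match concrete, here is a minimal sketch in plain Java. It
does not use Lucene at all; it only models a document with repeated
same-named fields as a map from field name to the set of its values, which
is an assumption made for illustration. The class and method names
(FlattenedFieldsDemo, flatten, matchesFlatQuery) are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FlattenedFieldsDemo {
    // The two sample resumes from the message, as (Language, Years) pairs.
    static final String[][] RA = {{"Java", "5"}, {"C", "2"}, {"PHP", "3"}};
    static final String[][] RB = {{"Java", "2"}, {"C", "5"}, {"VB", "1"}};

    // Model of indexing repeated same-named fields on one document:
    // every value lands in a per-field set and the pairing is discarded.
    static Map<String, Set<String>> flatten(String[][] experiences) {
        Map<String, Set<String>> doc = new HashMap<>();
        for (String[] e : experiences) {
            doc.computeIfAbsent("Language", k -> new HashSet<>()).add(e[0]);
            doc.computeIfAbsent("Years", k -> new HashSet<>()).add(e[1]);
        }
        return doc;
    }

    // True if the flattened document has the given value in the given field.
    static boolean hasTerm(Map<String, Set<String>> doc, String field, String value) {
        return doc.getOrDefault(field, Collections.<String>emptySet()).contains(value);
    }

    // (Language:Java AND Years:5) AND (Language:C AND Years:2)
    static boolean matchesFlatQuery(Map<String, Set<String>> doc) {
        return hasTerm(doc, "Language", "Java") && hasTerm(doc, "Years", "5")
            && hasTerm(doc, "Language", "C") && hasTerm(doc, "Years", "2");
    }

    public static void main(String[] args) {
        System.out.println("Ra matches: " + matchesFlatQuery(flatten(RA)));
        System.out.println("Rb matches: " + matchesFlatQuery(flatten(RB)));
    }
}
```

In this model both resumes satisfy the query, because each of the four
terms is tested against the whole document rather than against a single
Experience group.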

This is a simplified example. Real documents may have 30 to 40 groups, each
one with several fields. Putting all the fields of a group into one index
field won't work, as the numeric/date ones should be available for range
searches.

So far, the only way I see is to do my own post-processing on the results.
The issue is that text fields would need to be untokenized; otherwise it
would be difficult to work on the results and determine what matched.
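
The post-processing idea could look roughly like the sketch below. It is
plain Java, not Lucene code, and it assumes the Experience groups of each
candidate resume can be read back as exact (untokenized) values after the
initial search; how they are stored and retrieved is left open. All names
(GroupPostFilter, Experience, hasGroup) are hypothetical. A years range is
used instead of a single value so that range conditions stay tied to the
same group.

```java
import java.util.Arrays;
import java.util.List;

public class GroupPostFilter {
    // One Experience group, kept intact so Language and Years stay paired.
    static final class Experience {
        final String language;
        final int years;

        Experience(String language, int years) {
            this.language = language;
            this.years = years;
        }
    }

    // The two sample resumes from the message.
    static final List<Experience> RA = Arrays.asList(
        new Experience("Java", 5), new Experience("C", 2), new Experience("PHP", 3));
    static final List<Experience> RB = Arrays.asList(
        new Experience("Java", 2), new Experience("C", 5), new Experience("VB", 1));

    // True if a single group has this language and years in [minYears, maxYears].
    static boolean hasGroup(List<Experience> resume, String language,
                            int minYears, int maxYears) {
        for (Experience e : resume) {
            if (e.language.equals(language) && e.years >= minYears && e.years <= maxYears) {
                return true;
            }
        }
        return false;
    }

    // "(Java,5) AND (C,2)", each pair checked within one group.
    static boolean java5AndC2(List<Experience> resume) {
        return hasGroup(resume, "Java", 5, 5) && hasGroup(resume, "C", 2, 2);
    }

    // "(Java,5) OR (C,2)", each pair checked within one group.
    static boolean java5OrC2(List<Experience> resume) {
        return hasGroup(resume, "Java", 5, 5) || hasGroup(resume, "C", 2, 2);
    }

    public static void main(String[] args) {
        System.out.println("Ra AND: " + java5AndC2(RA));
        System.out.println("Rb AND: " + java5AndC2(RB));
    }
}
```

With the groups kept intact, Ra passes the AND filter and Rb is rejected.
The cost is that every candidate returned by the looser index query has to
be re-checked in application code.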

Thank you.
tjk :)
