lucene-java-user mailing list archives

From Erick Erickson
Subject Re: Problem: Indexing and searching repeating groups of fields
Date Wed, 13 Jan 2010 21:52:16 GMT
Ooooh, that is easier. It also made me realize that you may not even
have to do that: just index each (language, years) pair as a single
token (KeywordAnalyzer, perhaps? But watch out, it does no case folding)...
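
To make the difference concrete, here is a rough sketch in Python. It is a toy model, not Lucene code: plain dicts and sets stand in for the index, and every name in it (RESUMES, parallel_fields, pair_tokens, language_fields, search) is made up for illustration. It compares the parallel Language/Years fields from the original question, the pair-as-single-token idea, and the field-per-language layout Digy suggests in the quoted message, using the Ra and Rb resumes from the thread.

```python
# Toy model only: dicts and sets stand in for a Lucene index.
# All names here are hypothetical and exist only for this illustration.
RESUMES = {
    "Ra": [("Java", 5), ("C", 2), ("PHP", 3)],
    "Rb": [("Java", 2), ("C", 5), ("VB", 1)],
}


def parallel_fields(groups):
    """Original layout: two multi-valued fields, so the pairing is lost."""
    return {
        "Language": {lang.lower() for lang, _ in groups},
        "Years": {years for _, years in groups},
    }


def pair_tokens(groups):
    """One unsplit token per group, e.g. 'java:5'.

    Lowercased by hand, assuming the analyzer would not fold case for us.
    """
    return {f"{lang.lower()}:{years}" for lang, years in groups}


def language_fields(groups):
    """Field-per-language layout: language is the field name, years the value."""
    return {lang.lower(): years for lang, years in groups}


def search(index, predicate):
    """Return the sorted ids of all documents that satisfy the predicate."""
    return sorted(doc_id for doc_id, doc in index.items() if predicate(doc))


parallel = {d: parallel_fields(g) for d, g in RESUMES.items()}
pairs = {d: pair_tokens(g) for d, g in RESUMES.items()}
by_language = {d: language_fields(g) for d, g in RESUMES.items()}

# (Language:Java AND Years:5) AND (Language:C AND Years:2) on parallel fields
loose = search(
    parallel,
    lambda doc: {"java", "c"} <= doc["Language"] and {5, 2} <= doc["Years"],
)

# "java:5" AND "c:2" on pair tokens
exact = search(pairs, lambda doc: {"java:5", "c:2"} <= doc)

# "java:5" OR "c:2" on pair tokens
either = search(pairs, lambda doc: bool({"java:5", "c:2"} & doc))

# java in [3, 5] AND c in [1, 2] on per-language fields (a range-style query)
ranged = search(
    by_language,
    lambda doc: 3 <= doc.get("java", -1) <= 5 and 1 <= doc.get("c", -1) <= 2,
)

print(loose, exact, either, ranged)
```

In this model the parallel-field query matches both Ra and Rb, while the pair-token queries match only Ra. The pair tokens are opaque strings, so a range over years would not work on them; the field-per-language layout keeps the number as the field value, which is what the last query relies on. Neither variant obviously extends to groups with several fields each, so treat this as a starting point rather than a full answer.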

On Wed, Jan 13, 2010 at 4:30 PM, Digy wrote:

> How about using languages as fieldnames?
> Doc1(Ra):
>        Java:5
>        C:2
>        PHP:3
> Doc2(Rb):
>        Java:2
>        C:5
>        VB:1
> Query:Java:5 AND C:2
> -----Original Message-----
> From: TJ Kolev
> Sent: Wednesday, January 13, 2010 11:00 PM
> Subject: Problem: Indexing and searching repeating groups of fields
> Greetings,
> Let's assume I have to index and search "resume" documents. Two fields are
> defined: Language and Years. The fields are associated together in a group
> called Experience. A resume document may have 0 or more Experience groups:
> Ra{ E1(Java,5); E2(C,2); E3(PHP,3);}
> Rb{ E1(Java,2); E2(C,5); E3(VB,1);}
> How do I index such documents, and how do I search, so I can formulate a
> query like "Resumes which have (Java,5) and (C,2)" and get back Ra? I
> know I can index multiple fields of the same name, and do "(Language:Java
> AND Years:5) AND (Language:C AND Years:2)", but in addition to Ra that
> would also return Rb, which I don't want. The problem here is that the
> "grouping" is lost. I can create fields with compound names (E1Language,
> E1Years, E2Language, E2Years, etc.), but that does not help, as I don't
> know which group to search. I'd also like to query for "(Language:Java AND
> Years:5) OR (Language:C AND Years:2)".
> This is a simplified example. Real documents may have 30 - 40 groups, each
> one with several fields. Putting all the fields in a group in one index
> field won't work, as the numeric/date ones should be available for range
> searches.
> So far, the only way I see is to do my own post-processing on the results.
> The issue is that text fields will need to be untokenized, or otherwise it
> would be difficult to work on the result and determine what matches.
> Thank you.
> tjk :)