lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From TJ Kolev <tjko...@gmail.com>
Subject Re: Problem: Indexing and searching repeating groups of fields
Date Fri, 15 Jan 2010 16:19:04 GMT
Hi!

I don't think the easy solution will work for me, because I'll have more
than two fields in a group - perhaps 6 - 10.

However using span queries looks very promising. I'll investigate that.

I see setPositionIncrement() only on the Token object. Is there a way to set
this when adding a field to the document, so that the first token get its
position pushed away. I would prefer not to modify my analyzer if possible.

Thank you.
tjk :)

On Wed, Jan 13, 2010 at 3:52 PM, Erick Erickson <erickerickson@gmail.com>wrote:

> Ooooh, isn't that easier. You just prompted me to think
> that you don't even have to do that, just index the pairs as single
> tokens (KeywordAnalyzer? but watch out for no case folding)...
>
> On Wed, Jan 13, 2010 at 4:30 PM, Digy <digydigy@gmail.com> wrote:
>
> > How about using languages as fieldnames?
> > Doc1(Ra):
> >        Java:5
> >        C:2
> >        PHP:3
> >
> > Doc2(Rb)
> >        Java:2
> >        C:5
> >        VB:1
> >
> > Query:Java:5 AND C:2
> >
> > DIGY
> >
> > -----Original Message-----
> > From: TJ Kolev [mailto:tjkolev@gmail.com]
> > Sent: Wednesday, January 13, 2010 11:00 PM
> > To: java-user@lucene.apache.org
> > Subject: Problem: Indexing and searching repeating groups of fields
> >
> > Greetings,
> >
> > Let's assume I have to index and search "resume" documents. Two fields
> are
> > defined: Language and Years. The fields are associated together in a
> group
> > called Experience. A resume document may have 0 or more Experience
> groups:
> >
> > Ra{ E1(Java,5); E2(C,2); E3(PHP,3);}
> > Rb{ E1(Java,2); E2(C,5); E3(VB,1);}
> >
> > How do I index such documents, and how do I search, so I can formulate a
> > query like this "Resumes which have (Java,5) and (C,2)" and get back Ra.
> I
> > know I can index multiple fields of the same name, and do "(Language:Java
> > AND Years:5) AND (Language:C AND Years:2)", but in addition to Ra that
> > would
> > also return Rb, which I don't want. The problem here is that the
> "grouping"
> > is lost. I can create fields with compound names (E1Language, E1Years,
> > E2Language, E2Years, etc), but that helps me none, as I don't know which
> > group to search. I'd also like to query for "(Language:Java AND Years:5)
> OR
> > (Language:C AND Years:2)"
> >
> > This is a simplified example. Real documents may have 30 - 40 groups,
> each
> > one with several fields. Putting all the fields in a group in one index
> > field won't work as the numeric/date ones should be available for range
> > searchers.
> >
> > So far the way I see it is to do my own post processing on the results.
> The
> > issue is that text fields will need to be untokenized, or otherwise it
> > would
> > be difficult to work on the result, and determine what matches.
> >
> > Thank you.
> > tjk :)
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message