lucene-java-user mailing list archives

From TJ Kolev <tjko...@gmail.com>
Subject Re: Problem: Indexing and searching repeating groups of fields
Date Fri, 15 Jan 2010 17:12:57 GMT
Found public int getPositionIncrementGap(String fieldName) on Analyzer.
Sweet! Should've read more before emailing.

tjk :)
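
For anyone following the thread, here is a rough sketch of why a position increment gap helps once each Experience group is added as a separate value of one multi-valued field. It is a toy model in plain Java, not Lucene code: the class PositionGapSketch and its helpers index() and near() are hypothetical names, and the gap of 100 and window of 10 are arbitrary illustrative numbers. In a real index the gap would come from overriding Analyzer.getPositionIncrementGap(String fieldName), and the proximity test would be a span query such as SpanNearQuery; the exact position arithmetic in Lucene may differ slightly from this model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these helpers are hypothetical and are NOT Lucene classes.
class PositionGapSketch {

    // The two resumes from the original question, one String[] per Experience group.
    static final String[][] RA = {{"Java", "5"}, {"C", "2"}, {"PHP", "3"}};
    static final String[][] RB = {{"Java", "2"}, {"C", "5"}, {"VB", "1"}};

    // Assign token positions the way a multi-valued field roughly would:
    // consecutive positions inside one value, plus `gap` extra positions
    // before the first token of every following value.
    static Map<String, List<Integer>> index(String[][] values, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = -1;
        for (int v = 0; v < values.length; v++) {
            if (v > 0) {
                pos += gap;
            }
            for (String token : values[v]) {
                pos++;
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
        }
        return postings;
    }

    // Unordered proximity check: do the two terms occur within maxDistance positions?
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        for (int pa : postings.getOrDefault(a, Collections.<Integer>emptyList())) {
            for (int pb : postings.getOrDefault(b, Collections.<Integer>emptyList())) {
                if (pa != pb && Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    // "Resumes which have (Java,5) and (C,2)", with a window wide enough
    // for groups of up to about ten fields.
    static boolean hasJava5AndC2(String[][] resume, int gap) {
        Map<String, List<Integer>> postings = index(resume, gap);
        return near(postings, "Java", "5", 10) && near(postings, "C", "2", 10);
    }

    public static void main(String[] args) {
        System.out.println("gap=0   Ra=" + hasJava5AndC2(RA, 0) + " Rb=" + hasJava5AndC2(RB, 0));
        System.out.println("gap=100 Ra=" + hasJava5AndC2(RA, 100) + " Rb=" + hasJava5AndC2(RB, 100));
    }
}
```

With no gap, neighbouring groups sit next to each other and Rb matches by accident. Once the gap is larger than the proximity window, only terms from the same group can satisfy the check, so only Ra matches.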

On Fri, Jan 15, 2010 at 10:19 AM, TJ Kolev <tjkolev@gmail.com> wrote:

> Hi!
>
> I don't think the easy solution will work for me, because I'll have more
> than two fields in a group - perhaps 6 - 10.
>
> However using span queries looks very promising. I'll investigate that.
>
> I see setPositionIncrement() only on the Token object. Is there a way to
> set this when adding a field to the document, so that the first token gets
> its position pushed away? I would prefer not to modify my analyzer if
> possible.
>
> Thank you.
> tjk :)
>
>
> On Wed, Jan 13, 2010 at 3:52 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> Ooooh, isn't that easier. You just prompted me to think
>> that you don't even have to do that, just index the pairs as single
>> tokens (KeywordAnalyzer? but watch out for no case folding)...
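
The pair-as-single-token idea can be sketched in plain Java, without Lucene, as a set lookup. Everything here is illustrative: PairTokenSketch, pair() and hasAll() are hypothetical names, and the explicit lower-casing stands in for the case folding that an untokenized field would not do for you.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: hypothetical helpers, not Lucene classes.
class PairTokenSketch {

    // Build one untokenized term per Experience group, e.g. "java:5".
    // Case is folded by hand because a keyword-style field would not do it.
    static String pair(String language, int years) {
        return language.toLowerCase(Locale.ROOT) + ":" + years;
    }

    // AND query: every required pair term must be present in the document.
    static boolean hasAll(Set<String> docTerms, String... required) {
        return docTerms.containsAll(Arrays.asList(required));
    }

    static final Set<String> RA =
            new HashSet<>(Arrays.asList(pair("Java", 5), pair("C", 2), pair("PHP", 3)));
    static final Set<String> RB =
            new HashSet<>(Arrays.asList(pair("Java", 2), pair("C", 5), pair("VB", 1)));

    public static void main(String[] args) {
        System.out.println("Ra: " + hasAll(RA, pair("java", 5), pair("c", 2)));
        System.out.println("Rb: " + hasAll(RB, pair("java", 5), pair("c", 2)));
    }
}
```

The grouping survives because each pair is a single term, but the years are no longer a separate value, which is the range-search limitation raised in the original question.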
>>
>> On Wed, Jan 13, 2010 at 4:30 PM, Digy <digydigy@gmail.com> wrote:
>>
>> > How about using languages as fieldnames?
>> > Doc1(Ra):
>> >        Java:5
>> >        C:2
>> >        PHP:3
>> >
>> > Doc2(Rb)
>> >        Java:2
>> >        C:5
>> >        VB:1
>> >
>> > Query:Java:5 AND C:2
>> >
>> > DIGY
>> >
>> > -----Original Message-----
>> > From: TJ Kolev [mailto:tjkolev@gmail.com]
>> > Sent: Wednesday, January 13, 2010 11:00 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Problem: Indexing and searching repeating groups of fields
>> >
>> > Greetings,
>> >
>> > Let's assume I have to index and search "resume" documents. Two fields are
>> > defined: Language and Years. The fields are associated together in a group
>> > called Experience. A resume document may have 0 or more Experience groups:
>> >
>> > Ra{ E1(Java,5); E2(C,2); E3(PHP,3);}
>> > Rb{ E1(Java,2); E2(C,5); E3(VB,1);}
>> >
>> > How do I index such documents, and how do I search, so I can formulate a
>> > query like this "Resumes which have (Java,5) and (C,2)" and get back Ra. I
>> > know I can index multiple fields of the same name, and do "(Language:Java
>> > AND Years:5) AND (Language:C AND Years:2)", but in addition to Ra that would
>> > also return Rb, which I don't want. The problem here is that the "grouping"
>> > is lost. I can create fields with compound names (E1Language, E1Years,
>> > E2Language, E2Years, etc), but that helps me none, as I don't know which
>> > group to search. I'd also like to query for "(Language:Java AND Years:5) OR
>> > (Language:C AND Years:2)"
>> >
>> > This is a simplified example. Real documents may have 30 - 40 groups, each
>> > one with several fields. Putting all the fields in a group in one index
>> > field won't work as the numeric/date ones should be available for range
>> > searches.
>> >
>> > So far the way I see it is to do my own post processing on the results. The
>> > issue is that text fields will need to be untokenized, or otherwise it would
>> > be difficult to work on the result, and determine what matches.
>> >
>> > Thank you.
>> > tjk :)
>> >
>> >
>> >
>> >
>>
>
>
