lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Increase number of available positions?
Date Mon, 15 Mar 2010 22:02:50 GMT
Not quite what I had in mind, more like
level1-1/level2-1/level3-1/Term1 level1-1/level2-1/level3-1/Term2
level1-1/level2-1/level3-2/Term3 level1-1/level2-1/level3-2/Term4

With an increment gap 0f 100 and an analyzer that split on slashes, the term
positions would be
something like:

term   term
pos
0        level1-1
1        level2-1
2        level3-1
3        Term1
104    level1-1
105    level2-1
106    level3-1
107    Term2
208     level1-1
209     level2-1
210     level3-2
211     Term3
312     level1-1
313     level2-1
314     level3-2
315     Term4

As you see, a lot or repetition, but perhaps acceptable...

Or, you could choose an analyzer that didn't break up the terms
(although this would make your index somewhat bigger due to
more unique terms).
term           term
pos
0          level1-1/level2-1/level3-1/Term1
101      level1-1/level2-1/level3-1/Term2
202      level1-1/level2-1/level3-2/Term3
303      level1-1/level2-1/level3-2/Term4

Although I don't know if you really need an increment gap here.....

This latter would make gathering all the documents with specific levels
easier although the former would also work if you didn't need partial
terms (that is, wildcards inside of phrases are new, see
JIRA-1486, ComplexPhraseQueryParser).

Best
Erick

On Mon, Mar 15, 2010 at 5:09 PM, Rene Hackl-Sommer <rene.a.hackl@gmx.de>wrote:

> Hi Erick,
>
>> What about indexing
>> the triplets with a small increment gap between? That is:
>> ...
>>
>> gets indexed as:
>>
>> level1-1/level2-1/level3-1  +gap 100
>> level1-1/level2-1/level3-2  +gap 100
>> level1-1/level2-2/level3-3  +gap 100
>> level1-1/level2-2/level3-4
>>
>>
>
> If I understand this correctly, the field would look like
> "level1-1/level2-1/level3-1 Term1 Term2 level1-1/level2-1/level3-2 Term3
> Term4 "?
>
> I think, the problem here is the same like in the Payloads approach I wrote
> of in my response to Steve's mail. We cannot test for equality at search
> time (please correct me if we actually can do this). So if we have
>
>
> level1-1/level2-1/level3-1
> ...
> level1-1/level2-1/level3-244
> level1-1/level2-2/level3-1
> level1-1/level2-2/level3-105
>
> and I search for T1 and T2 on level3, but want them to be in the same
> level2, this cannot be done satisfactorily.
>
>
>  Or you could think about *documents* being your level1, that is each
>> document has one and only one level1 element but many documents
>> may have the same level1 token. Combining this with your increment
>> gap notion for level2-3 might work for you.
>>
>>
>
> I was thinking about this, yet the trouble is that the issue at hand is
> just one field in an already not quite trivial scenario involving 200+
> fields. If I add say 50 level1-documents per real document, I would still
> need to be able to relate these level1-documents to the real documents to
> which they belong, and, during retrieval, there are use cases where I need
> to look into each of the level1-documents to see if they fulfill certain
> criteria and then, in a further step, ascertain whether I can gather the
> needed level1-documents to fulfill the query on a "MyField"-Level (not
> existant here per se). I feel this might get somewhat unwieldy.
>
>
>  You might also search the list for "Heirarchal" or "tree" indexing,
>> this is a variant of such I think.
>>
>>
>
> Thank you, I'll look into this.
>
>
> Cheers
> Rene
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message