lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Advice on Custom Sorting
Date Mon, 25 Sep 2006 20:19:41 GMT
OK, a really "off the top of my head" response, but what the heck....

I'm not sure you need to worry about filters. Would it work for you to index
the documents in your preferred list with a field (called, at the limit of
my creativity, preferredsubid <G>) and index your non-preferred docs with a
field subid? You'd still have to fire two queries, one on subid (to pick up
the ones in your non-preferred list) and one on preferredsubid.

Since there's no requirement that all docs have the same fields, your
preferred docs could have ONLY the preferredsubid field and your
non-preferred docs ONLY the subid field. That way you wouldn't have to worry
about picking the docs up twice.
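
A minimal sketch of the field-splitting idea. It assumes plain Java Maps stand in for Lucene Documents so that it runs without the Lucene jar. `FieldSplitter` and `buildDoc` are hypothetical names; only the two field names come from the post. Each document receives exactly one of the two fields:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a Map stands in for a Lucene Document here.
// Each document gets exactly ONE of the two fields, depending on whether
// its subscriber is in the preferred list.
class FieldSplitter {
    static final String PREFERRED_FIELD = "preferredsubid"; // field name from the post
    static final String NORMAL_FIELD = "subid";              // field name from the post

    static Map<String, String> buildDoc(String subId, Set<String> preferredSubIds) {
        Map<String, String> doc = new HashMap<>();
        // Only one field is ever set, so a query on one field can never
        // match a document that a query on the other field also matches.
        String field = preferredSubIds.contains(subId) ? PREFERRED_FIELD : NORMAL_FIELD;
        doc.put(field, subId);
        return doc;
    }

    public static void main(String[] args) {
        Set<String> preferred = Set.of("S1", "S7");
        System.out.println(buildDoc("S1", preferred)); // {preferredsubid=S1}
        System.out.println(buildDoc("S2", preferred)); // {subid=S2}
    }
}
```

One trade-off to weigh is that the field name records list membership. A document therefore has to be reindexed whenever its SubId moves into or out of the preferred list, and that list changes about 10 times an hour.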

Merging should be simple then: just iterate over however many hits you want
in your preferredHits object, then tack on however many you want from your
nonPreferredHits object. All the code for the two queries would be
identical, the only difference being whether you specify "subid" or
"preferredsubid"...
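
The merge step might look like this. It assumes each query's results have already been collected into a list in the user's chosen sort order. `HitMerger` is a hypothetical helper, not a Lucene class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the merge step: each list stands in for the hits of
// one query, already sorted in the user's chosen order (e.g. DateAdded).
class HitMerger {
    static <T> List<T> merge(List<T> preferredHits, List<T> nonPreferredHits, int limit) {
        List<T> merged = new ArrayList<>(limit);
        // Take preferred hits first, keeping their existing order...
        for (T hit : preferredHits) {
            if (merged.size() == limit) return merged;
            merged.add(hit);
        }
        // ...then top up from the non-preferred hits until the limit is reached.
        for (T hit : nonPreferredHits) {
            if (merged.size() == limit) return merged;
            merged.add(hit);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> preferred = List.of("p1", "p2");
        List<String> others = List.of("n1", "n2", "n3");
        System.out.println(merge(preferred, others, 4)); // [p1, p2, n1, n2]
    }
}
```

Because the two field sets are disjoint, no de-duplication is needed during the merge.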

I can imagine several variations on this scenario, but they depend on your
problem space.

Whether this is the "best" or not, I leave as an exercise for the reader.

Best
Erick

On 9/25/06, Paul Lynch <pablolynch@yahoo.com> wrote:
>
> Hi All,
>
> I have an index containing documents which all have a
> field called SubId which holds the ID of the
> Subscriber that submitted the data. This field is
> STORED and UN_TOKENIZED.
>
> When I am querying the index, the user can choose a
> number of different ways to sort the Hits. The problem
> is that I have a list of SubIds that should appear at
> the top of the results list regardless of how the
> Hits are sorted. In other words, if the Hits
> should be sorted by DateAdded, I require the Hits to
> be sorted by DateAdded for the SubIds in my list and
> then by DateAdded for the SubIds not in my list.
>
> From reading previous discussions on the mailing list,
> I believe I could achieve what I need by writing
> custom filters, i.e. run the query first with a custom
> filter for the SubIds in my list and then a second
> time with a custom filter for the SubIds not in my
> list and then "merge" the results.
>
> I suppose my question is simple: Is there a better way
> to achieve this?
>
> A couple of bits of info which I think would influence the best
> design:
>
> - Index contains roughly 5M documents
> - There can be up to 10K different unique SubIds
> - My "Preferred SubId List" could contain any
> combination of the 10K SubIds including all or none of
> them
> - My "Preferred SubId List" gets updated about 10
> times an hour, so I could cache the custom filters
>
> Thanks in advance,
> Paul
>
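
As a hedged illustration of the ordering Paul asks for, this plain-Java comparator puts preferred SubIds first and orders each group by DateAdded. The `Hit` class and the `long` DateAdded value are assumptions for illustration. The code pins down the target order and does not propose sorting millions of hits in memory:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory sketch of the required order:
// preferred SubIds first, each group ordered by DateAdded (ascending).
class PreferredOrder {
    static final class Hit {
        final String subId;
        final long dateAdded; // assumed numeric date, e.g. epoch millis
        Hit(String subId, long dateAdded) { this.subId = subId; this.dateAdded = dateAdded; }
        @Override public String toString() { return subId + "@" + dateAdded; }
    }

    static Comparator<Hit> order(Set<String> preferredSubIds) {
        // Boolean.FALSE sorts before TRUE, so keying on "not preferred"
        // puts preferred hits ahead of the rest.
        Comparator<Hit> preferredFirst =
            Comparator.comparing(h -> !preferredSubIds.contains(h.subId));
        // Within each group, fall back to DateAdded.
        return preferredFirst.thenComparingLong(h -> h.dateAdded);
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(List.of(
            new Hit("S2", 100), new Hit("S1", 300), new Hit("S3", 50), new Hit("S1", 200)));
        hits.sort(order(Set.of("S1")));
        System.out.println(hits); // [S1@200, S1@300, S3@50, S2@100]
    }
}
```

Ascending date is used here. For newest-first, reverse only the date comparison and keep the preferred-first key as it is.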
