lucene-java-user mailing list archives

From Paul Lynch <pabloly...@yahoo.com>
Subject Re: Advice on Custom Sorting
Date Tue, 26 Sep 2006 16:35:59 GMT
Thanks again Erick for taking the time.

I agree that the CachingWrapperFilter as described
under "using a custom filter" in LIA is probably my
best bet. I wanted to check whether anything had been
added in Lucene releases since the book was written
that I wasn't aware of.

Cheers again.
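
A minimal sketch of the preferred-first ordering discussed in this thread, written in plain Java rather than against the Lucene API. The Doc class, the preferredFirst method, and the sample values are hypothetical stand-ins for search hits and the user's chosen sort; with Lucene, the two groups would instead come from two filtered searches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PreferredFirstMerge {

    /** Hypothetical stand-in for a search hit; this is not a Lucene class. */
    static final class Doc {
        final String subId;
        final long dateAdded;

        Doc(String subId, long dateAdded) {
            this.subId = subId;
            this.dateAdded = dateAdded;
        }

        @Override
        public String toString() {
            return subId + "@" + dateAdded;
        }
    }

    /**
     * Splits the hits into preferred and non-preferred groups, sorts each
     * group with the same user-selected comparator, and returns the
     * preferred group followed by the rest.
     */
    static List<Doc> preferredFirst(List<Doc> hits,
                                    Set<String> preferredSubIds,
                                    Comparator<Doc> userSort) {
        List<Doc> preferred = new ArrayList<>();
        List<Doc> others = new ArrayList<>();
        for (Doc d : hits) {
            if (preferredSubIds.contains(d.subId)) {
                preferred.add(d);
            } else {
                others.add(d);
            }
        }
        preferred.sort(userSort);
        others.sort(userSort);

        List<Doc> merged = new ArrayList<>(preferred);
        merged.addAll(others);
        return merged;
    }

    public static void main(String[] args) {
        List<Doc> hits = Arrays.asList(
                new Doc("s1", 30L), new Doc("s2", 10L),
                new Doc("s3", 20L), new Doc("s2", 40L));
        Set<String> preferred = new HashSet<>(Arrays.asList("s2"));
        Comparator<Doc> byDateAdded = Comparator.comparingLong(d -> d.dateAdded);

        // Prints [s2@10, s2@40, s3@20, s1@30]
        System.out.println(preferredFirst(hits, preferred, byDateAdded));
    }
}
```

Because the preferred set is only consulted at query time, nothing in the index has to change when the list refreshes every ten minutes or so; only the set (or, in Lucene terms, the filter) is rebuilt.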

--- Erick Erickson <erickerickson@gmail.com> wrote:

> You were probably right. See below....
> 
> On 9/25/06, Paul Lynch <pablolynch@yahoo.com> wrote:
> >
> > Thanks for the quick response Erick.
> >
> > "index the documents in your preferred list with a
> > field and index your non-preferred docs with a
> field
> > subid?"
> >
> > I considered this approach and dismissed it due to
> the
> > actual list of preferred ids changing so
> frequently
> > (every 10 mins...ish) but maybe I was a little
> hasty
> > in doing so. I will investigate the overhead in
> > updating all docs in the index each time my list
> > refreshes. I had assumed it was too prohibitive
> but I
> > know what they say about assumptions :)
> 
> 
> Lots of overhead. There's really no capability of
> updating a doc in place.
> This has been on several people's wish-list. You'd
> have to delete every doc
> that you wanted to change and re-add it. I don't
> know how many documents
> this would be, if just a few it'd be OK, but if
> many.... I was assuming (and
> I *do* know what they say about assumptions <G>)
> that you were just adding
> to your preferred doc list every few minutes, not
> changing existing
> documents....
> 
> It really does sound like you want a filter. I was
> pleasantly surprised by
> how very quickly filters are built. You could use
> a CachingWrapperFilter
> to have the filter kept around automatically (I
> guess you'd only have one
> per index update) to minimize your overhead for
> building filters, and
> perhaps warm up your cache by firing a canned query
> at your searcher when
> you re-open your IndexReader after index update. I
> think you'd have to do
> the two-query thing in this case. If you wanted to
> really get exotic, you
> could build your filter when you created your index
> and store it in a *very
> special document* and just read it in the first time
> you needed it. Although
> I've never used it, I guess you can store binary
> data. From the Javadoc
> 
> Field(String name, byte[] value, Field.Store store)
>           Create a stored field with binary value.
> 
> The only thing here is that the filters (probably
> wrapped in a
> ConstantScoreQuery) lose relevance, but since you're
> sorting "one of several
> ways", that probably doesn't matter.
> 
> Best
> Erick
> 
> 
> 
> > Should I be able to make this workable, the beauty of
> > this solution would be that I would actually only need
> > to query once. If I had a field which indicates
> > whether it is a preferred doc or not, "all" I will
> > have to do is sort across the two fields.
> >
> > Thanks again Erick. Any other suggestions are most
> > welcome.
> >
> > Regards,
> > Paul
> >
> > --- Erick Erickson <erickerickson@gmail.com>
> wrote:
> >
> > > OK, a really "off the top of my head" response,
> but
> > > what the heck....
> > >
> > > I'm not sure you need to worry about filters.
> Would
> > > it work for you to index
> > > the documents in your preferred list with a 
> field
> > > (called, at the limit of
> > > my creativity, preferredsubid <G>) and index
> your
> > > non-preferred docs with a
> > > field subid? You'd still have to fire two
> queries,
> > > one on subid (to pick up
> > > the ones in your non-preferred list) and one on
> > > preferredsubid.
> > >
> > > Since there's no requirement that all docs have
> the
> > > same fields, your
> > > preferred docs could have ONLY the
> preferredsubid
> > > field and your
> > > non-preferred docs ONLY the subid field. That
> way
> > > you wouldn't have to worry
> > > about picking the docs up twice.
> > >
> > > Merging should be simple then, just iterate over
> > > however many hits you want
> > > in your preferredHits object, then tack on
> however
> > > many you want from your
> > > nonPreferredHits object. All the code for the
> two
> > > queries would be
> > > identical, the only difference being whether you
> > > specify "subid" or
> > > "preferredsubid"......
> > >
> > > I can imagine several variations on this
> scenario,
> > > but they depend on your
> > > problem space.
> > >
> > > Whether this is the "best" or not, I leave as an
> > > exercise for the reader.
> > >
> > > Best
> > > Erick
> > >
> > > On 9/25/06, Paul Lynch <pablolynch@yahoo.com>
> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I have an index containing documents which all
> > > have a
> > > > field called SubId which holds the ID of the
> > > > Subscriber that submitted the data. This field
> is
> > > > STORED and UN_TOKENIZED
> > > >
> > > > When I am querying the index, the user can
> choose
> > > a
> > > > number of different ways to sort the Hits. The
> > > problem
> > > > is that I have a list of SubIds that should
> appear
> > > at
> > > > the top of the results list regardless of how
> the
> > > > index is sorted. In other words, let's suppose
> the
> > > Hits
> > > > should be sorted by DateAdded, I require the
> Hits
> > > to
> > > > be sorted by DateAdded for the SubIds in my
> list
> > > and
> > > > then by DateAdded for the SubIds not in my
> list.
> > > >
> > > > From reading previous discussions on the
> mailing
> > > list,
> > > > I believe I could achieve what I need by
> writing
> > > > custom filters i.e. Run the query first with a
> > > custom
> > > > filter for the SubIds in my list and then a
> second
> > > > time with a custom filter for the SubIds not
> in my
> > > > list and then "merge" the results.
> > > >
> > > > I suppose my question is simple: Is there a
> better
> > > way
> > > > to achieve this?
> > > >
> 
=== message truncated ===



