lucene-solr-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: solr-suggestion - terms that "start with"...
Date Wed, 17 May 2006 02:47:48 GMT

: I've just improved the code to be a better DocSet citizen and it now
: does this:
:
: 	      BitDocSet constraintDocSet = new BitDocSet(constraintMask);
:                ...
:                map.put(term.text(), docSet.intersectionSize(constraintDocSet));

How are you building constraintMask? ... If it's a BitSet you are building
up by executing a bunch of queries, getting their DocSets, asking those
DocSets for their bits, and then unioning/intersecting them, then that's
probably where you could get the most benefit from Solr that you aren't
already taking advantage of, by letting the DocSets do the unions and
intersections themselves (except that I seem to recall you wanting to do
things that DocSets don't currently support, like invert ... so maybe
this is the best way).
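If the mask really is built from raw bits, here's a rough sketch of that kind of composition using plain java.util.BitSet. It is not Solr's DocSet API, and the class name, method names, doc ids and maxDoc are all made up for illustration. The flip() step is the "invert" operation DocSets don't give you; everything else is ordinary and/or on bits.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: combining per-query document sets (as java.util.BitSet)
 * into one constraint mask, including an inversion step.
 * Doc ids and maxDoc below are invented for illustration.
 */
public class ConstraintMaskSketch {

    /** Builds a BitSet with the given doc ids set. */
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    /** Computes ((a AND b) OR c) AND NOT excluded, over doc ids [0, maxDoc). */
    static BitSet buildMask(BitSet a, BitSet b, BitSet c, BitSet excluded, int maxDoc) {
        BitSet mask = (BitSet) a.clone(); // clone so the inputs are never mutated
        mask.and(b);                      // intersection
        mask.or(c);                       // union
        BitSet allowed = (BitSet) excluded.clone();
        allowed.flip(0, maxDoc);          // inversion: every doc that is NOT excluded
        mask.and(allowed);
        return mask;
    }

    /** Number of docs present in both sets (same idea as an intersection size). */
    static int intersectionSize(BitSet x, BitSet y) {
        BitSet tmp = (BitSet) x.clone();
        tmp.and(y);
        return tmp.cardinality();
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        BitSet mask = buildMask(docs(1, 2, 3, 4), docs(2, 3, 5), docs(7), docs(3), maxDoc);
        System.out.println(mask);                                      // {2, 7}
        System.out.println(intersectionSize(mask, docs(2, 5, 7, 9)));  // 2
    }
}
```

The clones are deliberate: cached sets (like the bits you get back from a DocSet) shouldn't be modified in place.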

Off the cuff, the one thing I would do differently if it were me is...

  BitDocSet constraintDocSet = new BitDocSet(constraintMask);
  ...
  if (term != null && term.field().equals(facet) && term.text().startsWith(prefix)) {
     // count the docs matching this term that also fall inside the constraint
     map.put(term.text(), searcher.numDocs(new TermQuery(term),
                                           constraintDocSet));
  } else {
  ...

...there's no performance gain, but it makes your code a little cleaner.


As for the issue of how you get the values based on your prefix, I would
keep using a TermEnum, but build it on a field that isn't tokenized. With
a copyField this becomes really easy; if this is your current "agent"
field...

  <fieldtype name="text" class="solr.TextField" ...
  <field name="agent" type="text" indexed="true" stored="true"
                      multiValued="true"/>

..then assuming you already have...
  <fieldtype name="string" class="solr.StrField" />
...add...
  <field name="agentRAW" type="string" indexed="true" stored="false"
                         omitNorms="true" multiValued="true" />
...and farther down...
  <copyField source="agent" dest="agentRAW"/>

...and then make your TermEnum on the "agentRAW" field.
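To make the prefix walk concrete, here's a toy model (a sketch under stated assumptions, not Lucene code): a sorted TreeMap stands in for the term dictionary of the untokenized agentRAW field, and tailMap(prefix, true) plays the role of a TermEnum positioned at the first term >= prefix, roughly what IndexReader.terms(new Term("agentRAW", prefix)) gives you. Because terms come back in sorted order, you can stop at the first term that doesn't start with the prefix. The agent names, doc ids and class name are invented.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a prefix walk over an untokenized field's terms.
 * The TreeMap is a stand-in for the sorted term dictionary of "agentRAW":
 * each key is one whole field value, each value the docs containing it.
 */
public class PrefixTermsSketch {

    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    /** For each term starting with prefix, counts its docs inside the constraint. */
    static Map<String, Integer> prefixCounts(NavigableMap<String, BitSet> terms,
                                             String prefix, BitSet constraint) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps term order
        // Start at the first term >= prefix, like seeking a TermEnum.
        for (Map.Entry<String, BitSet> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can match either
            }
            BitSet both = (BitSet) e.getValue().clone();
            both.and(constraint);
            counts.put(e.getKey(), both.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        NavigableMap<String, BitSet> agentRaw = new TreeMap<>();
        agentRaw.put("Smith, Adam", docs(0, 4));
        agentRaw.put("Smith, John", docs(1, 2, 3));
        agentRaw.put("Smithers, Waylon", docs(5));
        agentRaw.put("Stone, Oliver", docs(2));
        System.out.println(prefixCounts(agentRaw, "Smith", docs(1, 2, 4)));
        // {Smith, Adam=1, Smith, John=2, Smithers, Waylon=0}
    }
}
```

A real TermEnum also keeps going into the next field once it runs out of agentRAW terms, which is why the term.field() check in the snippet above is still needed.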

: Oh, one other wrinkle to getting the stored field value is that the
: agent field is multi-valued, so several people could collaborate and
: have their individual names associated with a work.  So there are

This won't be a problem with the multiValued="true" option ... it does
what you expect regardless of whether the field is text, string, integer,
tokenized or non-tokenized.

(well, it does what *I* expect ... if you expect something and it doesn't
do that, let us know)
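A toy illustration of why multiValued behaves sensibly on both field types (hypothetical code; the tokenize() helper is only a crude stand-in for whatever analyzer your "text" fieldtype uses, and doc id 7 and the names are made up): each value of a multi-valued field is indexed on its own, so the string copy ends up with one whole term per agent, while the text field ends up with one term per word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model of how a multi-valued field's values become terms. */
public class MultiValuedTermsSketch {

    /** Crude analyzer stand-in: lowercase, then split on anything not a letter or digit. */
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds each of one document's values to a term -> doc ids index. */
    static void index(SortedMap<String, SortedSet<Integer>> index, int docId,
                      List<String> values, boolean tokenized) {
        for (String v : values) {
            // untokenized ("string") fields keep the whole value as a single term
            List<String> terms = tokenized ? tokenize(v) : List.of(v);
            for (String term : terms) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    public static void main(String[] args) {
        List<String> agents = List.of("John Smith", "Mary Jones");
        SortedMap<String, SortedSet<Integer>> text = new TreeMap<>();
        SortedMap<String, SortedSet<Integer>> raw = new TreeMap<>();
        index(text, 7, agents, true);
        index(raw, 7, agents, false);
        System.out.println(text.keySet()); // [john, jones, mary, smith]
        System.out.println(raw.keySet());  // [John Smith, Mary Jones]
    }
}
```

Either way the document shows up under every one of its values, which is what makes per-agent counts on the raw field come out right.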


-Hoss

