lucene-solr-user mailing list archives

From Adam Estrada <estrada.a...@gmail.com>
Subject Multiple Word Facets
Date Wed, 27 Oct 2010 01:43:36 GMT
All,
I am new to Solr faceting and am stuck on how to get multiple-word
facets returned from a standard Solr query. See below for what is
currently being returned.

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="title">
<int name="Federal">89</int>
<int name="EFLHD">87</int>
<int name="Eastern">87</int>
<int name="Lands">87</int>
<int name="Highways">84</int>
<int name="FHWA">60</int>
<int name="Transportation">32</int>
<int name="GIS">22</int>
<int name="Planning">19</int>
<int name="Asset">15</int>
<int name="Environment">15</int>
<int name="Management">14</int>
<int name="Realty">12</int>
<int name="Highway">11</int>
<int name="HEP">10</int>
<int name="Program">9</int>
<int name="HEPGIS">7</int>
<int name="Resources">7</int>
<int name="Roads">7</int>
<int name="EEI">6</int>
<int name="Environmental">6</int>
<int name="Right">6</int>
<int name="Way">6</int>
...etc...

Many of the terms in there are really parts of 2 or 3 word phrases.
For example, "Eastern Federal Lands Highway Division" gets broken down
into the individual words that make up the phrase. I've seen quite a
few websites that do what I am trying to do here, so any suggestions
at this point would be great. See my schema below (copied from the
example schema).

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

The analyzer for type="query" is similar. Please advise on how to
group or cluster document terms so that they can be used as facets.
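
For what it is worth, one direction I have seen mentioned is to facet
on an untokenized copy of the field rather than on the analyzed "text"
field itself. Below is a rough, untested sketch of what I think that
would look like. The field name "title_facet" is made up for
illustration, and I am assuming the "string" type (solr.StrField) from
the example schema; I have not confirmed this is the right approach.

```xml
<!-- Untested sketch. "title_facet" is a hypothetical field name. -->

<!-- StrField is not analyzed, so each value is indexed as one term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Facet-only copy of the title: indexed for faceting, not stored. -->
<field name="title_facet" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw title value into the untokenized field at index time. -->
<copyField source="title" dest="title_facet"/>
```

If I understand it correctly, faceting with facet.field=title_facet
would then return each whole title value as a single facet instead of
its individual words. That would not pull shorter phrases out of a
longer title, so it may only be part of what I need.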

Many thanks in advance,
Adam Estrada
