lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: simple question from a newbie
Date Thu, 29 Jul 2010 00:17:07 GMT
What is the query you submit (don't forget &debugQuery=on)? In particular,
what field are you sorting on?
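
In case it helps, here is a minimal sketch of building such a request URL
with debugQuery=on appended. The host, port and path are assumptions (a
default local install), and build_query_url is just a hypothetical helper
name, so adjust both to your setup.

```python
from urllib.parse import urlencode

# Assumption: a default local Solr install; host, port and path are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_query_url(query):
    """Return a select URL for `query` with debugQuery=on appended."""
    params = {"q": query, "debugQuery": "on"}
    # urlencode percent-escapes characters such as ':' and '*' in the query.
    return SOLR_SELECT + "?" + urlencode(params)

url = build_query_url("dc.title:c*")
print(url)
```

The debug section of the response should show how the query was parsed,
which is usually the quickest way to see which field and terms were
actually searched.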

But yes, if you're searching on a tokenized field, you'll get matches on all
tokens in that field, which are probably single words. And no matter how you
sort, you're still getting documents whose title as a whole doesn't start
with "c".

What happens if you search on your dc3.title instead? It uses the keyword
tokenizer, which indexes the entire title as a single token. Sort by that
one too.
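
To make the difference concrete, here is a rough Python simulation of the
two field types. It is not Solr code: it only mimics the effect of the
analyzers in your schema (whitespace split plus lowercase for "text",
whole-value lowercase plus trim for "caseInsensitiveSort") and leaves out
stop words, word-delimiter splitting and stemming. The function names are
made up for illustration, and the last title is hypothetical, added only
so that one title has a "c" word that is not at the start.

```python
def text_tokens(title):
    """Approximate the "text" type: split on whitespace, then lowercase."""
    return [tok.lower() for tok in title.split()]

def keyword_token(title):
    """Approximate "caseInsensitiveSort": the whole title is a single token,
    lowercased and trimmed."""
    return [title.lower().strip()]

def prefix_match(tokens, prefix):
    """A prefix query such as c* matches if ANY indexed token starts with it."""
    return any(tok.startswith(prefix) for tok in tokens)

titles = [
    "Results of in-mine research in support",
    "Cancer Reports",
    "State injury indicators report",
    "Cancer Reports",
    "Indexed dermal bibliography",
    "Childhood agricultural-related injury report",
    "Childhood agricultural injury prevention",
    "Report on childhood injury",  # hypothetical title, not from the original list
]

# dc.title-style matching: any word in the title may start with "c".
tokenized_hits = [t for t in titles if prefix_match(text_tokens(t), "c")]
# dc3.title-style matching: the title as a whole must start with "c".
keyword_hits = [t for t in titles if prefix_match(keyword_token(t), "c")]

print(tokenized_hits)
print(keyword_hits)
```

Under these assumptions the tokenized version also picks up the
hypothetical "Report on childhood injury", while the keyword version
returns only the titles that begin with "c".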

Best
Erick

On Wed, Jul 28, 2010 at 12:26 PM, Nguyen, Vincent (CDC/OSELS/NCPHI) (CTR) <
vng0@cdc.gov> wrote:

> I think I got it to work.  If I do a wildcard search using the dc3.title
> field it seems to work fine (dc3.title:c*).  The dc.title:c* returns
> every title that has a word in it that starts with 'c', which isn't
> exactly what I wanted.  I'm guessing it's because of the
> type="caseInsensitiveSort".
>
> Well, here is my schema for reference.  Thanks for your help.
>
>
>  <schema name="example" version="1.1">
>  <types>
>   <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true" />
>   <!-- boolean type: "true" or "false" -->
>   <fieldType name="boolean" class="solr.BoolField"
> sortMissingLast="true" omitNorms="true" />
>   <fieldType name="integer" class="solr.IntField" omitNorms="true" />
>  <fieldType name="long" class="solr.LongField" omitNorms="true" />
>  <fieldType name="float" class="solr.FloatField" omitNorms="true" />
>  <fieldType name="double" class="solr.DoubleField" omitNorms="true" />
>   <fieldType name="sint" class="solr.SortableIntField"
> sortMissingLast="true" omitNorms="true" />
>  <fieldType name="slong" class="solr.SortableLongField"
> sortMissingLast="true" omitNorms="true" />
>  <fieldType name="sfloat" class="solr.SortableFloatField"
> sortMissingLast="true" omitNorms="true" />
>  <fieldType name="sdouble" class="solr.SortableDoubleField"
> sortMissingLast="true" omitNorms="true" />
>   <fieldType name="date" class="solr.DateField" sortMissingLast="true"
> omitNorms="true" />
>  <fieldType name="text_ws" class="solr.TextField"
> positionIncrementGap="100">
>  <analyzer>
>  <tokenizer class="solr.WhitespaceTokenizerFactory" />
>  </analyzer>
>  </fieldType>
>  <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>  <analyzer type="index">
>   <tokenizer class="solr.WhitespaceTokenizerFactory" />
>  <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" />
>  <filter class="solr.LowerCaseFilterFactory" />
>  <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt" />
>  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>  <analyzer type="query">
>  <tokenizer class="solr.WhitespaceTokenizerFactory" />
>  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true" />
>  <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" />
>  <filter class="solr.LowerCaseFilterFactory" />
>  <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt" />
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>  </analyzer>
>  </fieldType>
>  <fieldType name="textTight" class="solr.TextField"
> positionIncrementGap="100">
>  <analyzer>
>  <tokenizer class="solr.WhitespaceTokenizerFactory" />
>  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="false" />
>  <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> generateNumberParts="0" catenateWords="1" catenateNumbers="1"
> catenateAll="0" />
>  <filter class="solr.LowerCaseFilterFactory" />
>   <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt" />
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>  </analyzer>
>  </fieldType>
>  <fieldType name="caseInsensitiveSort" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>  <analyzer>
>  <tokenizer class="solr.KeywordTokenizerFactory" />
>  <filter class="solr.LowerCaseFilterFactory" />
>  <filter class="solr.TrimFilterFactory" />
>   </analyzer>
>  </fieldType>
>  <fieldtype name="ignored" stored="false" indexed="false"
> class="solr.StrField" />
>  </types>
>  <fields>
>  <!-- Fedora specific fields -->
>  <field name="PID" type="string" indexed="true" stored="true" />
>  <field name="fgs.state" type="string" indexed="true" stored="true" />
>  <field name="fgs.label" type="text" indexed="true" stored="true" />
>  <field name="fgs.ownerId" type="string" indexed="true" stored="true"
> />
>  <field name="fgs.createdDate" type="date" indexed="true" stored="true"
> />
>  <field name="fgs.lastModifiedDate" type="date" indexed="true"
> stored="true" />
>  <field name="fgs.contentModel" type="string" indexed="true"
> stored="true" />
>  <field name="fgs.type" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <!-- DC Fields -->
>  <field name="dc.contributor" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.coverage" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.creator" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.date" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.description" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.format" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.identifier" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.language" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.publisher" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.relation" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.rights" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.source" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.subject" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc.title" type="text" indexed="true" stored="true"
> multiValued="true" omitnorms="true" />
>  <field name="dc.type" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.contributor" type="string" indexed="true"
> stored="true" multiValued="true" />
>  <field name="dc2.coverage" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.creator" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.date" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.description" type="string" indexed="true"
> stored="true" multiValued="true" />
>  <field name="dc2.format" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.identifier" type="string" indexed="true"
> stored="true" multiValued="true" />
>  <field name="dc2.language" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.publisher" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.relation" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.rights" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.source" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.subject" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.title" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc2.type" type="string" indexed="true" stored="true"
> multiValued="true" />
>  <field name="dc3.creator" type="caseInsensitiveSort" indexed="true"
> stored="true" multiValued="true" />
>  <field name="dc3.subject" type="caseInsensitiveSort" indexed="true"
> stored="true" multiValued="true" />
>  <field name="dc3.title" type="caseInsensitiveSort" indexed="true"
> stored="true" multiValued="true" />
>  <field name="timestamp" type="date" indexed="true" stored="true"
> default="NOW" multiValued="false" />
>  <dynamicField name="*_i" type="sint" indexed="true" stored="true" />
>   <dynamicField name="*_s" type="string" indexed="true" stored="true" />
>
>   <dynamicField name="*_l" type="slong" indexed="true" stored="true" />
>   <dynamicField name="*_t" type="text" indexed="true" stored="true" />
>  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"
> />
>   <dynamicField name="*_f" type="sfloat" indexed="true" stored="true" />
>
>  <dynamicField name="*_d" type="sdouble" indexed="true" stored="true"
> />
>  <dynamicField name="*_dt" type="date" indexed="true" stored="true" />
>   <dynamicField name="fgs.*" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <dynamicField name="dsm.*" type="text" indexed="true" stored="true"
> multiValued="true" />
>  <dynamicField name="rdf.*" type="text" indexed="true" stored="true"
> multiValued="true" />
>  </fields>
>  <uniqueKey>PID</uniqueKey>
>  <defaultSearchField>fgs.label</defaultSearchField>
>  <solrQueryParser defaultOperator="OR" />
>  <copyField source="dc.contributor" dest="dc2.contributor" />
>  <copyField source="dc.coverage" dest="dc2.coverage" />
>  <copyField source="dc.creator" dest="dc2.creator" />
>  <copyField source="dc.date" dest="dc2.date" />
>  <copyField source="dc.description" dest="dc2.description" />
>  <copyField source="dc.format" dest="dc2.format" />
>  <copyField source="dc.identifier" dest="dc2.identifier" />
>  <copyField source="dc.language" dest="dc2.language" />
>  <copyField source="dc.publisher" dest="dc2.publisher" />
>  <copyField source="dc.relation" dest="dc2.relation" />
>  <copyField source="dc.rights" dest="dc2.rights" />
>  <copyField source="dc.source" dest="dc2.source" />
>  <copyField source="dc.subject" dest="dc2.subject" />
>  <copyField source="dc.title" dest="dc2.title" />
>  <copyField source="dc.type" dest="dc2.type" />
>  <copyField source="dc.subject" dest="dc3.subject" />
>  <copyField source="dc.title" dest="dc3.title" />
>  <copyField source="dc.creator" dest="dc3.creator" />
>  </schema>
>
> Vincent Vu Nguyen
> Division of Science Quality and Translation
> Office of the Associate Director for Science
> Centers for Disease Control and Prevention (CDC)
> 404-498-6154
> Century Bldg 2400
> Atlanta, GA 30329
>
>
> -----Original Message-----
> From: Ranveer [mailto:ranveer.solr@gmail.com]
> Sent: Wednesday, July 28, 2010 11:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: simple question from a newbie
>
> I think you are using a wildcard search, or should use one. But first
> of all, please provide the schema and configuration file for more
> details.
>
> regards
> Ranveer
>
>
> On Wednesday 28 July 2010 07:51 PM, Nguyen, Vincent (CDC/OSELS/NCPHI)
> (CTR) wrote:
> > Hi,
> >
> > I'm new to Solr and have a rather dumb question.  I want to do a query
> > that returns all the Titles that start with a certain letter.  For
> > example, I have these titles:
> >
> > Results of in-mine research in support
> > Cancer Reports
> > State injury indicators report
> > Cancer Reports
> > Indexed dermal bibliography
> > Childhood agricultural-related injury report
> > Childhood agricultural injury prevention
> >
> > I want the query to return:
> >
> > Cancer Reports
> > Cancer Reports
> > Childhood agricultural-related injury report
> > Childhood agricultural injury prevention
> >
> > I want something like dc.title=c* type query
> >
> > I know that I can facet by dc.title and then use the parameter
> > facet.prefix=c but it returns something like this:
> >
> > Cancer Reports [2]
> > Childhood agricultural-related injury report [1]
> > Childhood agricultural injury prevention [1]
> >
> > Vincent Vu Nguyen
> > Division of Science Quality and Translation
> >
> > Office of the Associate Director for Science
> > Centers for Disease Control and Prevention (CDC)
> > 404-498-6154
> > Century Bldg 2400
> > Atlanta, GA 30329
