lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Which tokenizer or analizer should use and field type
Date Fri, 12 Apr 2013 21:49:34 GMT
Unfortunately, Solr doesn't have a query parser that would give the meaning 
you want to this input:

project assistant,manager

For now, you would need to write that query as:

(project AND assistant) OR manager

Or maybe as:

"project assistant"~5 OR manager

That would require project and assistant to occur within a few words of each 
other.

Or, if you have q.op defaulted to "OR":

"project assistant"~5 manager

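As a rough sketch of what "q.op defaulted to OR" can look like: q.op is a request parameter, so one place to default it is the search handler's defaults in solrconfig.xml. The handler name /select and the df value keyword (the copyField target from the original post) are assumptions here, not something shown in the post; adjust them to your own setup.

```xml
<!-- solrconfig.xml fragment (sketch; handler name and df value are assumptions) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Terms with no explicit operator between them are combined with OR -->
    <str name="q.op">OR</str>
    <!-- Assumed default search field: the copyField target from the post -->
    <str name="df">keyword</str>
  </lst>
</requestHandler>
```

The same parameter can also be passed per request (q.op=OR), which overrides the handler default.
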
Add the HTML strip char filter to your text field type:

<charFilter class="solr.HTMLStripCharFilterFactory" />

text_general is a semi-decent place to start.
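
A sketch of how that could look in schema.xml, reusing the text_general definition from the original post. Char filters operate on the raw text, so the charFilter element goes before the tokenizer. The keyword field definition at the end is an assumption (the post does not show it); since copyField copies the raw source value, it is the destination field's type that needs the char filter, and a field fed by several copyFields has to be multiValued.

```xml
<!-- schema.xml fragment (sketch based on the field type in the original post) -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Strips HTML markup before tokenizing; must precede the tokenizer -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed definition of the copyField target; not shown in the original post -->
<field name="keyword" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
```

Note that the char filter only affects what gets indexed. Stored values of body and company_profile still contain the original markup.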

-- Jack Krupansky

-----Original Message----- 
From: anurag.jain
Sent: Friday, April 12, 2013 11:32 AM
To: solr-user@lucene.apache.org
Subject: Which tokenizer or analizer should use and field type

my schema file is :

<copyField source="title" dest="keyword"/>
<copyField source="body" dest="keyword"/>
<copyField source="company_name" dest="keyword"/>
<copyField source="company_profile" dest="keyword"/>

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="company_name" type="text_general" indexed="true" stored="true"/>
<field name="company_profile" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>





values are like,

title: "Assistant Coach/ Junior Assistant"
body: "<p> <http://i.imgur.com/buPga.jpg> <br /><br />Oil India Ltd. invites
applications for the post of <strong>Sr Medical Officer (Paediatrics) </strong>
<br /> www.freshersworld.com<br /> <strong>Qualification</strong> :
MD (Paediatrics) <br /><br /> <strong>No of Post</strong> : 1UR<br /> <br />
<strong> Pay Scale</strong> : Rs 32900 -58000 <br /> <br />
<strong>Age as on 11.04.2013</strong> : 32 yrs<br /> </p>
<p><strong>Selection Procedure : </strong>Selection for the above post will be
based on Written Test, Group Discussion (GD), Viva-Voce and Medical
Examination.<br /> </p>"

company_profile: "<p>The story of <strong>Oil India Limited (OIL)</strong>
traces and symbolises the development and growth of the Indian petroleum
industry. From the discovery of crude oil in the far east of India at
Digboi, Assam in 1889 to its present status as a fully integrated upstream
petroleum company, OIL has come far, crossing many milestones.</p>",

company_name: "Oil India Limited",



Please suggest which field type I should use.

keyword is the copyField I am using for search. I do not want to search on the
HTML content.

How will the search happen?


If I give these words to search:

project assistant,manager

it should only give me documents whose keyword field has "project assistant" or
"manager".

Right now it gives me results that have project or assistant or manager, which
is wrong for my case.

Please give me a solution for it. I have to complete this task by today, which
is why I am not able to research it.

I need field type definitions for each field. And how should I write the search
query?

thanks in advance







