lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: When search term has two stopwords ('and' and 'a') together, it doesn't work
Date Fri, 08 Nov 2019 16:55:19 GMT
But when you change it to AND, a single misspelling means zero results. That is usually not
helpful.
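To make the AND-versus-OR trade-off concrete, here is a toy Python sketch of boolean matching over two made-up documents. It is not Solr's matching code. The documents and the misspelled query term "lymphiod" are illustrative assumptions. The only point is that one bad term empties the AND result set, while OR still returns hits.

```python
# Toy boolean matcher (hypothetical, not Solr's implementation): under OR a
# document matches if any query term is present, under AND only if all are.
def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def search(docs: dict[int, str], query: str, op: str) -> list[int]:
    terms = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        doc_terms = tokenize(text)
        present = [t in doc_terms for t in terms]
        if all(present) if op == "AND" else any(present):
            hits.append(doc_id)
    return hits

# Made-up documents for illustration only.
docs = {
    1: "lymphoid and non-lymphoid cell",
    2: "lamin cell",
}

# "lymphiod" is a deliberate misspelling of "lymphoid".
query = "lymphiod non-lymphoid cell"
print("OR :", search(docs, query, "OR"))   # OR : [1, 2]
print("AND:", search(docs, query, "AND"))  # AND: []
```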

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 8, 2019, at 10:43 AM, David Hastings <hastings.recursive@gmail.com> wrote:
> 
> Is your default operator OR?
> Change it to AND.
> 
> 
> On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> 
>> Hi Walter and Paras,
>>
>> I reindexed after removing all the references to StopFilterFactory, and I
>> went from 121 results to nearly 20K, as the search term q="Lymphoid and a
>> non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". So I
>> don't think removing it completely is the way to go for our scenario, but
>> I appreciate the suggestion.
>>
>> Yes, the response is using fl=*.
>> I am trying some combinations at the moment, but with no success yet.
>>
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, though the results are reasonably meaningful.
>>
>> I am sorry, but I didn't understand what exactly you want me to do with
>> the lst (??) and qf and bf.
>>
>> Thanks, everyone, for your input.
>> 
>> 
>>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.lehana@indiamart.com> wrote:
>>> 
>>> Hi Guilherme,
>>>
>>> By accident, I ended up querying using the default handler (/select)
>>> and it worked.
>>>
>>> You've just found the culprit. Thanks for providing the material I
>>> requested. Your analysis chain is working as expected. I don't see any
>>> issue in either the StopFilter or your boosts. I also use a boost of 50
>>> when boosting contextual suggestions (boosting "gold iphone" on a page
>>> of iphone), but I take Walter's suggestion and will try to optimize my
>>> weights. I agree that we never researched this value of 50 much either
>>> (we never faced performance or relevance issues).
>>>
>>> Note the major difference between the two handlers: edismax. I'm pretty
>>> sure that your problem lies in the parsing of queries (you can confirm
>>> that from the parsedquery key in the debug section of both JSON
>>> responses). I hope you have provided the response with fl=*. Replace q
>>> with q.alt in your /search handler query and I think you should start
>>> getting results. That's because q.alt uses the standard parser. If you
>>> want to keep using edismax, I suggest testing the responses while
>>> removing some combination of the parameters in the handler's <lst>
>>> (qf, bf) to find what's restricting the documents from coming up. I'm
>>> out of office today; otherwise I would certainly have tried analyzing
>>> the field values of the document from the /select request and comparing
>>> them with the qf/bq in the /search handler in solrconfig.xml. Do this
>>> for me and you'd certainly find something.
>>> 
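A small Python sketch of the comparison suggested above: build debug=query requests for /select with q, /search with q, and /search with q.alt, then compare the parsedquery values. The host and core name in BASE are assumptions (substitute your own); /search is the custom handler named in this thread. Network access is only needed if you call parsed_query against a running Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed host and core name (hypothetical); replace with your own Solr instance.
BASE = "http://localhost:8983/solr/reactome"
TERM = "lymphoid and a non-lymphoid cell"

def debug_url(handler: str, **params: str) -> str:
    """Build a request URL that returns the parsed query without scoring details."""
    params.setdefault("debug", "query")
    params.setdefault("wt", "json")
    return f"{BASE}/{handler}?{urlencode(params)}"

def parsed_query(url: str) -> str:
    """Fetch a URL and return the parsedquery string from the debug section."""
    with urlopen(url) as resp:
        return json.load(resp)["debug"]["parsedquery"]

urls = {
    "/select with q": debug_url("select", q=TERM),
    "/search with q": debug_url("search", q=TERM),
    # With q absent, q.alt is used and parsed with the standard query parser.
    "/search with q.alt": debug_url("search", **{"q.alt": TERM}),
}

for label, url in urls.items():
    print(label, url)
    # Against a running Solr, compare the parsers with:
    # print("  parsedquery:", parsed_query(url))
```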
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wunder@wunderwood.org> wrote:
>>> I normally use a weight of 8 for the most important field, like title.
>>> Other fields might get a 4 or 2.
>>>
>>> I add a “pf” parameter with the same fields and the weights doubled, so
>>> that phrase matches have a higher weight.
>>> 
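A minimal Python sketch of the qf/pf weighting scheme described above, building edismax request parameters. The field names title, summary and body are hypothetical placeholders, not fields from the schema in this thread; only the 8/4/2 weights and the doubling for pf come from the message.

```python
from urllib.parse import urlencode

# Hypothetical field names; only the 8/4/2 weighting comes from the advice above.
QF_WEIGHTS = {"title": 8, "summary": 4, "body": 2}

def edismax_params(user_query: str) -> dict:
    """Build edismax parameters with pf weights doubled relative to qf."""
    qf = " ".join(f"{field}^{w}" for field, w in QF_WEIGHTS.items())
    # pf boosts documents where the query terms appear together as a phrase.
    pf = " ".join(f"{field}^{w * 2}" for field, w in QF_WEIGHTS.items())
    return {"defType": "edismax", "q": user_query, "qf": qf, "pf": pf}

params = edismax_params("lymphoid and a non-lymphoid cell")
print(params["qf"])  # title^8 summary^4 body^2
print(params["pf"])  # title^16 summary^8 body^4
print(urlencode(params))
```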
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early
>>> web search engines. With different relevance algorithms and totally
>>> different evaluation and tuning systems, they settled on weights of 8 and
>>> 7.5 for HTML titles. With two radically different systems getting the
>>> same number, I decided that was a property of the documents, not of the
>>> search engines.
>>> 
>>> wunder
>>> 
>>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>> 
>>>> Hi Wunder,
>>>> 
>>>> My indexer takes quite a few hours to run. I am shortening it to run
>>>> faster, but I also need to make sure it gives what we are expecting.
>>>> This implementation has been in place for more than 4 years and is
>>>> heavily used.
>>>> 
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
>>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen
>>>>> years of configuring Solr.
>>>> I've inherited that implementation and I am really keen to improve it.
>>>> What would you recommend?
>>>> 
>>>> Cheers
>>>> Guilherme
>>>> 
>>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wunder@wunderwood.org> wrote:
>>>>> 
>>>>> Thanks for posting the files. Looking at schema.xml, I see that you are
>>>>> still using StopFilterFactory. The first advice we gave you was to
>>>>> remove that.
>>>>>
>>>>> Remove StopFilterFactory everywhere and reindex.
>>>>>
>>>>> You will continue to have problems matching stopwords until you do
>>>>> that.
>>>>>
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
>>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen
>>>>> years of configuring Solr.
>>>>> 
>>>>> wunder
>>>>> 
>>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>> 
>>>>>> Hi Paras, everyone
>>>>>> 
>>>>>> Thank you again for your inputs and suggestions. I'm sorry to hear you
>>>>>> had trouble with the attachments; I will host them somewhere and share
>>>>>> the links.
>>>>>> I don't tweak my index: I get the data from the graph database, create
>>>>>> the documents as they are, and save them to Solr.
>>>>>> 
>>>>>> So I am sending the new analysis screen, queried the way you
>>>>>> suggested, along with the results including the params and the Solr
>>>>>> query URL.
>>>>>>
>>>>>> While querying what you asked for, I found something really weird (at
>>>>>> least to me). By accident, I ended up querying using the default
>>>>>> handler (/select) and it worked. If I use the handler I must use, it
>>>>>> sadly doesn't work. I am posting both results, and I will also post
>>>>>> the handlers.
>>>>>> 
>>>>>> Here is the link with all the files mentioned before
>>>>>> 
>>>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>>>>>
>>>>>> If the link doesn't work: www dot dropbox dot com slash sh slash fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.lehana@indiamart.com> wrote:
>>>>>>> 
>>>>>>> Hi Guilherme.
>>>>>>> 
>>>>>>> I am sending the analysis result and the JSON result as requested.
>>>>>>>
>>>>>>>
>>>>>>> Thanks for the effort. Luckily, I can see your attachments (low
>>>>>>> quality, though).
>>>>>>> 
>>>>>>> From the analysis screen, the analysis is working as expected. One
>>>>>>> possible reason I can think of for query="lymphoid and *a*
>>>>>>> non-lymphoid cell" not matching the document containing "Lymphoid and
>>>>>>> a non-Lymphoid cell" is that the stopword "a" is still present after
>>>>>>> analysis on either the query or the index side. Did you tweak your
>>>>>>> index-time analysis after indexing?
>>>>>>> 
>>>>>>> Do two things:
>>>>>>> 
>>>>>>> 1. Post the analysis screen for index="Immunoregulatory interactions
>>>>>>>    between a Lymphoid and a non-Lymphoid cell" and query="lymphoid and
>>>>>>>    a non-lymphoid cell". Try hosting the image and providing the link
>>>>>>>    here.
>>>>>>> 2. Give the same JSON output as you have sent, but this time with
>>>>>>>    "echoParams=all". Also, post the exact Solr query URL.
>>>>>>> 
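The analysis screen requested in step 1 can also be reproduced over HTTP. This Python sketch builds a request for Solr's field analysis handler, assuming the implicit /analysis/field endpoint is available in this Solr 6.6 setup. The analysis.* parameter names belong to that handler; the host and core name in BASE are hypothetical placeholders.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/reactome"  # hypothetical host and core

def field_analysis_url(field_type: str, index_value: str, query_value: str) -> str:
    """Build a /analysis/field request showing index- and query-time tokens."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": index_value,  # text analyzed with the index chain
        "analysis.query": query_value,       # text analyzed with the query chain
        "analysis.showmatch": "true",        # mark index tokens matched by the query
        "echoParams": "all",
        "wt": "json",
    }
    return f"{BASE}/analysis/field?{urlencode(params)}"

url = field_analysis_url(
    "text_field",
    "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell",
    "lymphoid and a non-lymphoid cell",
)
print(url)
```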
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I don’t see the attachments; maybe I deleted old e-mails or some such.
>>>>>>>> The Apache server is fairly aggressive about stripping attachments,
>>>>>>>> though, so it’s also possible they didn’t make it through.
>>>>>>>> 
>>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Erick.
>>>>>>>>> 
>>>>>>>>>> First, your index and query analysis chains are considerably
>>>>>>>>>> different; this can easily be a source of problems. In particular,
>>>>>>>>>> using two different tokenizers is a huge red flag. I _strongly_
>>>>>>>>>> recommend against this unless you’re totally sure you understand
>>>>>>>>>> the consequences. Additionally, your use of the length filter is
>>>>>>>>>> suspicious, especially since your problem statement is about the
>>>>>>>>>> addition of a single-letter term and the min length allowed on that
>>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
>>>>>>>>>> about the interactions.
>>>>>>>>> I will investigate the min length and post the results later.
>>>>>>>>> 
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs
>>>>>>>>>> typos? Used by custom code?
>>>>>>>>> This is the URL of my application, not Solr params. That's the query
>>>>>>>>> string.
>>>>>>>>> 
>>>>>>>>>> What does “species=” do? That’s not Solr syntax, so it’s likely
>>>>>>>>>> that all the params with an equal sign are totally ignored unless
>>>>>>>>>> it’s just a typo.
>>>>>>>>> This is part of the application. Species will be used later on in
>>>>>>>>> Solr to filter the results. That's not Solr; those are my app's
>>>>>>>>> params.
>>>>>>>>> 
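To separate application parameters from Solr parameters, this Python sketch parses the application URL from this thread with the standard library. How the application turns species and cluster into Solr filters is not shown in the thread, so the sketch only pulls out q and leaves everything else for the application to translate.

```python
from urllib.parse import parse_qs, urlsplit

APP_URL = (
    "https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell"
    "&species=Homo+sapiens&species=Entries+without+species&cluster=true"
)

# parse_qs decodes '+' to spaces and collects repeated keys into lists.
app_params = parse_qs(urlsplit(APP_URL).query)
print(app_params)

# Only the user's search text maps directly onto Solr's q parameter; species
# and cluster are application-level and must be translated by the application.
solr_q = app_params["q"][0]
app_only = {k: v for k, v in app_params.items() if k != "q"}
print(solr_q)
print(app_only)
```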
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is
>>>>>>>>>> to add “&debug=true” to the query and look at the parsed query.
>>>>>>>>>> Ignore all the relevance calculations for the nonce, or specify
>>>>>>>>>> “&debug=query” to skip that part.
>>>>>>>>> The two JSON files I've sent were produced with debugQuery=on, and
>>>>>>>>> the explain section is present.
>>>>>>>>> I will try searching the way you mentioned.
>>>>>>>>>
>>>>>>>>> Thanks for your input.
>>>>>>>>> 
>>>>>>>>> Guilherme
>>>>>>>>> 
>>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> First, your index and query analysis chains are considerably
>>>>>>>>>> different; this can easily be a source of problems. In particular,
>>>>>>>>>> using two different tokenizers is a huge red flag. I _strongly_
>>>>>>>>>> recommend against this unless you’re totally sure you understand
>>>>>>>>>> the consequences. Additionally, your use of the length filter is
>>>>>>>>>> suspicious, especially since your problem statement is about the
>>>>>>>>>> addition of a single-letter term and the min length allowed on that
>>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
>>>>>>>>>> about the interactions.
>>>>>>>>>> 
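The length-filter point can be checked without Solr. This Python sketch approximates the query-time chain from the schema quoted in this thread (pattern tokenizer, pattern replacements, LengthFilter min=2 max=20, lower-casing, stopwords). It is an approximation using Python's re module, not Lucene's code, and STOPWORDS holds only the excerpt of stopwords.txt shown in the thread.

```python
import re

# Excerpt of stopwords.txt from this thread; the real file is longer.
STOPWORDS = {"a", "an", "and", "are"}

def query_chain(text: str) -> list:
    # PatternTokenizer used as a splitter: break on any char outside [a-zA-Z0-9/._:]
    tokens = [t for t in re.split(r"[^a-zA-Z0-9/._:]", text) if t]
    out = []
    for tok in tokens:
        tok = re.sub(r"^[/._:]+", "", tok)  # PatternReplace: strip leading /._:
        tok = re.sub(r"[/._:]+$", "", tok)  # PatternReplace: strip trailing /._:
        tok = re.sub(r"[_]", " ", tok)      # PatternReplace: underscore -> space
        if not 2 <= len(tok) <= 20:         # LengthFilter min=2 max=20
            continue                        # "a" is dropped here, before StopFilter
        tok = tok.lower()                   # LowerCaseFilter
        if tok in STOPWORDS:                # StopFilter, ignoreCase=true
            continue
        out.append(tok)
    return out

print(query_chain("lymphoid and a non-Lymphoid cell"))
# ['lymphoid', 'non', 'lymphoid', 'cell']
```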
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs
>>>>>>>>>> typos? Used by custom code?
>>>>>>>>>>
>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>> 
>>>>>>>>>> What does “species=” do? That’s not Solr syntax, so it’s likely
>>>>>>>>>> that all the params with an equal sign are totally ignored unless
>>>>>>>>>> it’s just a typo.
>>>>>>>>>> 
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is
>>>>>>>>>> to add “&debug=true” to the query and look at the parsed query.
>>>>>>>>>> Ignore all the relevance calculations for the nonce, or specify
>>>>>>>>>> “&debug=query” to skip that part.
>>>>>>>>>> 
>>>>>>>>>> 90%+ of the time, the question “why didn’t this query do what I
>>>>>>>>>> expect?” is answered by looking at the “&debug=query” output and
>>>>>>>>>> the analysis page in the admin UI. NOTE: on the analysis page, be
>>>>>>>>>> sure to look at _both_ the query and the index output. Also, very
>>>>>>>>>> important about the analysis page (and this is confusing): it
>>>>>>>>>> _assumes_ that what you put in the text boxes has made it through
>>>>>>>>>> the query parser intact and is analyzed by the selected field.
>>>>>>>>>> Consider the search "q=field:word1 word2". Now you type “word1
>>>>>>>>>> word2” into the analysis text box and it looks like what you
>>>>>>>>>> expect. That’s misleading, because the query is _parsed_ as
>>>>>>>>>> “field:word1 default_search_field:word2”. This is where
>>>>>>>>>> “&debug=query” helps.
>>>>>>>>>> 
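Erick's "field:word1 word2" example can be illustrated with a deliberately tiny Python toy. It is not Lucene's query parser, which also handles quotes, parentheses, operators and escaping; it only shows that a field prefix binds to the single term it is attached to, while bare terms fall back to the default field. default_search_field is the placeholder name from the message.

```python
def toy_parse(q: str, default_field: str) -> list:
    """Hypothetical, heavily simplified parser: a 'field:' prefix applies only
    to the term it is attached to; bare terms go to the default field."""
    clauses = []
    for term in q.split():
        if ":" in term:
            field, value = term.split(":", 1)
        else:
            field, value = default_field, term
        clauses.append((field, value))
    return clauses

print(toy_parse("field:word1 word2", "default_search_field"))
# [('field', 'word1'), ('default_search_field', 'word2')]
```

Against a real Solr instance, the authoritative answer is still the parsedquery value returned with debug=query.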
>>>>>>>>>> Best,
>>>>>>>>>> Erick
>>>>>>>>>> 
>>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.lehana@indiamart.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Walter,
>>>>>>>>>>> 
>>>>>>>>>>>> The solr.StopFilterFactory removes all tokens that are stopwords.
>>>>>>>>>>>> Those words will not be in the index, so they can never match a
>>>>>>>>>>>> query.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think the OP's concern is that adding a stopword gives different
>>>>>>>>>>> results. I think he's using the filter factory correctly: the query
>>>>>>>>>>> chain includes the filter as well, so it should remove "a" while
>>>>>>>>>>> querying.
>>>>>>>>>>>
>>>>>>>>>>> @Guilherme, please post the results for the query and for the
>>>>>>>>>>> document you are concerned about, and post the full result of the
>>>>>>>>>>> analysis screen (for both query and index).
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wunder@wunderwood.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> No.
>>>>>>>>>>>> 
>>>>>>>>>>>> The solr.StopFilterFactory removes all tokens that are stopwords.
>>>>>>>>>>>> Those words will not be in the index, so they can never match a
>>>>>>>>>>>> query.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Remove the lines with solr.StopFilterFactory from every analysis
>>>>>>>>>>>>    chain in schema.xml.
>>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the
>>>>>>>>>>>>    new config.
>>>>>>>>>>>> 3. Reindex all of the documents.
>>>>>>>>>>>>
>>>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not be
>>>>>>>>>>>> removed and they will be searchable.
>>>>>>>>>>>> 
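The effect of these steps can be previewed with this Python sketch of the index-time chain, run with and without the stop filter. It assumes a simple split on non-alphanumeric characters as a rough stand-in for StandardTokenizer plus ClassicFilter, and uses only the excerpt of stopwords.txt shown in this thread. With the LengthFilterFactory min="2" from the quoted schema still in place, dropping the stop filter brings back "and", but the single-letter "a" is still removed.

```python
import re

# Excerpt of stopwords.txt from this thread; the real file is longer.
STOPWORDS = {"a", "an", "and", "are"}

def index_chain(text: str, use_stop_filter: bool) -> list:
    # Rough stand-in for StandardTokenizer + ClassicFilter: split on runs of
    # characters that are not ASCII letters or digits.
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    tokens = [t for t in tokens if 2 <= len(t) <= 20]  # LengthFilter min=2 max=20
    tokens = [t.lower() for t in tokens]               # LowerCaseFilter
    if use_stop_filter:                                 # StopFilter, ignoreCase=true
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

title = "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"
with_stop = index_chain(title, use_stop_filter=True)
without_stop = index_chain(title, use_stop_filter=False)
print(with_stop)
print(without_stop)
```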
>>>>>>>>>>>> wunder
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> OK, I am kind of lost now.
>>>>>>>>>>>>> If I open up the console > Analysis and run it, that's the final
>>>>>>>>>>>>> result.
>>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopwords.txt> in
>>>>>>>>>>>>> schema.xml and, during the index phase, replaceAll("in
>>>>>>>>>>>>> stopwords.txt", " ") and then add to Solr. Is that correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, David
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recursive@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> No,
>>>>>>>>>>>>>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is still using stopwords and should be removed. That's my
>>>>>>>>>>>>>> opinion, of course; based on your use case it may be different,
>>>>>>>>>>>>>> but I generally axe any reference to them at all.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>> Haven't I done this here?
>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>>>>>>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>>>>>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>       <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>       <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recursive@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop
>>>>>>>>>>>>>>>> words and never use them, then re-index your data and try it
>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am performing a search to match a name (text_field); however,
>>>>>>>>>>>>>>>>> this term contains 'and' and 'a', and it doesn't return any
>>>>>>>>>>>>>>>>> records. If I remove 'a', then it works.
>>>>>>>>>>>>>>>>> E.g.:
>>>>>>>>>>>>>>>>> Search term: lymphoid and a non-lymphoid cell
>>>>>>>>>>>>>>>>> doesn't work:
>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>>>>>>>>>> works:
>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am interested in the first result.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> schema.xml
>>>>>>>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>>>>>>>>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>>>>>>>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>>>>>>>   <analyzer type="query">
>>>>>>>>>>>>>>>>>       <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>       <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>       <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>       <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>>>>> 
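The warning earlier in the thread about the two analyzers using different tokenizers is easy to check mechanically. This Python sketch parses the fieldType quoted above with the standard library's xml.etree.ElementTree and lists each analyzer's components in order. The embedded XML is copied from the message; the comparison logic is only an illustration.

```python
import xml.etree.ElementTree as ET

FIELD_TYPE = """
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
"""

root = ET.fromstring(FIELD_TYPE.strip())
# Map analyzer type ("index"/"query") to its ordered list of component classes.
chains = {
    analyzer.get("type"): [child.get("class") for child in analyzer]
    for analyzer in root.findall("analyzer")
}

for kind, components in chains.items():
    print(kind, "->", components)

if chains["index"][0] != chains["query"][0]:
    print("Different tokenizers:", chains["index"][0], "vs", chains["query"][0])

only_query = sorted({c for c in chains["query"] if c not in chains["index"]})
print("Query-only components:", only_query)
```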
>>>>>>>>>>>>>>>>> stopwords.txt
>>>>>>>>>>>>>>>>> #Standard English stop words taken from Lucene's StopAnalyzer
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> b
>>>>>>>>>>>>>>>>> c
>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Running Solr 6.6.2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there anything I could do to prevent this?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Guilherme
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> *Paras Lehana* [65871]
>>>>>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>>>>>> IndiaMART Intermesh Ltd.
>>>>>>>>>>> 
>>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>>>>>>>> Noida, UP, IN - 201303
>>>>>>>>>>> 
>>>>>>>>>>> Mob.: +91-9560911996
>>>>>>>>>>> Work: 01203916600 | Extn:  *8173*
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> IMPORTANT:
>>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 

