lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: When search term has two stopwords ('and' and 'a') together, it doesn't work
Date Fri, 08 Nov 2019 17:02:25 GMT
I always enable phrase searching in edismax for exactly this reason.

Something like:

       <str name="qf">title^8 keywords^4 text</str>
       <str name="pf">title^16 keywords^8 text^2</str>
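For anyone reproducing this outside solrconfig.xml, here is a minimal Python sketch that sends the same qf/pf values as per-request parameters. The base URL and the core name `reactome` are hypothetical placeholders. Only the parameter names (`defType`, `q`, `qf`, `pf`) are standard edismax parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL and core name; adjust for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/reactome/select"

def edismax_url(user_query: str) -> str:
    """Build an edismax request with field (qf) and phrase (pf) boosts."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Per-term matching: title counts most, full text least.
        "qf": "title^8 keywords^4 text",
        # Phrase matching on the same fields with the weights doubled,
        # so documents containing the query as a phrase rank higher.
        "pf": "title^16 keywords^8 text^2",
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(edismax_url("Lamin A"))
```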

To deal with concepts in queries, a classifier and/or named entity extractor can be helpful.
If you have a list of concepts (“controlled vocabulary”) that includes “Lamin A”,
and that shows up in a query, that term can be queried against the field matching that vocabulary.
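A toy sketch of the controlled-vocabulary idea: find known multi-word concepts in the query and route them to a dedicated field, leaving the rest as free text. The vocabulary, the field name `concept`, and the function `route_query` are all hypothetical. A real system would use a proper classifier or entity extractor rather than regex lookups.

```python
import re

# Hypothetical controlled vocabulary: concept label -> Solr field that holds it.
VOCABULARY = {
    "lamin a": "concept",
    "ift a": "concept",
}

def route_query(query: str) -> dict:
    """Split a query into vocabulary concepts and leftover free text.

    Longer vocabulary entries are tried first, so a multi-word concept
    wins over any shorter entry it contains.
    """
    remaining = query.lower()
    concepts = []
    for label in sorted(VOCABULARY, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, remaining):
            concepts.append((VOCABULARY[label], label))
            remaining = re.sub(pattern, " ", remaining)
    # Collapse the whitespace left behind by removed concepts.
    return {"concepts": concepts, "free_text": " ".join(remaining.split())}

print(route_query("Lamin A expression in fibroblasts"))
```

Each (field, label) pair could then become a phrase clause such as `concept:"lamin a"`, while `free_text` goes through the normal qf/pf fields.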

This is how LinkedIn separates people, companies, and places, for example.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> Look at the “mm” parameter; try setting it to 100%. Although that’s not entirely likely to do what you want either, since virtually every doc will have “a” in it. But at least you’d get docs that have both terms.
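To make the mm suggestion concrete, here is a minimal sketch of how a *simple* mm value turns into a count of required optional clauses. It follows the integer and percentage rules as I understand them from the Solr reference guide: percentages are rounded down, and negative values count clauses that may be missing. Conditional specs such as `2<75%` are not handled. `min_should_match` is a hypothetical helper, not a Solr API.

```python
def min_should_match(spec: str, optional_clauses: int) -> int:
    """Return how many optional clauses must match for a simple mm spec.

    Handles forms like "3", "-1", "75%" and "-25%" only.
    """
    spec = spec.strip()
    if "<" in spec:
        raise ValueError("conditional mm specs are not handled in this sketch")
    if spec.endswith("%"):
        calc = optional_clauses * int(spec[:-1]) / 100
    else:
        calc = int(spec)
    if calc < 0:
        # Negative: this many clauses may be missing (truncated toward zero).
        result = optional_clauses + int(calc)
    else:
        # Positive: this many clauses are required (truncated toward zero).
        result = int(calc)
    # Never require fewer than 0 or more clauses than the query has.
    return max(0, min(result, optional_clauses))

for spec in ("100%", "75%", "-25%", "2", "-1"):
    print(spec, min_should_match(spec, 4))
```

With `100%`, every optional clause left after analysis is required, however many there are.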
> 
> You may also be able to search for things like “Lamin A” _only as a phrase_ and have some luck. But this is a gnarly problem in general. Some people have been able to substitute synonyms and/or shingles to make this work, at the expense of a larger index.
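To make the shingle idea concrete, here is a small Python sketch of what a word-shingle step adds to the token stream, in the spirit of Solr's ShingleFilterFactory with unigrams kept. It is an illustration, not Lucene's implementation. The function name and parameters are made up for the example.

```python
def shingles(tokens: list[str], max_size: int = 2, separator: str = " ") -> list[str]:
    """Emit each token followed by the word n-grams (up to max_size) starting at it."""
    out = []
    for i, token in enumerate(tokens):
        out.append(token)  # keep the unigram
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out

print(shingles(["mutations", "in", "lamin", "a"]))
```

The shingle "lamin a" can only be produced if "a" reaches the shingle step. A StopFilter, or a LengthFilter with min="2", placed earlier in the chain would remove it first.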
> 
> This is a generic problem with context. “Lamin A” is really a “concept”, not just two words that happen to be near each other. Searching as a phrase is an out-of-the-box (OOB) but naive way to try to make it more likely that the ranked results refer to the _concept_ of “Lamin A”. The assumption here is “if these two words appear next to each other, they’re more likely to be what I want”. I say “naive” because “Lamins: A new approach to...” would _also_ be found for a naive phrase search. (I have no idea whether such a title makes sense or not, but you figured that out already)...
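A toy illustration of why adjacency alone is naive. It assumes a crude plural-stripping normalizer, which is hypothetical: a real chain would use a proper stemmer or none at all. Any word pair that ends up adjacent after normalization counts as a phrase hit, whether or not it names the concept.

```python
import re

def normalize(text: str) -> list[str]:
    # Lowercase, split on anything that is not a letter or digit, and strip a
    # trailing "s" from longer words (a crude stand-in for a stemmer).
    words = [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def phrase_match(document: str, phrase: str) -> bool:
    """True if the phrase's tokens appear at consecutive positions."""
    doc, target = normalize(document), normalize(phrase)
    n = len(target)
    return any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

print(phrase_match("Mutations in Lamin A", "Lamin A"))
print(phrase_match("Lamins: A new approach to ...", "Lamin A"))
```

Both calls match, though only the first title is about the concept.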
> 
> To do this well you’d have to dive into NLP/machine learning.
> 
> I truly wish we could have the DWIM search algorithm (Do What I Mean)….
> 
>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>> 
>> Hi Walter and Paras
>> 
>> I reindexed after removing all the references to StopWordFilter, and I went from 121 results to nearly 20K, as the search term q="Lymphoid and a non-Lymphoid cell" is now matching entities such as "IFT A" or "Lamin A". So I don't think removing it completely is the way to go for our scenario, but I appreciate the suggestion…
>> 
>> Yes, the response is using fl=*.
>> I am trying some combinations at the moment, but no success yet.
>> 
>> defType=edismax
>> q.alt=Lymphoid and a non-Lymphoid cell
>> Number of results=1599
>> Quite a considerable increase, though the results are reasonably meaningful.
>> 
>> I am sorry, but I didn't understand what exactly you want me to do with the lst (??) and qf and bf.
>> 
>> Thanks, everyone, for your inputs.
>> 
>> 
>>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.lehana@indiamart.com> wrote:
>>> 
>>> Hi Guilherme
>>> 
>>> By accident, I ended up querying using the default handler (/select) and it worked.
>>> 
>>> You've just found the culprit. Thanks for providing the material I requested. Your analysis chain is working as expected. I don't see any issue with either StopWordFilter or your boosts. I also use a boost of 50 when boosting contextual suggestions (boosting "gold iphone" on an iphone page), but I'll take Walter's suggestion and try to optimize my weights. I agree that we never researched this value of 50 much either (we never faced performance or relevance issues).
>>> 
>>> See the major difference between the two handlers: edismax. I'm pretty sure that your problem lies in the parsing of queries (you can confirm that from the parsedquery key in the debug section of both JSON responses). I hope you have provided the response with fl=*. Replace q with q.alt in your /search handler query and I think you should start getting results. That's because q.alt uses the standard parser. If you want to keep using edismax, I suggest you test the responses after removing some combination of the lst parameters (qf, bf) and find what's keeping the documents from coming up. I'm out of office today; otherwise I would have tried analyzing the field values of the document from the /select request and comparing them with the qf/bq in the /search handler in solrconfig.xml. Do this for me and you'd certainly find something.
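Paras's suggestion to compare the parsedquery of both handlers can be scripted. A minimal sketch, assuming both JSON responses were fetched with debug=query: it pulls `QParser`, `parsedquery` and `numFound` out of each so the difference is easy to spot. The two sample responses below are made up to show the shape being read. They are not the actual responses from this thread.

```python
import json

def parse_summary(response: dict) -> dict:
    """Pick the query-parsing fields out of a Solr JSON response."""
    debug = response.get("debug", {})
    return {
        "QParser": debug.get("QParser"),
        "parsedquery": debug.get("parsedquery"),
        "numFound": response.get("response", {}).get("numFound"),
    }

def compare(select_json: str, search_json: str) -> None:
    # Print the two summaries next to each other, one field per pair of lines.
    a = parse_summary(json.loads(select_json))
    b = parse_summary(json.loads(search_json))
    for key in a:
        print(f"{key:12} /select: {a[key]!r}")
        print(f"{'':12} /search: {b[key]!r}")

# Made-up minimal responses, only to show the structure being read.
select_resp = '{"response": {"numFound": 3}, "debug": {"QParser": "LuceneQParser", "parsedquery": "name:lymphoid"}}'
search_resp = '{"response": {"numFound": 0}, "debug": {"QParser": "ExtendedDismaxQParser", "parsedquery": "+(...)"}}'
compare(select_resp, search_resp)
```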
>>> 
>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wunder@wunderwood.org> wrote:
>>> I normally use a weight of 8 for the most important field, like title. Other fields might get a 4 or 2.
>>> 
>>> I add a “pf” field with the weights doubled, so that phrase matches have a higher weight.
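Walter's rule of thumb (pf = qf with every weight doubled) is mechanical enough to script. A small sketch; `double_weights` is a hypothetical helper, and fields without an explicit boost are treated as weight 1.

```python
def double_weights(qf: str) -> str:
    """Turn a qf string into a pf string with every field weight doubled.

    Fields without an explicit ^boost count as weight 1, so they become ^2.
    Whole-number results are written without a decimal point.
    """
    parts = []
    for item in qf.split():
        field, _, boost = item.partition("^")
        doubled = (float(boost) if boost else 1.0) * 2
        text = str(int(doubled)) if doubled.is_integer() else str(doubled)
        parts.append(f"{field}^{text}")
    return " ".join(parts)

print(double_weights("title^8 keywords^4 text"))
```

Applied to the qf from the top of the thread, this reproduces the pf shown there.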
>>> 
>>> The weight of 8 comes from experience at Infoseek and Inktomi, two early web search engines. With different relevance algorithms and totally different evaluation and tuning systems, they settled on weights of 8 and 7.5 for HTML titles. With the two radically different systems getting the same number, I decided that was a property of the documents, not of the search engines.
>>> 
>>> wunder
>>> 
>>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>> 
>>>> Hi Wunder,
>>>> 
>>>> My indexer takes quite a few hours to run. I am shortening it to run faster, but I also need to make sure it gives us what we expect. This implementation has been in place for more than 4 years and is heavily used.
>>>> 
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don’t think I’ve ever used a weight higher than 16 in a dozen years of configuring Solr.
>>>> I've inherited that implementation and I am really keen to adjust it. What would you recommend?
>>>> 
>>>> Cheers
>>>> Guilherme
>>>> 
>>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wunder@wunderwood.org> wrote:
>>>>> 
>>>>> Thanks for posting the files. Looking at schema.xml, I see that you are still using StopFilterFactory. The first advice we gave you was to remove that.
>>>>> 
>>>>> Remove StopFilterFactory everywhere and reindex.
>>>>> 
>>>>> You will continue to have problems matching stopwords until you do that.
>>>>> 
>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don’t think I’ve ever used a weight higher than 16 in a dozen years of configuring Solr.
>>>>> 
>>>>> wunder
>>>>> 
>>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>> 
>>>>>> Hi Paras, everyone
>>>>>> 
>>>>>> Thank you again for your inputs and suggestions. I'm sorry to hear you had trouble with the attachments; I will host them somewhere and share the links.
>>>>>> I don't tweak my index. I get the data from the graph database, create the documents as they are, and save them to Solr.
>>>>>> 
>>>>>> So, I am sending the new analysis screen, querying the way you suggested, as well as the results with params and the Solr query URL.
>>>>>> 
>>>>>> While querying what you asked for, I found something really weird (at least to me). By accident, I ended up querying using the default handler (/select) and it worked. If I use the one I must use, sadly it doesn't work. I am posting both results, and I will also post the handlers.
>>>>>> 
>>>>>> Here is the link with all the files mentioned before:
>>>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>>>>> If the link doesn't work: www dot dropbox dot com slash sh slash fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.lehana@indiamart.com> wrote:
>>>>>>> 
>>>>>>> Hi Guilherme.
>>>>>>> 
>>>>>>> I am sending the analysis result and the JSON result as requested.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks for the effort. Luckily, I can see your attachments (low quality though).
>>>>>>> 
>>>>>>> From the analysis screen, the analysis is working as expected. One reason I can initially think of for query="lymphoid and *a* non-lymphoid cell" not matching a document containing "Lymphoid and a non-Lymphoid cell" is that the stopword "a" is probably still present after analysis of either the query or the index. Did you tweak your index-time analysis after indexing?
>>>>>>> 
>>>>>>> Do two things:
>>>>>>> 
>>>>>>> 1. Post the analysis screen for index=*"Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and providing the link here.
>>>>>>> 2. Give the same JSON output as you have sent, but this time with *"echoParams=all"*. Also, post the exact Solr query URL.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I don’t see the attachments; maybe I deleted old e-mails or some such. The Apache server is fairly aggressive about stripping attachments though, so it’s also possible they didn’t make it through.
>>>>>>>> 
>>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Erick.
>>>>>>>>> 
>>>>>>>>>> First, your index and query analysis chains are considerably different, and this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single-letter term and the min length allowed on that filter is 2. That said, it’s reasonable to suppose that the ‘a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.
>>>>>>>>> I will investigate the min length and post the results later.
>>>>>>>>> 
>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?
>>>>>>>>> This is the URL in my application, not Solr params. That's the query string.
>>>>>>>>> 
>>>>>>>>>> What does “species=” do? That’s not Solr syntax, so it’s likely that all the params with an equal sign are totally ignored unless it’s just a typo.
>>>>>>>>> This is part of the application. Species will be used later on in Solr to filter the results. That's not Solr; those are my app's params.
>>>>>>>>> 
>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.
>>>>>>>>> The two JSON files I've sent were run with debugQuery=on, and the explain tag is present.
>>>>>>>>> I will try searching the way you mentioned.
>>>>>>>>> 
>>>>>>>>> Thanks for your inputs
>>>>>>>>> 
>>>>>>>>> Guilherme
>>>>>>>>> 
>>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Fwd to another server
>>>>>>>>>> 
>>>>>>>>>>> First, your index and query analysis chains are considerably different, and this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single-letter term and the min length allowed on that filter is 2. That said, it’s reasonable to suppose that the ‘a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.
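To check Erick's point about the LengthFilter, here is a rough Python approximation of the two chains from the posted schema. It is a sketch, not Lucene. StandardTokenizer's Unicode word-break rules are approximated by splitting on non-alphanumerics. ClassicFilter is treated as a no-op because, as far as I can tell, it only acts on token types that ClassicTokenizer emits. The stopword set holds only the entries visible in the posted stopwords.txt, and token positions are not modeled.

```python
import re

# Only the stopwords visible in the posted stopwords.txt; the rest of the
# file was elided in the thread, so this set is deliberately incomplete.
STOPWORDS = {"a", "b", "c", "an", "and", "are"}

def shared_filters(tokens: list[str]) -> list[str]:
    # LengthFilter(min=2, max=20) -> LowerCaseFilter -> StopFilter(ignoreCase=true)
    kept = [t for t in tokens if 2 <= len(t) <= 20]
    lowered = [t.lower() for t in kept]
    return [t for t in lowered if t not in STOPWORDS]

def index_chain(text: str) -> list[str]:
    # Approximation: StandardTokenizer modeled as a split on anything that is
    # not a letter or digit; ClassicFilter assumed to be a no-op.
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]
    return shared_filters(tokens)

def query_chain(text: str) -> list[str]:
    # PatternTokenizer: the pattern marks delimiters; empty pieces are dropped.
    tokens = [t for t in re.split(r"[^a-zA-Z0-9/._:]", text) if t]
    tokens = [re.sub(r"^[/._:]+", "", t) for t in tokens]  # leading punctuation
    tokens = [re.sub(r"[/._:]+$", "", t) for t in tokens]  # trailing punctuation
    tokens = [t.replace("_", " ") for t in tokens]          # underscores to spaces
    return shared_filters(tokens)

doc = "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"
print(index_chain(doc))
print(query_chain("lymphoid and a non-lymphoid cell"))
print(query_chain("lymphoid and non-lymphoid cell"))
```

If the approximation is faithful, both queries analyze to the same four tokens, and all of them occur in the indexed name. That suggests the difference lies outside the analysis chains, which is where the thread later points (query parsing in the edismax handler).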
>>>>>>>>>> 
>>>>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?
>>>>>>>>>> 
>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>> 
>>>>>>>>>>> What does “species=” do? That’s not Solr syntax, so it’s likely that all the params with an equal sign are totally ignored unless it’s just a typo.
>>>>>>>>>> 
>>>>>>>>>>> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.
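A small sketch of building the debug request Erick describes, so the same query can be sent to both handlers and their parsed queries compared. The host and core name are hypothetical; `/search` is the custom handler name used in this thread; `debug=query` and `wt=json` are standard request parameters.

```python
from urllib.parse import urlencode

def debug_url(base: str, handler: str, query: str, **extra: str) -> str:
    """Build a request URL that returns only the query-parsing part of the debug output."""
    params = {"q": query, "debug": "query", "wt": "json", **extra}
    return f"{base.rstrip('/')}/{handler.lstrip('/')}?{urlencode(params)}"

base = "http://localhost:8983/solr/reactome"  # hypothetical host and core name
for handler in ("/select", "/search"):
    print(debug_url(base, handler, "lymphoid and a non-lymphoid cell"))
```

Extra parameters such as `echoParams="all"` can be passed as keyword arguments. Use `debug="true"` instead when the scoring explanation is wanted too.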
>>>>>>>>>> 
>>>>>>>>>>> 90%+ of the time, the question “why didn’t this query do what I expect” is answered by looking at the “&debug=query” output and the analysis page in the admin UI. NOTE: for the analysis page, be sure to look at _both_ the query and index output. Also, and this is very important (and confusing): the analysis page _assumes_ that what you put in the text boxes has made it through the query parser intact and is analyzed by the field selected. Consider the search "q=field:word1 word2". Now you type “word1 word2” into the analysis text box and it looks like what you expect. That’s misleading because the query is _parsed_ as "field:word1 default_search_field:word2". This is where “&debug=query” helps.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Erick
>>>>>>>>>> 
>>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.lehana@indiamart.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Walter,
>>>>>>>>>>> 
>>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> I think the OP's concern is getting different results when adding a stopword. I think he's using the filter factory correctly: the query chain includes the filter as well, so it should remove "a" while querying.
>>>>>>>>>>> 
>>>>>>>>>>>> *@Guilherme*, please post the results for the query, the document in the results you are concerned about, and the full result of the analysis screen (for both query and index).
>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wunder@wunderwood.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> No.
>>>>>>>>>>>> 
>>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.
>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain in schema.xml.
>>>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the new config.
>>>>>>>>>>>>> 3. Reindex all of the documents.
>>>>>>>>>>>> 
>>>>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not be removed and they will be searchable.
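Step 2 can be done without a restart through Solr's admin APIs. A small sketch that builds the reload URL, assuming a hypothetical host and a core or collection named `reactome`. CoreAdmin RELOAD (`core=`) applies to standalone Solr; the Collections API RELOAD (`name=`) applies to SolrCloud.

```python
from urllib.parse import urlencode

def reload_url(solr_root: str, name: str, cloud: bool) -> str:
    """Build the admin URL that reloads a core (standalone) or a collection (SolrCloud)."""
    if cloud:
        path, params = "admin/collections", {"action": "RELOAD", "name": name}
    else:
        path, params = "admin/cores", {"action": "RELOAD", "core": name}
    return f"{solr_root.rstrip('/')}/{path}?{urlencode(params)}"

print(reload_url("http://localhost:8983/solr", "reactome", cloud=False))
```

A reload only makes Solr apply the new schema to new documents and queries. Tokens already in the index were produced by the old chain, which is why step 3, a full reindex, is still required.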
>>>>>>>>>>>> 
>>>>>>>>>>>> wunder
>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> OK. I am kind of lost now.
>>>>>>>>>>>>>> If I open up the console > analysis and perform it, that's the final result.
>>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in schema.xml, and during the index phase replaceAll("in stopwords.txt", " ") and then add to Solr. Is that correct?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks David
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recursive@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> no,
>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> is still using stopwords and should be removed, in my opinion of course. Your use case may be different, but I generally axe any reference to them at all.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>> Haven't I done this here?
>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>>>>>>>>>>>>>>>>  <analyzer type="index">
>>>>>>>>>>>>>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>>      <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recursive@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I am performing a search to match a name (text_field); however, this term contains 'and' and 'a', and it doesn't return any records. If I remove 'a', then it works.
>>>>>>>>>>>>>>>>>> e.g.
>>>>>>>>>>>>>>>>>> Search term: lymphoid and a non-lymphoid cell
>>>>>>>>>>>>>>>>> doesn't work:
>>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>>>>>>>>>> works:
>>>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>>>>> I am interested in the first result.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> schema.xml
>>>>>>>>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>  <analyzer type="query">
>>>>>>>>>>>>>>>>>>      <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>>>>>>>>>>>>>>>>>>  <analyzer type="index">
>>>>>>>>>>>>>>>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>>>>>  <analyzer type="query">
>>>>>>>>>>>>>>>>>>      <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>>>>>>>>  </analyzer>
>>>>>>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> stopwords.txt
>>>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> b
>>>>>>>>>>>>>>>>> c
>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Running Solr 6.6.2.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Is there anything I could do to prevent this?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Guilherme
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> 
>>>>>>>>>>> *Paras Lehana* [65871]
>>>>>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>>>>>> IndiaMART Intermesh Ltd.
>>>>>>>>>>> 
>>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>>>>>>>> Noida, UP, IN - 201303
>>>>>>>>>>> 
>>>>>>>>>>> Mob.: +91-9560911996
>>>>>>>>>>> Work: 01203916600 | Extn:  *8173*
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> IMPORTANT:
>>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
> 

