lucene-solr-user mailing list archives

From David Hastings <hastings.recurs...@gmail.com>
Subject Re: When search term has two stopwords ('and' and 'a') together, it doesn't work
Date Fri, 08 Nov 2019 17:08:22 GMT
the pf and qf fields are REALLY nice for this
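
As a rough sketch of how qf and pf might sit in an edismax handler: the /search handler name and the weights are taken from the quoted thread below, the field names title, keywords, and text are Walter's illustrative ones rather than the poster's actual schema, and the mm line is optional (Erick's suggestion to try).

```xml
<!-- Hypothetical handler; field names are illustrative, not the poster's schema. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: fields searched term by term, with per-field weights -->
    <str name="qf">title^8 keywords^4 text</str>
    <!-- pf: same fields with doubled weights; boosts documents where the query terms match as a phrase -->
    <str name="pf">title^16 keywords^8 text^2</str>
    <!-- Optional: require every query term to match, as suggested in the thread -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

qf decides which documents match and how individual terms score; pf only adds a boost when the terms appear close together in a field, which is what helps a query such as "Lamin A".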

On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wunder@wunderwood.org>
wrote:

> I always enable phrase searching in edismax for exactly this reason.
>
> Something like:
>
>        <str name="qf">title^8 keywords^4 text</str>
>        <str name="pf">title^16 keywords^8 text^2</str>
>
> To deal with concepts in queries, a classifier and/or named entity
> extractor can be helpful. If you have a list of concepts (“controlled
> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
> term can be queried against the field matching that vocabulary.
>
> This is how LinkedIn separates people, companies, and places, for example.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
> >
> > Look at the “mm” parameter, try setting it to 100%. Although that’s not
> entirely likely to do what you want either since virtually every doc will
> have “a” in it. But at least you’d get docs that have both terms.
> >
> > You may also be able to search for things like “Lamin A” _only as a
> phrase_ and have some luck. But this is a gnarly problem in general. Some
> people have been able to substitute synonyms and/or shingles to make this
> work at the expense of a larger index.
> >
> > This is a generic problem with context. “Lamin A” is really a “concept”,
> not just two words that happen to be near each other. Searching as a phrase
> is an OOB-but-naive way to try to make it more likely that the ranked
> results refer to the _concept_ of “Lamin A”. The assumption here is “if
> these two words appear next to each other, they’re more likely to be what I
> want”. I say “naive” because “Lamins: A new approach to...” would _also_ be
> found for a naive phrase search. (I have no idea whether such a title makes
> sense or not, but you figured that out already)...
> >
> > To do this well you’d have to dive into NLP/machine learning.
> >
> > I truly wish we could have the DWIM search algorithm (Do What I Mean)….
> >
> >> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gviteri@ebi.ac.uk>
> wrote:
> >>
> >> Hi Walter and Paras
> >>
> >> I indexed it removing all the references to StopWordFilter and I went
> from 121 results to near 20K as the search term q="Lymphoid and a
> non-Lymphoid cell" is matching entities such as "IFT A" or  "Lamin A". So I
> don't think removing it completely is the way to go from the scenario we
> have, but I appreciate the suggestion…
> >>
> >> Yes the response is using fl=*
> >> I am trying some combinations at the moment, but no success yet.
> >>
> >> defType=edismax
> >> q.alt=Lymphoid and a non-Lymphoid cell
> >> Number of results=1599
> >> Quite a considerable increase, though the results are reasonably
> >> meaningful.
> >>
> >> I am sorry, but I didn't understand what exactly you want me to do
> >> with the lst (??), qf and bf.
> >>
> >> Thanks, everyone, for your input.
> >>
> >>
> >>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.lehana@indiamart.com>
> wrote:
> >>>
> >>> Hi Guilherme
> >>>
> >>> By accident, I ended up querying using the default handler
> >>> (/select) and it worked.
> >>>
> >>> You've just found the culprit. Thanks for giving the material I
> requested. Your analysis chain is working as expected. I don't see any
> issue in either StopWordFilter or your boosts. I also use a boost of 50
> when boosting contextual suggestions (boosting "gold iphone" on a page of
> iphone) but I take Walter's suggestion and would try to optimize my
> weights. I agree that we did not research this value of 50 much either
> (we never faced performance or relevance issues).
> >>>
> >>> See the major difference in both the handlers - edismax. I'm pretty
> sure that your problem lies in the parsing of queries (you can confirm that
> from parsedquery key in debug of both JSON responses). I hope you have
> provided the response with fl=*. Replace q with q.alt in your /search
> handler query and I think you should start getting responses. That's
> because q.alt uses standard parser. If you want to keep using edisMax, I
> suggest you test the responses after removing some combination of the lst
> parameters (qf, bf) to find what is keeping the documents from coming up.
> I'm out of office today; otherwise I would certainly have tried analyzing
> the field values of the document in the /select request and comparing them
> with qf/bq in the solrconfig.xml /search handler. Do this for me and you'd
> certainly find something.
> >>>
> >>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wunder@wunderwood.org> wrote:
> >>> I normally use a weight of 8 for the most important field, like title.
> Other fields might get a 4 or 2.
> >>>
> >>> I add a “pf” field with the weights doubled, so that phrase matches
> have a higher weight.
> >>>
> >>> The weight of 8 comes from experience at Infoseek and Inktomi, two
> early web search engines. With different relevance algorithms and totally
> different evaluation and tuning systems, they settled on weights of 8 and
> 7.5 for HTML titles. With the two radically different systems getting
> the same number, I decided that was a property of the documents, not of the
> search engines.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wunder@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
> >>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> >>>>
> >>>> Hi Wunder,
> >>>>
> >>>> My indexer takes quite a few hours to run. I am shortening it to run
> >>>> faster, but I also need to make sure it gives what we are expecting.
> >>>> This implementation has been there for >4y and is heavily used.
> >>>>
> >>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> of configuring Solr.
> >>>> I've inherited that implementation and I am really keen to adjust it.
> >>>> What would you recommend?
> >>>>
> >>>> Cheers
> >>>> Guilherme
> >>>>
> >>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wunder@wunderwood.org> wrote:
> >>>>>
> >>>>> Thanks for posting the files. Looking at schema.xml, I see that you
> >>>>> are still using StopFilterFactory. The first advice we gave you was to
> >>>>> remove that.
> >>>>>
> >>>>> Remove StopFilterFactory everywhere and reindex.
> >>>>>
> >>>>> You will continue to have problems matching stopwords until you do
> >>>>> that.
> >>>>>
> >>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> of configuring Solr.
> >>>>>
> >>>>> wunder
> >>>>> Walter Underwood
> >>>>> wunder@wunderwood.org
> >>>>> http://observer.wunderwood.org/  (my blog)
> >>>>>
> >>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> >>>>>>
> >>>>>> Hi Paras, everyone
> >>>>>>
> >>>>>> Thank you again for your input and suggestions. I am sorry to hear
> >>>>>> you had trouble with the attachments. I will host them somewhere and
> >>>>>> share the links.
> >>>>>> I don't tweak my index, I get the data from the graph database,
> create a document as they are and save to solr.
> >>>>>>
> >>>>>> So, I am sending the new analysis screen querying the way you
> suggested. Also the results with params and solr query url.
> >>>>>>
> >>>>>> During the process of querying what you asked, I found something
> >>>>>> really weird (at least for me). By accident, I ended up querying using
> >>>>>> the default handler (/select) and it worked. If I use the handler I
> >>>>>> must use, then sadly it doesn't work. I am posting both results and I
> >>>>>> will also post the handlers.
> >>>>>>
> >>>>>> Here is the link with all the files mentioned before
> >>>>>>
> >>>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
> >>>>>> If the link doesn't work www dot dropbox dot com slash sh slash
> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.lehana@indiamart.com> wrote:
> >>>>>>>
> >>>>>>> Hi Guilherme.
> >>>>>>>
> >>>>>>> I am sending the analysis result and the json result as requested.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks for the effort. Luckily, I can see your attachments (low
> >>>>>>> quality though).
> >>>>>>>
> >>>>>>> From the analysis screen, the analysis is working as expected. One
> >>>>>>> of the reasons I can initially think of for query="lymphoid and *a*
> >>>>>>> non-lymphoid cell" not matching a document containing "Lymphoid and a
> >>>>>>> non-Lymphoid cell" is that the stopword "a" is probably present
> >>>>>>> post-analysis in either the query or the index. Did you tweak your
> >>>>>>> index-time analysis after indexing?
> >>>>>>>
> >>>>>>> Do two things:
> >>>>>>>
> >>>>>>> 1. Post the analysis screen for index=*"Immunoregulatory
> >>>>>>> interactions between a Lymphoid and a non-Lymphoid cell"* and
> >>>>>>> query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image
> >>>>>>> and providing the link here.
> >>>>>>> 2. Give the same JSON output as you have sent, but this time with
> >>>>>>> *"echoParams=all"*. Also, post the exact Solr query URL.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerickson@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I don’t see the attachments, maybe I deleted old e-mails or some
> >>>>>>>> such. The Apache server is fairly aggressive about stripping
> >>>>>>>> attachments though, so it’s also possible they didn’t make it through.
> >>>>>>>>
> >>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Erick.
> >>>>>>>>>
> >>>>>>>>>> First, your index and analysis chains are considerably different,
> >>>>>>>>>> this can easily be a source of problems. In particular, using two
> >>>>>>>>>> different tokenizers is a huge red flag. I _strongly_ recommend
> >>>>>>>>>> against this unless you’re totally sure you understand the
> >>>>>>>>>> consequences. Additionally, your use of the length filter is
> >>>>>>>>>> suspicious, especially since your problem statement is about the
> >>>>>>>>>> addition of a single letter term and the min length allowed on that
> >>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ’a’ is
> >>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
> >>>>>>>>>> about the interactions.
> >>>>>>>>> I will investigate the min length and post the results later.
> >>>>>>>>>
> >>>>>>>>>> Second, I have no idea what this will do. Are the equal signs
> >>>>>>>>>> typos? Used by custom code?
> >>>>>>>>> This is the URL in my application, not Solr params. That's the
> >>>>>>>>> query string.
> >>>>>>>>>
> >>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely
> >>>>>>>>>> that all the params with an equal-sign are totally ignored unless
> >>>>>>>>>> it’s just a typo.
> >>>>>>>>> This is part of the application. Species will be used later on in
> >>>>>>>>> Solr to filter the results. Those are not Solr params; they are my
> >>>>>>>>> app's params.
> >>>>>>>>>
> >>>>>>>>>> Third, the easiest way to see what’s happening under the covers is
> >>>>>>>>>> to add “&debug=true” to the query and look at the parsed query.
> >>>>>>>>>> Ignore all the relevance calculations for the nonce, or specify
> >>>>>>>>>> “&debug=query” to skip that part.
> >>>>>>>>> The two json files I've sent were produced with debugQuery=on, and
> >>>>>>>>> the explain tag is present.
> >>>>>>>>> I will try searching the way you mentioned.
> >>>>>>>>>
> >>>>>>>>> Thanks for your input
> >>>>>>>>>
> >>>>>>>>> Guilherme
> >>>>>>>>>
> >>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerickson@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Fwd to another server
> >>>>>>>>>>
> >>>>>>>>>> First, your index and analysis chains are considerably different,
> >>>>>>>>>> this can easily be a source of problems. In particular, using two
> >>>>>>>>>> different tokenizers is a huge red flag. I _strongly_ recommend
> >>>>>>>>>> against this unless you’re totally sure you understand the
> >>>>>>>>>> consequences. Additionally, your use of the length filter is
> >>>>>>>>>> suspicious, especially since your problem statement is about the
> >>>>>>>>>> addition of a single letter term and the min length allowed on that
> >>>>>>>>>> filter is 2. That said, it’s reasonable to suppose that the ’a’ is
> >>>>>>>>>> filtered out in both cases, but maybe you’ve found something odd
> >>>>>>>>>> about the interactions.
> >>>>>>>>>>
> >>>>>>>>>> Second, I have no idea what this will do. Are the equal signs
> >>>>>>>>>> typos? Used by custom code?
> >>>>>>>>>>
> >>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>
> >>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely
> >>>>>>>>>> that all the params with an equal-sign are totally ignored unless
> >>>>>>>>>> it’s just a typo.
> >>>>>>>>>>
> >>>>>>>>>> Third, the easiest way to see what’s happening under the covers is
> >>>>>>>>>> to add “&debug=true” to the query and look at the parsed query.
> >>>>>>>>>> Ignore all the relevance calculations for the nonce, or specify
> >>>>>>>>>> “&debug=query” to skip that part.
> >>>>>>>>>>
> >>>>>>>>>> 90% + of the time, the question “why didn’t this query do what I
> >>>>>>>>>> expect” is answered by looking at the “&debug=query” output and the
> >>>>>>>>>> analysis page in the admin UI. NOTE: for the analysis page be sure
> >>>>>>>>>> to look at _both_ the query and index output. Also, and very
> >>>>>>>>>> important about the analysis page (and this is confusing) is that
> >>>>>>>>>> this _assumes_ that what you put in the text boxes has made it
> >>>>>>>>>> through the query parser intact and is analyzed by the field
> >>>>>>>>>> selected. Consider the search "q=field:word1 word2". Now you type
> >>>>>>>>>> “word1 word2” into the analysis text box and it looks like what you
> >>>>>>>>>> expect. That’s misleading because the query is _parsed_ as
> >>>>>>>>>> "field:word1 default_search_field:word2". This is where
> >>>>>>>>>> “&debug=query” helps.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Erick
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.lehana@indiamart.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Walter,
> >>>>>>>>>>>
> >>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those
> >>>>>>>>>>>> words will not be in the index, so they can never match a query.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I think the OP's concern is different results when adding a
> >>>>>>>>>>> stopword. I think he's using the filter factory correctly - the
> >>>>>>>>>>> query chain includes the filter as well so it should remove "a"
> >>>>>>>>>>> while querying.
> >>>>>>>>>>>
> >>>>>>>>>>> *@Guilherme*, please post the results for the query and the
> >>>>>>>>>>> document in the result you are concerned about, and post the full
> >>>>>>>>>>> result of the analysis screen (for both query and index).
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wunder@wunderwood.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> No.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those
> >>>>>>>>>>>> words will not be in the index, so they can never match a query.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain
> >>>>>>>>>>>> in schema.xml.
> >>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the
> >>>>>>>>>>>> new config.
> >>>>>>>>>>>> 3. Reindex all of the documents.
> >>>>>>>>>>>>
> >>>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not
> >>>>>>>>>>>> be removed and they will be searchable.
> >>>>>>>>>>>>
> >>>>>>>>>>>> wunder
> >>>>>>>>>>>> Walter Underwood
> >>>>>>>>>>>> wunder@wunderwood.org
> >>>>>>>>>>>> http://observer.wunderwood.org/  (my blog)
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ok. I am kind of lost now.
> >>>>>>>>>>>>> If I open up the console > analysis and perform it, that's the
> >>>>>>>>>>>>> final result.
> >>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the
> >>>>>>>>>>>>> schema.xml and during index phase replaceAll("in stopwords.txt"," ")
> >>>>>>>>>>>>> then add to solr. Is that correct?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks David
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recursive@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fwd to another server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> no,
> >>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> is still using stopwords and should be removed, in my opinion of
> >>>>>>>>>>>>>> course, based on your use case may be different, but i generally
> >>>>>>>>>>>>>> axe any reference to them at all
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>> Haven't I done this here?
> >>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>>>>  <analyzer type="index">
> >>>>>>>>>>>>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>>>>      <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>  </analyzer>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recursive@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Fwd to another server
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop
> >>>>>>>>>>>>>>>> words and never use them, then re-index your data and try it again.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gviteri@ebi.ac.uk> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I am performing a search to match a name (text_field), however
> >>>>>>>>>>>>>>>>> this term contains 'and' and 'a' and it doesn't return any
> >>>>>>>>>>>>>>>>> records. If I remove 'a' then it works.
> >>>>>>>>>>>>>>>>> e.g.
> >>>>>>>>>>>>>>>>> Search term: lymphoid and a non-lymphoid cell
> >>>>>>>>>>>>>>>>> doesn't work:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
> >>>>>>>>>>>>>>>>> works:
> >>>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>>>>>> interested in the first result
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> schema.xml
> >>>>>>>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>>>>>>  <analyzer type="index">
> >>>>>>>>>>>>>>>>>      <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>>>  </analyzer>
> >>>>>>>>>>>>>>>>>  <analyzer type="query">
> >>>>>>>>>>>>>>>>>      <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>>>>>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>>>>>>  </analyzer>
> >>>>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> stopwords.txt
> >>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> b
> >>>>>>>>>>>>>>>>> c
> >>>>>>>>>>>>>>>>> ....
> >>>>>>>>>>>>>>>>> an
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Running SolR 6.6.2.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Is there anything I could do to prevent this?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>>>> Guilherme
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> --
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> *Paras Lehana* [65871]
> >>>>>>>>>>> Development Engineer, Auto-Suggest,
> >>>>>>>>>>> IndiaMART Intermesh Ltd.
> >>>>>>>>>>>
> >>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> >>>>>>>>>>> Noida, UP, IN - 201303
> >>>>>>>>>>>
> >>>>>>>>>>> Mob.: +91-9560911996
> >>>>>>>>>>> Work: 01203916600 | Extn:  *8173*
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> IMPORTANT:
> >>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >
>
>
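
Several replies in the quoted thread suggest removing StopFilterFactory, and Erick flags the two different tokenizers and the LengthFilterFactory with min="2" as suspicious. A minimal sketch of a field type along those lines follows. It assumes one shared analyzer for index and query time is acceptable for this field, which is an assumption and not something the thread settled on.

```xml
<!-- Sketch only: same field type name as in the thread, simplified analysis chain. -->
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <!-- An analyzer element without a type attribute applies at both index and query time,
       so both sides produce the same tokens. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No StopFilterFactory and no LengthFilterFactory with min="2":
         single-letter terms such as "a" and stopwords such as "and" stay in the index. -->
  </analyzer>
</fieldType>
```

A change like this needs a full reindex. Guilherme reports in the thread that dropping the stop filter took his results from 121 to nearly 20K, so it would probably need to be paired with phrase boosting (pf) or an mm setting to keep ranking useful.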
