lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Query Problem
Date Fri, 17 Dec 2010 13:52:23 GMT
Right, I *love* problems like this... NOT....

You might get some joy out of using TrimFilterFactory along with
KeywordTokenizerFactory, something like this:
<fieldType name="trimField" class="solr.TextField" <your options here> >
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

but it depends upon what your fields are padded with....
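
To make the padding issue concrete, here is a rough Python sketch (not Solr code) of why an untokenized field misses on an exact term but still hits on a prefix, and what trimming changes. The helpers `exact_match` and `prefix_match` are made up for illustration, the space padding is an assumption, and `str.strip()` only approximates what a trim filter does for whitespace.

```python
# Toy model of an untokenized ("string") field: the whole value is one term.
# The helper names below are hypothetical, not part of Solr.

def exact_match(indexed_term: str, query: str) -> bool:
    """Models SectionName:Programas_Home (the term must be identical)."""
    return indexed_term == query

def prefix_match(indexed_term: str, prefix: str) -> bool:
    """Models SectionName:Programa* (the term must start with the prefix)."""
    return indexed_term.startswith(prefix)

# Assumption: the value arrives padded with spaces to 100 characters.
padded = "Programas_Home".ljust(100)

print(exact_match(padded, "Programas_Home"))   # False: padding is part of the term
print(prefix_match(padded, "Programa"))        # True: the prefix still matches

# Trimming before indexing removes leading/trailing whitespace.
trimmed = padded.strip()
print(exact_match(trimmed, "Programas_Home"))  # True

# Padding that is not whitespace (NUL characters here) survives strip().
nul_padded = "Programas_Home".ljust(100, "\0")
print(exact_match(nul_padded.strip(), "Programas_Home"))  # False
```

The last case is why the padding character matters: trimming only helps if the padding is something the trim step treats as whitespace.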

Best
Erick

On Fri, Dec 17, 2010 at 8:12 AM, Ezequiel Calderara <ezechico@gmail.com> wrote:

> Hi Erick, you were right.
>
> I'm looking at the source of the search result (instead of the rendering in
> Internet Explorer :$) and I see this:
> "<str name="SectionName">Programas_Home
> </str>"
>
> So I think the problem is in the SSIS process that retrieves data
> from the DB and sends it to Solr.
> The data type in the DB is VARCHAR(100)... but I'm sure that somewhere it is
> being mapped to CHAR(100), so its length is always 100.
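
A small Python sketch of what a CHAR(100) mapping would do to the value, and one way to inspect and strip the padding before the document is sent to Solr. It assumes trailing-space padding, which is typical for fixed-width CHAR columns but not confirmed here; `simulate_char_column` and `clean_field` are hypothetical helpers, not part of SSIS or Solr.

```python
# Assumption: a CHAR(100) column pads short values with trailing spaces.
FIELD_WIDTH = 100

def simulate_char_column(value: str, width: int = FIELD_WIDTH) -> str:
    """Pad a value the way a fixed-width CHAR column would (hypothetical helper)."""
    return value.ljust(width)

def clean_field(value: str) -> str:
    """Drop trailing whitespace before the value is indexed (hypothetical helper)."""
    return value.rstrip()

raw = simulate_char_column("Programas_Home")
print(len(raw))          # the padded length, not the length of the visible text
print(repr(raw[-5:]))    # shows what the padding characters actually are

cleaned = clean_field(raw)
print(len(cleaned))      # length of the visible text only
```

Printing the `repr` of the tail is the quickest way to see what the padding really is before deciding how to strip it.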
>
> Thank you very much, i will keep you informed
>
> Thanksssss
>
>
>
> On Thu, Dec 16, 2010 at 9:38 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > OK, it works perfectly for me on a 1.4.1 instance. I've looked over your
> > files a couple of times and see nothing obvious (but you'll never find
> > anyone better at overlooking the obvious than me!).
> >
> > Tokenizing and stemming are irrelevant in this case because your
> > type is "string", which is an untokenized type, so you don't need to
> > go there.
> >
> > The way your query parses and analyzes backs this up, so you're
> > getting to the right schema definition.
> >
> > Which may bring us to whether what's in the index is what you *think* is
> > in there. I'm betting not. Either you changed the schema and didn't
> > re-index (say changed indexed="false" to indexed="true"), you didn't
> > commit the documents after indexing or other such-like, or changed the
> > field type and didn't reindex.
> >
> > So go into ..../solr/admin. Click on "schema browser", click on "fields".
> > Along the left you should see "SectionName", click on that. That will show
> > you the #indexed# terms, and you should see, exactly, "Programas_Home" in
> > there, just like in your returned documents. Let us know if that's in fact
> > what you do see. It's possible you're being misled by the difference
> > between seeing the value in a returned document (the stored value) and
> > what's searched on (the indexed token(s)).
> >
> > And I'm assuming that some asterisks in your mails were really there for
> > bolding and you are NOT doing wildcard searches for, for instance,
> >  *SectionName:Programas_Home*.
> >
> > But we're at a point where my 1.4.1 instance produces the results you're
> > expecting, at least as I understand them, so I don't think it's a problem
> > with Solr. Rather, some change you've made is producing results that are
> > correct but not what you expect. Like I said, look at the indexed terms.
> > If you see "Programas_Home" in the admin console after following the
> > steps above, then I don't know what to suggest....
> >
> > Best
> > Erick
> >
> > On Thu, Dec 16, 2010 at 5:12 PM, Ezequiel Calderara <ezechico@gmail.com> wrote:
> >
> > > The jars are named like *1.4.1*. So I suppose it's version 1.4.1.
> > >
> > > Thanks!
> > >
> > > On Thu, Dec 16, 2010 at 6:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > > OK, what version of Solr are you using? I can do a quick check to
> > > > see what behavior I get....
> > > >
> > > > Erick
> > > >
> > > > On Thu, Dec 16, 2010 at 4:44 PM, Ezequiel Calderara <ezechico@gmail.com> wrote:
> > > >
> > > > > I'll check the Tokenizer to see if that's the problem.
> > > > > The results of the Analysis Page for "SectionName:Programas_Home":
> > > > >  Query Analyzer org.apache.solr.schema.FieldType$DefaultAnalyzer {}
> > > > >    term position     1
> > > > >    term text         Programas_Home
> > > > >    term type         word
> > > > >    source start,end  0,14
> > > > >    payload
> > > > >
> > > > > So it's not having problems with that... Also in the debug you can
> > > > > see that the parsed query is correct...
> > > > > So I don't know where to look...
> > > > >
> > > > > I know nothing about "stemming" or tokenizing, but I will look into
> > > > > whether that has anything to do with it.
> > > > >
> > > > > If anyone can help me out, please do :D
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 16, 2010 at 5:55 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> > > > >
> > > > > > Ezequiel:
> > > > > >
> > > > > > Nice job of including relevant details, by the way. Unfortunately
> > > > > > I'm puzzled too. Your SectionName is a "string" type, so it should
> > > > > > be placed in the index as-is. Be a bit cautious about looking at
> > > > > > returned results (as I see in one of your xml files) because the
> > > > > > returned values are the verbatim, stored field, NOT what's
> > > > > > tokenized, and the tokenized data is what's searched.
> > > > > >
> > > > > > That said, your SectionName should not be tokenized at all because
> > > > > > it's a string type. Take a look at the admin page, "schema browser",
> > > > > > and see what the values for "SectionName" look like (these will be
> > > > > > the tokenized values). They should be exactly
> > > > > > Programas_Home, complete with underscore, case changes, etc. Is that
> > > > > > the case?
> > > > > >
> > > > > > Another place that might help is the admin/analysis page. Check the
> > > > > > debug boxes and input your steps and it'll show you what
> > > > > > transformations are applied. But a quick look leaves me completely
> > > > > > baffled.
> > > > > >
> > > > > > Sorry I can't be more help
> > > > > > Erick
> > > > > >
> > > > > > On Thu, Dec 16, 2010 at 2:07 PM, Ezequiel Calderara <ezechico@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all, I have the following problem.
> > > > > > > I have this set of data (View data (Pastebin)
> > > > > > > <http://pastebin.com/jKbUhjVS>)
> > > > > > > If I do a search for: *SectionName:Programas_Home* I have no
> > > > > > > results: Returned Data (PasteBin) <http://pastebin.com/wnPdHqBm>
> > > > > > > If I do a search for: *Programas_Home* I have only 1 result:
> > > > > > > Result Returned (Pastebin) <http://pastebin.com/fMZkLvYK>
> > > > > > > If I do a search for: SectionName:Programa* I have 1 result:
> > > > > > > Result Returned (Pastebin) <http://pastebin.com/kLLnVp4b>
> > > > > > >
> > > > > > > This is my *schema* <http://pastebin.com/PQM8uap4> (Pastebin) and
> > > > > > > this is my *solrconfig* <http://%3c/?xml version="1.0" encoding="UTF-8"?>> (PasteBin)
> > > > > > >
> > > > > > > I don't understand why searching for "SectionName:Programas_Home"
> > > > > > > isn't returning any results at all...
> > > > > > >
> > > > > > > Can someone shed some light on this?
> > > > > > > --
> > > > > > > ______
> > > > > > > Ezequiel.
> > > > > > >
> > > > > > > Http://www.ironicnet.com <http://www.ironicnet.com/>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > ______
> > > > > Ezequiel.
> > > > >
> > > > > Http://www.ironicnet.com <http://www.ironicnet.com/>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > ______
> > > Ezequiel.
> > >
> > > Http://www.ironicnet.com <http://www.ironicnet.com/>
> > >
> >
>
>
>
> --
> ______
> Ezequiel.
>
> Http://www.ironicnet.com
>
