lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Multivalue wild card search
Date Tue, 24 Jun 2014 01:53:57 GMT
Nope, got to re-index.

bq:  Assuming there is a multiValued field called "Name" of type string stored
in index -

bq: I tested both cases with empty index.  When I inserted the document after
changing fieldType to StandardTokenizerFactory, it worked fine with the
standard phrase query.  But I was not able to look up already indexed
documents.

So, you have documents in your index with this field as a string type. You
then changed the schema definition for that field to be a tokenized type using
StandardAnalyzer. You simply cannot form a search that does what you want
on both the old and new documents.
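
A minimal sketch of that mismatch, in Python rather than Solr. The `analyze` helper is a hypothetical, crude stand-in for StandardTokenizerFactory plus LowerCaseFilterFactory (it is not the real tokenizer), and `phrase_match` only imitates an exact phrase query over consecutive terms:

```python
import re

def analyze(text):
    # Assumption: a rough approximation of StandardTokenizerFactory +
    # LowerCaseFilterFactory; the real tokenizer follows Unicode rules.
    return re.findall(r"\w+", text.lower())

def phrase_match(indexed_terms, phrase):
    # True if the analyzed phrase occurs as consecutive indexed terms.
    wanted = analyze(phrase)
    n = len(wanted)
    return any(indexed_terms[i:i + n] == wanted
               for i in range(len(indexed_terms) - n + 1))

raw = '[["Ethan", "G", ""],["Steve", "Wonder", ""]]'

old_doc_terms = [raw]           # string type: the whole value is one term
new_doc_terms = analyze(raw)    # tokenized type: one term per word

print(phrase_match(old_doc_terms, "Steve Wonder"))
print(phrase_match(new_doc_terms, "Steve Wonder"))
```

The same query terms are compared against two different kinds of indexed terms, which is why only the documents indexed after the schema change can match.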

Consider
doc1 has "steven G. Wonder" _as a single token_ because that's the way
it was indexed.
doc2 has "steven" "g" "wonder", three separate tokens.

Even if you could get string types to work with wildcards and create a
regex, in the first case the regex is applied to the whole value
"steven G. Wonder", so a pattern like "steve*er" would match.

In the second case, the pattern is applied to each token separately:
"steve*er" against "steven": no match.
Ditto with "steve*er" against "g".
Ditto with "steve*er" against "wonder".
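
The term-by-term comparison above can be sketched in Python. This assumes `fnmatchcase` from the standard library is a fair stand-in for the `*` wildcard (it also treats `[` specially, which Lucene wildcards do not, but the pattern here contains none). The `wildcard_hits` helper is hypothetical:

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, terms):
    # A wildcard is tested against each indexed term on its own,
    # never against several terms joined together.
    return [term for term in terms if fnmatchcase(term, pattern)]

doc1_terms = ["steven G. Wonder"]       # indexed as a single token
doc2_terms = ["steven", "g", "wonder"]  # indexed as three tokens

print(wildcard_hits("steve*er", doc1_terms))
print(wildcard_hits("steve*er", doc2_terms))
```

The first call finds the single long term; the second finds nothing, because no individual token both starts with "steve" and ends with "er".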

Best,
Erick



On Mon, Jun 23, 2014 at 3:56 PM, Ethan <eh198101@gmail.com> wrote:
> Hi Ahmet,
>
> I have tested this and it doesn't work for existing documents.  I couldn't
> make much sense of the field analysis. I didn't find an option to see
> indexed terms in "Analysis" tab. Instead you feed it the value you want
> analyzed and it prints index or query time analysis.  Is this what you're
> referring to?
>
> I tested both cases with empty index.  When I inserted the document after
> changing fieldType to StandardTokenizerFactory, it worked fine with the
> standard phrase query.  But I was not able to look up already indexed
> documents.  I believe there is an extra step or some information that I am
> missing.
>
> Thanks.
> E
>
>
>
>
> On Mon, Jun 23, 2014 at 3:21 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
> wrote:
>
>> Hi Ethan,
>>
>> With that type, a standard phrase query should work. If you paste your sample
>> text into the analysis page, you will see the indexed terms.
>>
>> q=Name:"steve wonder" should work. You don't need a wildcard search in this
>> case. Just do a phrase query (surrounded with quotes).
>>
>> Ahmet
>>
>>
>> On Tuesday, June 24, 2014 1:07 AM, Ethan <eh198101@gmail.com> wrote:
>> Ahmet,
>>
>> Here is the XML for the field "Name". Let me know if I need to update it.
>>
>> <field name="Name" type="token2" indexed="true" stored="true"
>>        multiValued="true" omitTermFreqAndPositions="false"/>
>>
>> <types>
>>   <fieldType name="token2" class="solr.TextField" omitNorms="true"
>>              positionIncrementGap="1">
>>     <analyzer>
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.LowerCaseFilterFactory"/>
>>     </analyzer>
>>   </fieldType>
>> </types>
>>
>> Thanks,
>> E
>>
>>
>>
>>
>>
>> On Mon, Jun 23, 2014 at 12:38 PM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>> wrote:
>>
>> > Hi Ethan,
>> >
>> > I understand that you are dealing with a legacy system.
>> >
>> >
>> > Can you paste the analysis chain used for the already indexed docs? I mean
>> > the XML snippet taken from schema.xml.
>> > With this, we will figure out how that text is indexed, and write our
>> > query according to that info.
>> >
>> >
>> > Ahmet
>> >
>> >
>> >
>> >
>> >
>> > On Monday, June 23, 2014 10:09 PM, Ethan <eh198101@gmail.com> wrote:
>> > Hey Ahmet,  Yes, brackets, commas and quotes are part of the field's value.
>> > It's something I inherited and am working on improving.
>> >
>> > The field is of type solr.TextField. Adding StandardTokenizer solves the
>> > problem for the new documents.  It doesn't work on already indexed docs.
>> > Is there a solution for that other than re-indexing?
>> >
>> > Thanks,
>> > E
>> >
>> >
>> >
>> > On Mon, Jun 23, 2014 at 11:05 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>> > wrote:
>> >
>> > >
>> > > Hi Ethan,
>> > >
>> > > The XML response is helpful. So you still have brackets, commas, and quotes
>> > > in the field value?
>> > >
>> > > What is the field type you use for the Name field?
>> > >
>> > > If you tokenize it with StandardTokenizer, a simple phrase query would do
>> > > the trick:
>> > > q=Name:"Steve Wonder"
>> > >
>> > > Also consider cleaning up your values. Why would you store all those
>> > > brackets etc.?
>> > >
>> > >
>> > > Ahmet
>> > >
>> > >
>> > > On Monday, June 23, 2014 7:45 PM, Ethan <eh198101@gmail.com> wrote:
>> > >
>> > >
>> > >
>> > > Ahmet,
>> > > Yes, they were part of the JSON output. Here is the XML response:
>> > >
>> > > <arr name="Name"><str>[["Hifte", "Grop", "", ""]]</str><str>[]</str><str>[["Ethan", "G", "", ""],["Steve", "Wonder", "", ""]]</str></arr>
>> > >
>> > >
>> > > The solution suggested by Jack to look up Steve Wonder doesn't work, as the
>> > > asterisk is replaced by the default search field. Any suggestions?
>> > >
>> > > Thanks,
>> > > E
>> > >
>> > >
>> > >
>> > > On Fri, Jun 20, 2014 at 12:40 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>> > > wrote:
>> > >
>> > > >Hi,
>> > > >
>> > > >What are these square brackets, backslashes, and quotes?
>> > > >Are they part of the JSON output? Can you paste human-readable XML
>> > > >response writer output?
>> > > >
>> > > >Thanks,
>> > > >Ahmet
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >On Friday, June 20, 2014 12:17 AM, Ethan <eh198101@gmail.com> wrote:
>> > > >Ahmet,
>> > > >
>> > > >Assuming there is a multiValued field called "Name" of type string stored
>> > > >in index -
>> > > >
>> > > >//Doc 1
>> > > >"id" : 23512
>> > > >"HotelId" : [
>> > > >    "12",
>> > > >    "23",
>> > > >    "12"
>> > > >]
>> > > >"Name" : [
>> > > >"[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]]",
>> > > >"[]",
>> > > >"[[\"hifte\", \"Grop\", \"\"]]"
>> > > >]
>> > > >
>> > > >// Doc 2
>> > > >
>> > > >"id" : 23513
>> > > >"HotelId" : [
>> > > >    "12",
>> > > >    "12"
>> > > >]
>> > > >"Name" : [
>> > > >"[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"\", \"\"]]",
>> > > >"[]",
>> > > >]
>> > > >
>> > > >Here, how do I find the document with Name that contains "Steve Wonder"?
>> > > >
>> > > >I tried q="***[\"Steve\", \"Wonder\", \"\"]]" but that doesn't work.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >On Fri, Jun 6, 2014 at 11:10 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
>> > > >wrote:
>> > > >
>> > > >> Hi Ethan,
>> > > >>
>> > > >>
>> > > >> It is hard to understand your example. Can you re-write it? Using XML?
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Friday, June 6, 2014 9:07 PM, Ethan <eh198101@gmail.com> wrote:
>> > > >> Bumping the thread to see if anyone has a solution.
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Thu, Jun 5, 2014 at 9:52 AM, Ethan <eh198101@gmail.com> wrote:
>> > > >>
>> > > >> > Wildcard search does work on a multiValued field.  I was able to pull up
>> > > >> > records for the following multiValued field -
>> > > >> >
>> > > >> > Code : [
>> > > >> > "12344",
>> > > >> > "4534",
>> > > >> > "674"
>> > > >> > ]
>> > > >> >
>> > > >> > q=Code:45* fetched the correct document.  It doesn't work in
>> > > >> > quotes (q="Code:45*"), however.  Is there a workaround?
>> > > >> >
>> > > >> >
>> > > >> > On Thu, Jun 5, 2014 at 9:34 AM, Ethan <eh198101@gmail.com> wrote:
>> > > >> >
>> > > >> >> Are you implying there is no way to do a lookup on a multiValued field
>> > > >> >> with a substring?  If so, then how is it usually handled?
>> > > >> >>
>> > > >> >>
>> > > >> >> On Wed, Jun 4, 2014 at 4:44 PM, Jack Krupansky <jack@basetechnology.com>
>> > > >> >> wrote:
>> > > >> >>
>> > > >> >>> Wildcard, fuzzy, and regex queries operate on a single term of a single
>> > > >> >>> tokenized field value or a single string field value.
>> > > >> >>>
>> > > >> >>> -- Jack Krupansky
>> > > >> >>>
>> > > >> >>> -----Original Message----- From: Ethan
>> > > >> >>> Sent: Wednesday, June 4, 2014 6:59 PM
>> > > >> >>> To: solr-user
>> > > >> >>> Subject: Multivalue wild card search
>> > > >> >>>
>> > > >> >>>
>> > > >> >>> I can't seem to find a way to do a wildcard search on a multiValued
>> > > >> >>> field.
>> > > >> >>>
>> > > >> >>> For example, consider a multiValued field called "Name" with 3 values -
>> > > >> >>>
>> > > >> >>> "Name" : [
>> > > >> >>> "[[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]]",
>> > > >> >>> "[]",
>> > > >> >>> "[[\"hifte\", \"Grop\", \"\"]]"
>> > > >> >>> ]
>> > > >> >>>
>> > > >> >>> For a multiValued like above, I want search like-
>> > > >> >>>
>> > > >> >>> q="***[\"Steve\", \"Wonder\", \"\"]"
>> > > >> >>>
>> > > >> >>>
>> > > >> >>> But I do not get any results back. Any ideas on how to create such a
>> > > >> >>> query?
>> > > >> >>>
>> > > >> >>
>> > > >> >>
>> > > >> >
>> > > >>
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>> >
>>
