lucene-solr-user mailing list archives

From Alexander Ramos Jardim <alexander.ramos.jar...@gmail.com>
Subject Re: How to make Relationships work for Multi-valued Index Fields?
Date Mon, 26 Jan 2009 18:54:32 GMT
Hey Gunaranjan,

I have the same scenario as you.

A Lucene index is denormalized; it should not contain entity relationships.
When I need to do something like what you are doing, I group the related
values in one field.

Let's say we have 2 credit cards. The first has id 30459673 and taxes at
1.5%/month, and the second has id 56305 and taxes at 2.5%. What I do is
create a multivalued field in which I index each value as "id ^ taxes". On
the client side I put the logic to parse the string into a convenient form
to work with the values. I hope that helps.
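
A rough sketch of that client-side logic, in Python purely for illustration. The Card type, the helper names and the exact " ^ " separator handling are assumptions made for this example, not anything Solr provides:

```python
from decimal import Decimal
from typing import NamedTuple

# Assumed separator, following the "id ^ taxes" convention described above.
SEPARATOR = " ^ "


class Card(NamedTuple):
    """Hypothetical client-side view of one credit card."""
    card_id: str
    rate_percent: Decimal


def encode_card(card: Card) -> str:
    """Build the string indexed as one value of the multivalued field."""
    return f"{card.card_id}{SEPARATOR}{card.rate_percent}"


def decode_card(value: str) -> Card:
    """Split one stored value back into its id and rate."""
    card_id, rate = value.split(SEPARATOR, 1)
    return Card(card_id.strip(), Decimal(rate.strip()))


cards = [Card("30459673", Decimal("1.5")), Card("56305", Decimal("2.5"))]

# One string per card; each would be one value of the multivalued field.
field_values = [encode_card(c) for c in cards]

# What the client does with the values that come back in a search result.
decoded = [decode_card(v) for v in field_values]
```

Because each id stays glued to its own rate inside one value, the pairing survives even though the field is multivalued. How well you can search on the id alone depends on how the field is analyzed, which this sketch does not cover.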

2009/1/25 Gunaranjan Chandraraju <chandraraju@apple.com>

> Paul
> It's not just about merging the fields or resource usage.  If you look at
> the scenario below, the issue is that it mixes up my fields (shipping and
> billing address, for instance).  I can't merge them and still keep the
> 'distinction' for search.  Your case is a 'generalization' field, so
> the search will work.  I know mine is a trivial example that can be overcome
> with just two fields (shipping_address & billing_address), but I am
> talking about cases where we have many such 'groups of fields'.
>
> In general, such one-to-many relationships for indices in a 'document' are
> also really, really common :).  Again, I am not trying to argue a point - I
> would be happy to get some ideas on how to do it and to be corrected if I'm
> wrong.
>
> Lastly (while that's not my main worry right now), I tend to be careful
> with resources. When dealing with very large data, I avoid any
> unnecessary overhead as far as possible and take every optimization I can get :)
>
> Guna
>
>
> On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:
>
>  Guna,
>>
>> it's really, really normal to duplicate things so they can be merged into a field.
>>
>> We do this all the time, for example to have a field
>> "text-in-any-language" while a field "text-in-english" is also there and the
>> queries boost matches in text-in-any-language less than text-in-english (if
>> the user's language is English).
>>
>> This difference in weighting is the gold of Lucene I feel (of retrieval
>> generally).
>> Also, depending on the field, you can index differently while still
>> copying it in Solr (for example, use a different analyzer per language).
>>
>> paul
>>
>> PS: don't be scared about resources; this is the side of the world where
>> resources are the least of the problems! (Typically a "catch-all-field"
>> wouldn't be stored, though, as this would then load the memory.)
>>
>>
>> On 25 Jan 2009, at 09:35, Gunaranjan Chandraraju wrote:
>>
>>  Thanks
>>> This sounds redundant to me - to store the fields separately and then
>>> concat all of them to one copy field again.
>>>
>>> My XML is like this
>>> <address street="XYZ" state="CA" country="1" type="shipping" ...>
>>>
>>> I am currently using XPATH or XSL to separate them into individual
>>> indexed fields like: address_state_1, address_type_1 etc. in SOLR.
>>>
>>> From what you say, it looks to me that I might as well just treat the
>>> entire address as a single 'text field' and search within the text after
>>> tokenizing.  This way I don't need to have the _1, _2 as the single text
>>> field will contain the information together (and thus grouped - so I know
>>> which is shipping/billing etc?).    Will there be any performance difference
>>> between this and the copy field approach?
>>>
>>> Is there no other way (programmatic) to search across multiple fields?  I
>>> did take a quick look at dismax but again it needs the field names to be
>>> specifically mentioned in the config file or in the query.  I can't do this
>>> as I am not able to predict the number of fields (e.g. credit cards a person
>>> can have?).
>>>
>>> I like SOLR, but to me this seems to be a very common and simple search
>>> scenario/pattern - however, its implementation in SOLR does not appear to be
>>> very straightforward.   (My apologies if I am on the wrong track here because
>>> I don't understand SOLR well.)
>>>
>>> Regards,
>>> Guna
>>> On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>  For searching you need to put them in a single field. Use <copyField>
>>>> in schema.xml to achieve that.
>>>>
>>>> On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju
>>>> <chandraraju@apple.com> wrote:
>>>>
>>>>> I made this approach work with XPATH and XSL.   However, this approach
>>>>> creates multiple fields like this
>>>>>
>>>>> address_state_1
>>>>> address_state_2
>>>>> ...
>>>>> address_state_10
>>>>>
>>>>> and
>>>>>
>>>>> credit_card_1
>>>>> credit_card_2
>>>>> credit_card_3
>>>>>
>>>>>
>>>>> How do I search for a credit_card?  The query syntax does not seem to
>>>>> support wildcards in field names.  E.g. I can't seem to do this ->
>>>>> credit_card*:1234 4567 7890 1234
>>>>>
>>>>> On the search side I would not know how many credit card fields got
>>>>> created for a document, so I need that to be dynamic.
>>>>>
>>>>> -g
>>>>>
>>>>>
>>>>> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>>>>>
>>>>>  Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>>>>>
>>>>>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
>>>>>> shalinmangar@gmail.com> wrote:
>>>>>>
>>>>>>  On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <
>>>>>>> chandraraju@apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> <record>
>>>>>>>> <coreInfo id="123" , .../>
>>>>>>>> <address street="XYZ1" state="CA" ...type="home" />
>>>>>>>> <address street="XYZ2" state="CA" ... type="Office"/>
>>>>>>>> <address street="XYZ3" state="CA" ....type="Other"/>
>>>>>>>> </record>
>>>>>>>>
>>>>>>>> I have set up my DIH to treat these as entities as below
>>>>>>>>
>>>>>>>> <dataConfig>
>>>>>>>> <dataSource type="FileDataSource" encoding="UTF-8" />
>>>>>>>> <document>
>>>>>>>> <entity name ="f" processor="FileListEntityProcessor"
>>>>>>>>       baseDir="***"
>>>>>>>>       fileName=".*xml"
>>>>>>>>       rootEntity="false"
>>>>>>>>       dataSource="null" >
>>>>>>>>  <entity
>>>>>>>>     name="record"
>>>>>>>>     processor="XPathEntityProcessor"
>>>>>>>>     stream="false"
>>>>>>>>     forEach="/record"
>>>>>>>>     url="${f.fileAbsolutePath}">
>>>>>>>>          <field column="ID" xpath="/record/coreInfo/@id" />
>>>>>>>>
>>>>>>>>          <!-- Address  -->
>>>>>>>>           <entity
>>>>>>>>               name="record_adr"
>>>>>>>>               processor="XPathEntityProcessor"
>>>>>>>>               stream="false"
>>>>>>>>               forEach="/record/address"
>>>>>>>>               url="${f.fileAbsolutePath}">
>>>>>>>>                   <field column="address_street" xpath="/record/address/@street" />
>>>>>>>>                   <field column="address_state" xpath="/record/address/@state" />
>>>>>>>>                   <field column="address_type" xpath="/record/address/@type" />
>>>>>>>>          </entity>
>>>>>>>>     </entity>
>>>>>>>> </entity>
>>>>>>>> </document>
>>>>>>>> </dataConfig>
>>>>>>>>
>>>>>>>>
>>>>>>> I think the only way is to create a dynamic field for each attribute
>>>>>>> (street, state etc.). Write a transformer to copy the fields from your
>>>>>>> data config to appropriately named dynamic fields (e.g. street_1,
>>>>>>> state_1, etc.). To maintain this counter you will need to get/store it
>>>>>>> with Context#getSessionAttribute(name, Context.SCOPE_DOC) and
>>>>>>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>>>>>
>>>>>>> I can't think of an easier way.
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Shalin Shekhar Mangar.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>
>>>
>>
>


-- 
Alexander Ramos Jardim
