lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to configure schema.xml to take in account two database tables?
Date Sun, 05 Aug 2012 17:59:12 GMT
A quick check here is to go to your admin/stats page and look at
numDocs and maxDocs. numDocs is the number of documents that it's
possible to find, i.e. docs that have not been deleted or replaced by
an update. maxDocs is the number of documents that have been added,
and that count includes ones with duplicate unique IDs.

So I'm guessing that numDocs == 9 and maxDocs == 654, which, as Jack
says, indicates that your uniqueKey is repeated for lots and lots of
your data...
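
To make the arithmetic concrete, here is a minimal Python sketch of the
overwrite-by-uniqueKey behaviour described above. It is a toy model, not
Solr code: the index_stats helper and the made-up ids are assumptions for
illustration, and it ignores segment merges, which would eventually bring
maxDocs back down.

```python
# Toy model of an index that overwrites on a repeated uniqueKey.
# This is NOT Solr code; index_stats is a hypothetical helper.
def index_stats(ids):
    live = {}       # uniqueKey -> slot of the doc that is still searchable
    max_docs = 0    # every add takes a slot, even if it replaces an older doc
    for doc_id in ids:
        live[doc_id] = max_docs   # a repeated key replaces the earlier doc
        max_docs += 1
    return {"numDocs": len(live), "maxDocs": max_docs}


# Hypothetical input: 654 rows whose id column takes only 9 distinct values.
ids = ["id-%d" % (i % 9) for i in range(654)]
stats = index_stats(ids)
print(stats)  # {'numDocs': 9, 'maxDocs': 654}
```

If the two numbers on your stats page look like this, the value mapped to
the id field is the thing to check.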

Best
Erick

On Sun, Aug 5, 2012 at 1:40 PM, Jack Krupansky <jack@basetechnology.com> wrote:
> Make sure the id is not duplicated. You might have inadvertently populated
> the id field in your Solr schema with some non-key value that occurs with
> high frequency (and may have roughly 9 unique values.)
>
> Examine the 9 results and their id fields. Then look at some of your input
> data to verify that the values placed in the id field are what you expected.
>
> If possible, identify one input record that isn't in the 9 results but
> should be and verify its id.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Andre Lopes
> Sent: Sunday, August 05, 2012 1:31 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to configure schema.xml to take in account two database
> tables?
>
>
> Thanks for the replies,
>
> I've now successfully indexed the database using the DataImportHandler,
> but there is something weird. I've indexed 654 entries, but I can't
> retrieve all 654 results.
>
> After I run
> "http://localhost:8983/solr/dataimport?command=full-import" I get 654
> adds:
>
> Aug 5, 2012 6:16:51 PM
> org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {deleteByQuery=*:*,add=[http://1.com, http://2.com,
> http://3.com, http://4.com, http://5.com, http://6a.com, http://7.vu,
> http://8.com/, ... (654 adds)],commit=} 0 35
>
> But when I query Solr with
> "http://localhost:8983/solr/select?q=*:*" I only get 9 results.
>
> I've used a very basic schema.xml:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <schema name="example" version="1.5">
>
>  <types>
>    <fieldType name="string" class="solr.StrField"/>
>  </types>
>
>  <fields>
>    <dynamicField name="*" type="string" indexed="true" stored="true" />
>
>    <field name="id" type="string" indexed="true" stored="true" multiValued="false" />
>    <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
>    <field name="address" type="string" indexed="true" stored="true" multiValued="false" />
>
>  </fields>
>
>  <uniqueKey>id</uniqueKey>
>  <!-- <defaultSearchField>catchall</defaultSearchField> -->
>
> </schema>
>
>
> Any clues on what I'm doing wrong?
>
> Best Regards,
>
> On Sun, Aug 5, 2012 at 1:19 PM, Gora Mohanty <gora@mimirtech.com> wrote:
>>
>> On 5 August 2012 17:17, Andre Lopes <lopes80andre@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm new to Solr. I've done some reading about how it works, but I can't
>>> find a clue for my specific situation.
>>>
>>> Here is my case. I have 2 database tables that I need to add to the
>>> index, but they are related. One entry in the table "clients" can
>>> have more than one entry in the table "contacts".
>>
>> [...]
>>
>> There seem to be various things that you need clarity on:
>> 1. Firstly, schema.xml describes the various fields that you
>>     might be indexing, and/or storing in Solr. Thus, it should
>>     contain a description for each field that you will be using,
>>     no matter what data source the field might come from.
>> 2. One typically flattens data when indexing into Solr.
>>     Following your example, as customers can have multiple
>>     phone numbers, you should denormalise your data.
>>     E.g., each Solr record could have these fields:
>>        <cust. name>, <cust. desc.>, <phone>
>>     Thus, for customer 1 you would need two records, for
>>     customer 2 one record, and for customer 3 three records.
>>
>>     You might find this blog useful, though it probably has
>>      more detail than you need:
>>      http://mysolr.com/tips/denormalized-data-structure/
>> 3. You will need some way to index the data into Solr. One
>>     way is to use the DataImportHandler which allows
>>     indexing from multiple databases:
>>     http://wiki.apache.org/solr/DataImportHandler
>>
>> Regards,
>> Gora
>
>
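
As a footnote to Gora's point 2 in the quoted thread: a minimal Python
sketch of the flattening he describes, one Solr record per customer/phone
pair. The table rows, the field names and the composite id format are all
hypothetical. The point is only that each flattened record needs its own
uniqueKey value, otherwise later records overwrite earlier ones as in the
9-versus-654 case.

```python
# Hypothetical rows standing in for the "clients" and "contacts" tables.
clients = [
    {"id": 1, "name": "Customer 1"},
    {"id": 2, "name": "Customer 2"},
    {"id": 3, "name": "Customer 3"},
]
contacts = [
    {"client_id": 1, "phone": "555-0101"},
    {"client_id": 1, "phone": "555-0102"},
    {"client_id": 2, "phone": "555-0201"},
    {"client_id": 3, "phone": "555-0301"},
    {"client_id": 3, "phone": "555-0302"},
    {"client_id": 3, "phone": "555-0303"},
]


def flatten(clients, contacts):
    """Return one flat record per (client, contact) pair."""
    by_client = {}
    for contact in contacts:
        by_client.setdefault(contact["client_id"], []).append(contact)

    docs = []
    for client in clients:
        # Clients without any contact produce no record in this sketch.
        for n, contact in enumerate(by_client.get(client["id"], [])):
            docs.append({
                # Composite key (assumed format) so every record is unique;
                # using client["id"] alone would repeat the uniqueKey.
                "id": "%s-%s" % (client["id"], n),
                "name": client["name"],
                "phone": contact["phone"],
            })
    return docs


docs = flatten(clients, contacts)
unique_ids = {d["id"] for d in docs}
print(len(docs), len(unique_ids))
```

With these rows, customer 1 yields two records, customer 2 one and
customer 3 three, matching the counts in Gora's example.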
