lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Metadata and FullText, indexed at different times - looking for best approach
Date Mon, 16 Jul 2012 17:43:49 GMT
Thank you,

I am already on 4.0-alpha. The patch feels a little too unstable for my
needs and for my familiarity with the code.

What about something using multiple cores? Could I store the full-text
fields in a separate core and somehow (again, with minimum
hand-coding) search against all those cores and get back a combined
list of document IDs? Or would that make comparative ranking/sorting
impossible?
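One way to picture the multiple-cores idea above is a two-step lookup: query a full-text core for ranked document IDs, then fetch the matching records from the metadata core with an ID filter and put them back in full-text order. This is a minimal sketch, not an established Solr feature; the core layout, the `id` field name, and both helper functions are hypothetical. Relevance scores from two separate cores are not comparable, so the sketch keeps only the full-text ranking rather than combining them.

```python
from typing import Dict, List, Optional


def build_id_filter(ids: List[str], field: str = "id") -> Optional[str]:
    """Build a Lucene query like id:("a" OR "b") restricting results to the given IDs.

    Returns None when there are no IDs, so the caller can skip the second query.
    """
    if not ids:
        return None
    # Escape backslashes and double quotes so each ID is a valid quoted term.
    quoted = ['"' + i.replace("\\", "\\\\").replace('"', '\\"') + '"' for i in ids]
    return f"{field}:(" + " OR ".join(quoted) + ")"


def order_by_fulltext_rank(
    fulltext_ids: List[str], metadata_docs: List[Dict], key: str = "id"
) -> List[Dict]:
    """Reorder metadata docs to follow the full-text core's ranking.

    Metadata docs whose ID was not returned by the full-text core are dropped.
    """
    by_id = {doc[key]: doc for doc in metadata_docs}
    return [by_id[i] for i in fulltext_ids if i in by_id]
```

Long ID lists turn into large Boolean queries, which can run into Solr's `maxBooleanClauses` limit (1024 by default), so this approach fits best when the full-text result set is paged or small.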

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sun, Jul 15, 2012 at 12:08 PM, Erick Erickson
<erickerickson@gmail.com> wrote:
> You've got a couple of choices. There's a new patch in town
> https://issues.apache.org/jira/browse/SOLR-139
> that allows you to update individual fields in a doc if (and only if)
> all the fields in the original document were stored (actually, all the
> non-copy fields).
>
> So if you're storing (stored="true") all your metadata information, you can
> just update the document when the text becomes available, assuming you
> know the uniqueKey when you update.
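A sketch of such an update, assuming the JSON atomic-update syntax that the SOLR-139 work introduced in Solr 4.0: the request names the document by its uniqueKey and wraps each new value in a `"set"` operation. The endpoint URL (the 4.x example server's `collection1` core) and the `fulltext` field name are assumptions. The feature also relies on the update log (`<updateLog>` in solrconfig.xml) and the `_version_` field being configured.

```python
import json
import urllib.request

# Hypothetical endpoint: the Solr 4.x example server's default core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def build_atomic_update(doc_id, new_fields, unique_key="id"):
    """Build a JSON atomic-update body that sets only the given fields on one document."""
    doc = {unique_key: doc_id}
    for name, value in new_fields.items():
        doc[name] = {"set": value}  # "set" adds the field or replaces its value
    return json.dumps([doc])


def send_update(body, url=SOLR_UPDATE_URL, commit=True):
    """POST the JSON body to Solr's update handler (requires a running server)."""
    target = url + ("?commit=true" if commit else "")
    req = urllib.request.Request(
        target,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `send_update(build_atomic_update("report-42", {"fulltext": text}))` would attach extracted text to the existing record `report-42`, leaving its other stored fields in place.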
>
> Under the covers, this will find the old document, get all the fields, add the
> new fields to it, and re-index the whole thing.
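The behaviour described above is why every non-copy field must be stored: the server can only rebuild the document from what it can read back. The toy model below uses a plain dict in place of the index to make that concrete. It is an illustration of the described steps, not Solr's actual code, and it models only the `"set"` operation.

```python
def simulate_atomic_update(index, doc_id, updates, stored_fields, copy_targets=()):
    """Toy model of an atomic update: read the old document's stored fields,
    apply 'set' updates, and replace the whole document."""
    old = index[doc_id]
    # Only stored fields can be read back from the index; anything else is lost.
    # copyField targets are skipped because Solr regenerates them when the
    # document is re-indexed.
    merged = {
        name: value
        for name, value in old.items()
        if name in stored_fields and name not in copy_targets
    }
    for name, op in updates.items():
        merged[name] = op["set"]  # only the "set" operation is modelled here
    index[doc_id] = merged  # re-index the whole document under the same key
    return merged
```

If the old document had an unstored field such as `notes`, it would be missing from the merged result, which is exactly the data loss the stored-fields requirement guards against.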
>
> Otherwise, your fallback idea is a good one.
>
> Best
> Erick
>
> On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
> <arafalov@gmail.com> wrote:
>> Hello,
>>
>> I have a database of metadata and I can inject it into SOLR with DIH
>> just fine. But I also have documents whose full text I want to extract
>> and add to the same records as additional fields. I think DIH can run
>> Tika at ingestion time, but I may not have the full-text files at that
>> point (they could arrive days later). I can match each file to its
>> metadata record because the file name matches a field in the record.
>>
>> What is the best approach to do that staggered indexing with minimum
>> custom code? I guess my fallback position is a custom full-text
>> indexer agent that re-adds the metadata fields when the file is being
>> indexed. Is there anything better?
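The fallback described above could look roughly like this: when a file arrives, find its metadata record by file name, copy the record, add the extracted text, and send the result as an ordinary add, which replaces the earlier document with the same uniqueKey. This is a sketch under assumptions: the `filename` and `fulltext` field names are hypothetical, and text extraction (for example with Tika) is left outside the sketch, so the extracted text is passed in.

```python
import os
from typing import Dict, Iterable, Optional


def index_metadata_by_filename(records: Iterable[Dict], match_field: str = "filename") -> Dict[str, Dict]:
    """Map each metadata record by the file name stored in `match_field`."""
    return {rec[match_field]: rec for rec in records}


def build_full_document(
    file_path: str,
    metadata_index: Dict[str, Dict],
    extracted_text: str,
    text_field: str = "fulltext",
) -> Optional[Dict]:
    """Combine an arriving file's text with its metadata record.

    Returns None if no metadata record matches yet, so the file can be retried later.
    """
    record = metadata_index.get(os.path.basename(file_path))
    if record is None:
        return None
    doc = dict(record)  # copy so the cached metadata is not mutated
    doc[text_field] = extracted_text
    return doc
```

Re-adding the whole document this way needs no stored fields and no patch, at the cost of keeping the metadata available to the indexer (here, an in-memory map built from the same database DIH reads).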
>>
>> I am a newbie using SOLR 4.0-alpha (and loving it).
>>
>> Thank you,
>>     Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
