lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Indexing parts of an HTML file differently
Date Thu, 27 Mar 2014 08:54:15 GMT
Can you get Delivery Server to generate a Solr-style XML or JSON update
file? That might be easier than generating and then re-parsing HTML.
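
If generating the update file directly from Delivery Server is not possible, a small pre-processing step could produce it from the HTML instead. Below is a minimal Python sketch of that idea, not a tested integration: it splits a page on the country comment markers quoted further down and emits one document per block as a JSON array of the kind Solr's JSON update handler accepts. The field names `id`, `country` and `content`, the `page_id` argument, and both function names are assumptions made for illustration; they would have to match your own schema.

```python
import json
import re

# Matches one block delimited by the comment markers from the original question,
# e.g. <!-- Country: FR, BE --> ... <!-- Country End -->
# Two or more dashes are tolerated before the closing ">" of the opening marker.
COUNTRY_BLOCK = re.compile(
    r"<!--\s*Country:\s*(?P<codes>[A-Za-z, ]+?)\s*-{2,}>"
    r"(?P<body>.*?)"
    r"<!--\s*Country End\s*-->",
    re.DOTALL,
)


def split_by_country(page_id, html):
    """Return one document (as a dict) per country block found in the page.

    The field names used here are hypothetical and must match your schema.
    """
    docs = []
    for n, match in enumerate(COUNTRY_BLOCK.finditer(html)):
        codes = [c.strip().upper()
                 for c in match.group("codes").split(",") if c.strip()]
        # Crude tag stripping; a real HTML parser would be more robust.
        text = re.sub(r"<[^>]+>", " ", match.group("body"))
        docs.append({
            "id": f"{page_id}#{n}",           # hypothetical unique key per block
            "country": codes,                 # assumed multi-valued string field
            "content": " ".join(text.split()),
        })
    return docs


def to_solr_json(docs):
    """Serialize the documents as a JSON array for a Solr update request."""
    return json.dumps(docs, indent=2)
```

With a multi-valued `country` field indexed this way, results could then be limited to the current country with a filter query such as `fq=country:CH`. Text outside any country block is ignored by this sketch and would need a document of its own.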

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Mar 27, 2014 at 3:28 PM, Michael Clivot <clivot@netmedia.de> wrote:
> Thanks for your answer Jack.
> @Gora:
>
>> How are you fetching the HTML content, and indexing it into Solr?
>
> We are using Solr with the OpenText Delivery Server. The Delivery Server generates HTML representations of the published pages and writes them to the directory that Solr reads its content from.
>
>> It is probably best to handle this requirement at that point. Haven't used Nutch (http://nutch.apache.org/) recently, but you might be able to use it for this.
>
> Do you mean the web crawler approach? At first glance, it does not suit us very well. In that case we would have to implement the OpenText search layer ourselves. Theoretically, we could try to teach Delivery Server to understand external indexes. But crawling itself is not the preferred solution: it is not as responsive as the Delivery Server approach; where authorization restrictions exist, there would have to be a separate crawler user for every role; and so on.
>
> -----Original Message-----
> From: Gora Mohanty [mailto:gora@mimirtech.com]
> Sent: Tuesday, 25 March 2014 11:32
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing parts of an HTML file differently
>
> On 25 March 2014 15:59, Michael Clivot <clivot@netmedia.de> wrote:
>> Hello,
>>
>> I have the following issue and need help:
>>
>> One HTML file has different parts for different countries.
>> For example:
>>
>> <!-- Country: FR, BE -->
>> ....
>> Address for France and Benelux
>> ....
>> <!-- Country End -->
>> <!-- Country: CH -->
>> ....
>> Address for Switzerland
>> ....
>> <!-- Country End -->
>>
>> Depending on a parameter, I show or hide the parts on the website.
>> Logically, all parts are in the index and therefore all items are found by Solr.
>> My question is: how can I have only the items for the current country in my result list?
>
> How are you fetching the HTML content, and indexing it into Solr?
> It is probably best to handle this requirement at that point. Haven't used Nutch (http://nutch.apache.org/) recently, but you might be able to use it for this.
>
> Regards,
> Gora
