lucene-general mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: Manipulate stored string in Lucene
Date Wed, 09 May 2018 06:11:08 GMT
Oh, it's Solr? Then it's not easily possible. What I described is how plain Lucene works.

Uwe

On May 9, 2018 6:09:42 AM UTC, Uwe Schindler <uwe@thetaphi.de> wrote:
>Hi,
>
>You don't need a second field name. You can add the indexed field once
>with stored=false, and then add a second instance with the same field
>name that holds the original stored content but is not indexed. If you
>want docvalues, the same can be done for docvalues. Internally, Lucene
>does it like that anyway. Adding a field that is both stored and
>indexed at the same time is just a convenience.
>
>Uwe
>
>On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian"
><A.Pachzelt@ub.uni-frankfurt.de> wrote:
>>Dear all,
>>
>>Currently I am reading text fields that contain XML text. Hence, the
>>Solr input may look like this:
>>
>><field name="tagged_text">&lt;sec sec-type="Introduction"
>>id="SECID0E4F"&gt;
>>&lt;title&gt;Introduction&lt;/title&gt;
>>&lt;/sec&gt;
>></field>
>>
>>with all "<" and ">" escaped.
>>I wrote a tokenizer that indexes the tag attributes (e.g.
>>sec-type="Introduction") at the position of the tagged word
>>("Introduction" in this case), and hence I need the HTML tags when
>>indexing. However, I want to strip the HTML from the stored string that
>>is shown to the user on a query. So far, I have figured out that the
>>index and the stored string are separate. Thus, I thought it should be
>>possible to manipulate the stored string after indexing.
>>
>>Is there a way to do so? I would prefer to manipulate the stored
>>string and not introduce a second field with the plain text in the
>>input file.
>>
>>I am glad for any help!
>>
>>Best Regards,
>>
>>Adrian
>>
>>-------------------------------------------------------
>>Adrian Pachzelt
>>- Fachinformationsdienst Biodiversitaetsforschung -
>>- Hosting von Open Access-Zeitschriften -
>>Universitaetsbibliothek Johann Christian Senckenberg
>>Bockenheimer Landstr. 134-138
>>60325 Frankfurt am Main
>>Tel. 069/798-39382
>>a.pachzelt@ub.uni-frankfurt.de
>>-------------------------------------------------------
>
>--
>Uwe Schindler
>Achterdiek 19, 28357 Bremen
>https://www.thetaphi.de

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
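
For readers who want to see the plain-Lucene approach from the quoted reply in code, here is a minimal sketch. It assumes a Lucene 7.x-style API; the class name SplitFieldExample and the helper stripTags are hypothetical and only for illustration, and the regex-based tag stripping is a crude stand-in for whatever cleanup is actually wanted. The custom tokenizer mentioned in the original question is assumed to be part of the Analyzer configured on the IndexWriter, which is not shown here. As noted at the top of this message, this does not carry over directly to Solr.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;

class SplitFieldExample {

    // Hypothetical helper: replaces anything that looks like a tag with a
    // space, then collapses whitespace. Illustration only, not a real parser.
    static String stripTags(String xml) {
        return xml.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    // Builds a document with two instances of the same field name:
    // one indexed but not stored, one stored but not indexed.
    static Document buildDocument(String taggedXml) {
        Document doc = new Document();

        // Instance 1: the tagged text goes to the analyzer and into the
        // inverted index, but is not stored (Field.Store.NO).
        doc.add(new TextField("tagged_text", taggedXml, Field.Store.NO));

        // Instance 2: same field name, stored only. This is the value
        // returned with search hits, here with the tags stripped.
        doc.add(new StoredField("tagged_text", stripTags(taggedXml)));

        return doc;
    }
}
```

The resulting Document would then be passed to IndexWriter.addDocument as usual. Only the second instance is written to the stored fields, so a document retrieved from a search hit should carry the stripped text, while queries run against the tokens produced from the tagged text.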