lucene-general mailing list archives

From Uwe Schindler
Subject Re: Manipulate stored string in Lucene
Date Wed, 09 May 2018 06:11:08 GMT
Oh, it's Solr? Then it's not easily possible. Plain Lucene works like that.
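
For plain Lucene, the idea from the quoted reply below might be sketched roughly as follows. This is only an illustration under stated assumptions: the sample input string, the class name `StoredTextSketch`, and the helpers `unescape` and `stripTags` are hypothetical, and the regex-based tag stripping is a simplification that would not cope with comments or CDATA. The Lucene calls appear only in comments so that the sketch runs without the Lucene jars.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: derive the tagged text (for indexing) and the plain
// text (for storing) from one escaped input value.
class StoredTextSketch {
    // Naive tag matcher; an assumption, not a full XML/HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Undo the escaping of the input; "&amp;" goes last so that
    // "&amp;lt;" becomes "&lt;" and not "<".
    public static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    // Remove everything that looks like a tag, keeping the text content.
    public static String stripTags(String xml) {
        return TAG.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up sample value, not taken from the original message.
        String input = "&lt;sec sec-type=\"Introduction\"&gt;Introduction&lt;/sec&gt; to the topic";

        String tagged = unescape(input);   // what the custom tokenizer would see
        String plain = stripTags(tagged);  // what the user would see

        System.out.println(tagged); // <sec sec-type="Introduction">Introduction</sec> to the topic
        System.out.println(plain);  // Introduction to the topic

        // With plain Lucene, both values could go into the same field name:
        //   doc.add(new TextField("tagged_text", tagged, Field.Store.NO)); // indexed, not stored
        //   doc.add(new StoredField("tagged_text", plain));                // stored, not indexed
    }
}
```

The point of the two commented `doc.add` calls is that the indexed instance and the stored instance are independent, so each can carry different content.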


On May 9, 2018 6:09:42 AM UTC, Uwe Schindler wrote:
>You don't need a second field name. You can add the field once as
>indexed with stored=false, and then add a second instance with the same
>field name that carries the original stored content but is not indexed.
>If you want docvalues, the same can be done for docvalues. Internally,
>Lucene does it like that anyway. Adding a field that is stored and
>indexed at the same time is just for convenience.
>On May 9, 2018 5:57:40 AM UTC, "Pachzelt, Adrian" wrote:
>>Dear all,
>>currently I am reading text fields that contain XML text. Hence, the
>>Solr input may look like this:
>><field name="tagged_text">&lt;sec sec-type="Introduction"
>>With all “<” and “>” escaped.
>>I wrote a tokenizer that indexes the tag attributes (e.g.
>>sec-type="Introduction") at the position of the tagged word
>>(“Introduction” in this case), and hence I need the HTML tags when
>>indexing. However, I want to strip the HTML from the stored string that
>>is shown to the user on a query. So far, I have figured out that the
>>index and the stored string are separate. Thus, I thought it should be
>>possible to manipulate the stored string after indexing.
>>Is there a way to do so? I would prefer to manipulate the stored string
>>and not introduce a second field with the plain text in the input.
>>I would be grateful for any help!
>>Best Regards,
>>Adrian Pachzelt
>>- Fachinformationsdienst Biodiversitaetsforschung -
>>- Hosting von Open Access-Zeitschriften -
>>Universitaetsbibliothek Johann Christian Senckenberg
>>Bockenheimer Landstr. 134-138
>>60325 Frankfurt am Main
>>Tel. 069/798-39382
>Uwe Schindler
>Achterdiek 19, 28357 Bremen

Uwe Schindler
Achterdiek 19, 28357 Bremen