lucene-java-user mailing list archives

From Kumaran Ramasubramanian <kums....@gmail.com>
Subject Re: Indexing and storing Long fields
Date Thu, 28 Jul 2016 13:31:03 GMT
Hi Mike,

For your information, I am using Lucene 4.10.4. Am I missing anything?



​--
Kumaran R​




On Wed, Jul 27, 2016 at 1:52 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:

>
> Hi Mike,
>
> 1. If we index one field as both analyzed and not analyzed under the same
> name, phrase queries stop working, even for the analyzed terms (field "comp"
> was indexed without position data, cannot run phrasequery). The term
> properties in the indexed documents are not correct: even for the tokenized
> value, I am not able to search for "bank", "swiss" or "world". The documents
> look like this:
>
> *while we index*
>
> Document<stored,indexed,tokenized<comp:world bank>
> stored,indexed,tokenized<name:kumaran >
> stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
> stored,indexed,tokenized<docid:1>>
> Document<stored,indexed<comp:swiss bank>
> stored,indexed,tokenized<name:kumaran >
> stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
> stored,indexed,tokenized<docid:2>>
>
>
> *in index*
>
> Document<stored,indexed,tokenized<comp:world bank>
> stored,indexed,tokenized<name:kumaran >
> stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
> stored,indexed,tokenized<docid:1>>
> Document<stored,indexed,tokenized<comp:swiss bank>
> stored,indexed,tokenized<name:kumaran >
> stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
> stored,indexed,tokenized<docid:2>>
>
> *impact:*
>
> For docid 2, the comp field changes from stored,indexed to
> stored,indexed,tokenized.
>
> *Related links:*
>
> https://github.com/elastic/elasticsearch/issues/12079
>
> https://github.com/elastic/elasticsearch/issues/4475
>
> http://stackoverflow.com/questions/19302887/elasticsearch-field-title-was-indexed-without-position-data-cannot-run-phras
>
>
>
> *2. Similarly, for a numeric field and a string field using the same name*
>
> If we index a numeric field and a string field under the same field name in
> a single index, we lose the position data of the indexed string terms, so
> phrase queries do not work (field "fieldname" was indexed without position
> data, cannot run phrasequery).
>
>
>
> https://mail-archives.apache.org/mod_mbox/lucene-java-user/201510.mbox/%3CCAHTScUgTYgSLP9OmoMe2ebVBHw8=Trih5B++u7V050VNRQZU8A@mail.gmail.com%3E
>
>
>
> > I would be pretty skeptical of this approach. You're
> > mixing numeric data with textual data and I expect
> > the results to be unpredictable. You already said
> > "it is working for most of the
> > documents except one or two documents." I predict
> > you'll find more and more of these as time passes.
> >
> > Expect many more anomalies. At best you need to
> > index both forms as text rather than mixing numeric
> > and text data.
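
Both symptoms above (1 and 2) are consistent with one mechanism. As far as I understand Lucene 4.x, index options are tracked per field name for the whole index, not per document, and when the same name arrives with different options the field keeps the lower level. Non-analyzed string fields and numeric fields typically ask for documents only, so sharing a name with an analyzed field would drop that field's positions. The sketch below is a toy model in plain Java, not Lucene code: the class FieldOptionsModel, its methods, and the field names comp_text and comp_exact are all hypothetical and exist only to illustrate the rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (NOT Lucene code) of per-field index options shared by all documents. */
class FieldOptionsModel {

    /** Ordered from least to most information, mirroring Lucene 4.x index options. */
    enum Options { DOCS_ONLY, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

    /** Options recorded so far, keyed by field name. */
    private final Map<String, Options> fields = new LinkedHashMap<>();

    /** Records one occurrence of a field; a name keeps the lowest level it has ever seen. */
    void add(String name, Options requested) {
        Options current = fields.get(name);
        if (current == null || requested.compareTo(current) < 0) {
            fields.put(name, requested);
        }
    }

    /** A phrase query needs position data for the field. */
    boolean supportsPhraseQuery(String name) {
        return fields.get(name) == Options.DOCS_AND_FREQS_AND_POSITIONS;
    }

    public static void main(String[] args) {
        FieldOptionsModel index = new FieldOptionsModel();

        // docid 1: "comp" analyzed, positions requested.
        index.add("comp", Options.DOCS_AND_FREQS_AND_POSITIONS);
        // docid 2: "comp" not analyzed (or numeric), same name, documents only.
        index.add("comp", Options.DOCS_ONLY);

        // Hypothetical alternative: one name per variant.
        index.add("comp_text", Options.DOCS_AND_FREQS_AND_POSITIONS);
        index.add("comp_exact", Options.DOCS_ONLY);

        System.out.println("comp: " + index.supportsPhraseQuery("comp"));
        System.out.println("comp_text: " + index.supportsPhraseQuery("comp_text"));
    }
}
```

In this model, "comp" loses phrase-query support as soon as the non-analyzed value is added, while the two separately named fields keep their own settings. If the real index behaves this way, giving the analyzed and non-analyzed (or numeric) variants distinct field names would be one way around the error.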
>
>
>
> Thanks in advance...
>
>
>
> --
> Kumaran R
>
>
>
>
>
> On Sun, Jul 24, 2016 at 1:54 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> On Sat, Jul 23, 2016 at 4:48 AM, Kumaran Ramasubramanian <
>> kums.134@gmail.com> wrote:
>>
>> > Hi Mike,
>> >
>> > *Two different fields can be the same name*
>> >
>> > Is that so? You mean we can index one field as a doc values field and
>> > also as a stored field, using the same name?
>> >
>>
>> This should be fine, yes.
>>
>>
>> > And AFAIK, we cannot index one field as analyzed and not analyzed using
>> > the same name. Am I right?
>> >
>>
>> Hmm, I think you can do this?  The first one will be tokenized, and the
>> second indexed as a single token.
>>
>> Or do you see otherwise?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>
>
