lucene-java-user mailing list archives

From Kumaran Ramasubramanian <kums....@gmail.com>
Subject Re: Indexing and storing Long fields
Date Tue, 26 Jul 2016 20:22:32 GMT
Hi Mike,

1. If we index one field as both analyzed and not analyzed under the same
name, phrase queries stop working, even for the analyzed terms (field "comp"
was indexed without position data; cannot run PhraseQuery). The term
properties of the indexed documents are not preserved: even though the value
is tokenized, I am not able to search "bank", "swiss" or "world". The
documents look like this:

*while we index*

Document<stored,indexed,tokenized<comp:world bank>
stored,indexed,tokenized<name:kumaran >
stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
stored,indexed,tokenized<docid:1>>
Document<stored,indexed<comp:swiss bank>
stored,indexed,tokenized<name:kumaran >
stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
stored,indexed,tokenized<docid:2>>


*in the index*

Document<stored,indexed,tokenized<comp:world bank>
stored,indexed,tokenized<name:kumaran >
stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
stored,indexed,tokenized<docid:1>>
Document<stored,indexed,tokenized<comp:swiss bank>
stored,indexed,tokenized<name:kumaran >
stored,indexed,tokenized<city:chennai> stored,indexed,tokenized<module:1>
stored,indexed,tokenized<docid:2>>

*Impact:*

The "comp" field of document 2 was indexed as stored,indexed (not
tokenized), but in the index it appears as stored,indexed,tokenized.
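The error message suggests that the index keeps a single set of index options per field name, so indexing "comp" once without positions weakens the whole field. The plain-Java model below sketches that behaviour under this assumption only; `IndexOptions`, `addField` and `canRunPhraseQuery` are hypothetical names for illustration, not Lucene's API (Lucene has a similarly named enum, and its exact merge rules may differ between versions).

```java
import java.util.HashMap;
import java.util.Map;

public class FieldOptionsModel {
    // Hypothetical index options, declared from weakest to strongest.
    enum IndexOptions { DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

    // One index-wide entry per field name, shared by all documents.
    private final Map<String, IndexOptions> fieldInfos = new HashMap<>();

    // Assumption: when the same name arrives with different options,
    // the weaker (lower ordinal) options are kept for the whole field.
    void addField(String name, IndexOptions opts) {
        fieldInfos.merge(name, opts, (oldOpts, newOpts) ->
                oldOpts.compareTo(newOpts) <= 0 ? oldOpts : newOpts);
    }

    // Phrase queries need positions; anything weaker cannot run them.
    boolean canRunPhraseQuery(String name) {
        return fieldInfos.get(name) == IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
    }

    public static void main(String[] args) {
        FieldOptionsModel index = new FieldOptionsModel();
        // Document 1: "comp" analyzed, so terms carry positions.
        index.addField("comp", IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
        System.out.println(index.canRunPhraseQuery("comp")); // true
        // Document 2: "comp" not analyzed, indexed as docs only.
        index.addField("comp", IndexOptions.DOCS);
        System.out.println(index.canRunPhraseQuery("comp")); // false
    }
}
```

In this model the order of the documents does not matter: indexing the not-analyzed value first and the analyzed one second ends in the same weaker state.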

*Related links:*

https://github.com/elastic/elasticsearch/issues/12079

https://github.com/elastic/elasticsearch/issues/4475

http://stackoverflow.com/questions/19302887/elasticsearch-field-title-was-indexed-without-position-data-cannot-run-phras



*2. Similarly, for a numeric field and a string field using the same name*

Also, if we index a numeric field and a string field under the same field
name in a single index, we lose the position data of the indexed string
terms, so phrase queries do not work (field "fieldname" was indexed without
position data; cannot run PhraseQuery).
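One way to avoid losing position data like this is to use each field name for only one kind of value, for example "fieldname" for text and a separate "fieldname_num" for the number. The sketch below is a hypothetical application-side guard (the class, the `Kind` categories and the `_num` suffix are assumptions for illustration, not part of Lucene) that rejects a second kind of value under an already-used name before the document is indexed.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldKindGuard {
    // Hypothetical categories for how a value is indexed under a name.
    enum Kind { ANALYZED_TEXT, SINGLE_TOKEN, NUMERIC }

    // First kind seen for each field name.
    private final Map<String, Kind> kinds = new HashMap<>();

    // Record the kind for a field name; reject a different kind under the same name.
    void register(String field, Kind kind) {
        Kind previous = kinds.putIfAbsent(field, kind);
        if (previous != null && previous != kind) {
            throw new IllegalArgumentException("field \"" + field + "\" already used as "
                    + previous + ", cannot also index it as " + kind);
        }
    }

    public static void main(String[] args) {
        FieldKindGuard guard = new FieldKindGuard();
        guard.register("fieldname", Kind.ANALYZED_TEXT);
        guard.register("fieldname_num", Kind.NUMERIC); // separate name: accepted
        try {
            guard.register("fieldname", Kind.NUMERIC); // same name, different kind
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```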


https://mail-archives.apache.org/mod_mbox/lucene-java-user/201510.mbox/%3CCAHTScUgTYgSLP9OmoMe2ebVBHw8=Trih5B++u7V050VNRQZU8A@mail.gmail.com%3E



> I would be pretty skeptical of this approach. You're
> mixing numeric data with textual data and I expect
> the results to be unpredictable. You already said
> "it is working for most of the
> documents except one or two documents." I predict
> you'll find more and more of these as time passes.
>
> Expect many more anomalies. At best you need to
> index both forms as text rather than mixing numeric
> and text data.
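If both forms are indexed as text, as the quoted reply suggests, one common approach (an assumption here, not something the reply spells out) is to write numbers as fixed-width, zero-padded strings, so that lexicographic term order matches numeric order. A minimal sketch for non-negative `long` values; `encode` and `decode` are hypothetical helper names:

```java
public class PaddedNumberText {
    // Number of decimal digits in Long.MAX_VALUE (9223372036854775807).
    private static final int WIDTH = 19;

    // Encode a non-negative long as fixed-width text so that string
    // comparison orders values the same way numeric comparison does.
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch handles non-negative values only: " + value);
        }
        return String.format("%0" + WIDTH + "d", value);
    }

    // Leading zeros are accepted by Long.parseLong.
    static long decode(String text) {
        return Long.parseLong(text);
    }

    public static void main(String[] args) {
        String nine = encode(9);
        String ten = encode(10);
        System.out.println(nine);                     // 0000000000000000009
        System.out.println(nine.compareTo(ten) < 0);  // true
        System.out.println(decode(ten));              // 10
    }
}
```

Without padding, "9" sorts after "10" as text, which would break text range queries over such values.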



Thanks in advance...



--
Kumaran R





On Sun, Jul 24, 2016 at 1:54 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Sat, Jul 23, 2016 at 4:48 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:
>
> > Hi Mike,
> >
> > *Two different fields can be the same name*
> >
> > Is that so? You mean we can index one field as a DocValues field and
> > also as a stored field, using the same name?
> >
>
> This should be fine, yes.
>
>
> > And AFAIK, we cannot index one field as both analyzed and not analyzed
> > using the same name. Am I right?
> >
>
> Hmm, I think you can do this?  The first one will be tokenized, and the
> second indexed as a single token.
>
> Or do you see otherwise?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
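Mike's point that the first value is tokenized while the second is indexed as a single token can be illustrated without Lucene. In the sketch below, `analyzed` is a simplified stand-in for an analyzer (lowercase and split on whitespace, which is not exactly what Lucene's analyzers do) and `notAnalyzed` keeps the whole value as one term; both are hypothetical helpers. It shows why a term query for "bank" or "swiss" cannot match the not-analyzed "swiss bank" value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenizeVsKeyword {
    // Simplified stand-in for an analyzer: lowercase, split on whitespace.
    // The list index of each term is its position, which phrase queries use.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) { // skip the empty piece from leading whitespace
                terms.add(part);
            }
        }
        return terms;
    }

    // Stand-in for a not-analyzed field: the whole value is a single term.
    static List<String> notAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(analyzed("world bank"));    // [world, bank]
        System.out.println(notAnalyzed("swiss bank")); // [swiss bank]
        // A single-term lookup for "bank" only finds the analyzed value.
        System.out.println(analyzed("world bank").contains("bank"));    // true
        System.out.println(notAnalyzed("swiss bank").contains("bank")); // false
    }
}
```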
