lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Docvalue v.s. invert index
Date Sun, 12 Aug 2018 16:58:46 GMT
bq. I have been informed that the performance of such a search is
absolutely terrible.

Yep. Horrible.

These two structures answer completely different questions:
indexed (inverted index) - "for this word, what docs contain it in field X?"
DocValues - "for this document, what is the value of field X?"
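To make the two questions concrete, here is a minimal Java sketch of both structures over a few made-up documents. It uses plain collections, not Lucene's real formats, and the class and method names are hypothetical.

```java
import java.util.*;

// Illustrative only: toy versions of the two structures, not Lucene's formats.
public class TwoStructures {
    // Inverted index: term -> ascending doc ids whose field X contains the term.
    static Map<String, List<Integer>> invert(String[] fieldX) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < fieldX.length; doc++) {
            for (String term : fieldX[doc].toLowerCase().split("\\s+")) {
                List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
                // Skip duplicates when a term repeats within one document.
                if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                    docs.add(doc);
                }
            }
        }
        return postings;
    }

    // "For this word, what docs contain it in field X?" One map lookup.
    static List<Integer> docsContaining(Map<String, List<Integer>> postings, String word) {
        return postings.getOrDefault(word.toLowerCase(), List.of());
    }

    // DocValues-style column: doc id -> value, here simply an array index.
    // "For this document, what is the value of field X?"
    static String valueOf(String[] column, int doc) {
        return column[doc];
    }

    public static void main(String[] args) {
        String[] fieldX = {"red apple", "green apple", "red cherry"};
        Map<String, List<Integer>> postings = invert(fieldX);
        System.out.println(docsContaining(postings, "red")); // [0, 2]
        System.out.println(valueOf(fieldX, 1));              // green apple
    }
}
```

Each structure answers its own question with a single lookup; asking either one the other's question means walking all of it.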

Oh my, my usual examples are going out of date: "phone book" and
"dictionary". There used to be, in the old days, these book-like
things that were printed on actual paper and you could use them to
find people's phone number and address, or what the meaning of a word
was. Siiiiggghhhh.

Well, get a paper phone book off the shelf from somewhere and consider
each entry a "document", and the phone number and address its "text".

DocValues answers "for person X, what is the phone number?" easily, since
the whole thing is alphabetically arranged. But to answer the question
"Who lives on Maple Street?" you have to read _everything_ in the
entire phone book. Think "table scan".
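The "table scan" point can be sketched in Java with a made-up phone book stored column-wise (one array per field, indexed by entry number, which is roughly the shape docValues give you). Looking up one person's address takes a binary search plus one value read, while finding everyone on a street has to read every entry. All names, addresses, and the read counter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy phone book stored column-wise: entry i has NAMES[i] and ADDRESSES[i],
// sorted by name. All data is made up.
public class PhoneBookScan {
    static final String[] NAMES = {"Adams", "Baker", "Clark", "Davis"};
    static final String[] ADDRESSES = {
        "12 Maple Street", "4 Oak Avenue", "9 Maple Street", "30 Elm Road"
    };

    // Counts how many address values have been read.
    static int entriesRead = 0;

    // Per-entry value lookup: one read, however big the book is.
    static String addressOf(int entry) {
        entriesRead++;
        return ADDRESSES[entry];
    }

    // "For person X, what is the address?" Names are sorted, so a binary
    // search finds the entry and a single value read answers the question.
    static String addressOfPerson(String name) {
        int entry = Arrays.binarySearch(NAMES, name);
        return entry >= 0 ? addressOf(entry) : null;
    }

    // "Who lives on Maple Street?" With only per-entry values, every entry
    // has to be read: a table scan.
    static List<String> whoLivesOn(String street) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < ADDRESSES.length; i++) {
            if (addressOf(i).endsWith(street)) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        entriesRead = 0;
        // prints "4 Oak Avenue, entries read: 1"
        System.out.println(addressOfPerson("Baker") + ", entries read: " + entriesRead);
        entriesRead = 0;
        // prints "[Adams, Clark], entries read: 4"
        System.out.println(whoLivesOn("Maple Street") + ", entries read: " + entriesRead);
    }
}
```

With four entries the difference is 1 read versus 4; with millions of documents the scan grows with the index while the per-entry lookup stays constant.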

To answer the question "Who lives on Maple Street?", you want to index
all the text.
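Indexing the text means splitting each address into words and recording, for each word, which entries contain it. A rough Java sketch follows; the whitespace tokenizer and lowercasing are stand-ins for a real Solr analysis chain, and the data and names are hypothetical. It also illustrates Shawn's caveat further down: matching against the stored, untokenized values (all a docValues-only field offers) is a scan, and only a whole value can match, never an individual word.

```java
import java.util.*;

public class IndexAllText {
    static final String[] ADDRESSES = {
        "12 Maple Street", "4 Oak Avenue", "9 Maple Street", "30 Elm Road"
    };

    // Build term -> entries with a naive whitespace tokenizer plus lowercasing
    // (a stand-in for a real analyzer).
    static Map<String, SortedSet<Integer>> buildIndex(String[] values) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int entry = 0; entry < values.length; entry++) {
            for (String token : values[entry].toLowerCase().split("\\s+")) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(entry);
            }
        }
        return index;
    }

    // Word search: one map lookup, no scan over the entries.
    static Set<Integer> searchWord(Map<String, SortedSet<Integer>> index, String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySortedSet());
    }

    // Full-value match against untokenized values: reads every entry,
    // and only an exact whole value matches.
    static Set<Integer> matchFullValue(String[] values, String value) {
        Set<Integer> hits = new TreeSet<>();
        for (int entry = 0; entry < values.length; entry++) {
            if (values[entry].equals(value)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index = buildIndex(ADDRESSES);
        System.out.println(searchWord(index, "Maple"));                  // [0, 2]
        System.out.println(matchFullValue(ADDRESSES, "Maple"));          // []
        System.out.println(matchFullValue(ADDRESSES, "9 Maple Street")); // [2]
    }
}
```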

The whole point of docValues: before they existed, the structure used to
answer the first question had to be built on the heap at runtime,
consuming memory and CPU cycles. DocValues serialize that structure to
disk at index time, where it is
1> easily read as memory pages
2> almost entirely kept in MMapDirectory space (memory managed by the OS,
not the Java heap), see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
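A small Java analogy for "serialize at index time, read as memory pages at query time": write one fixed-width value per document to a file, then memory-map it with the JDK's FileChannel.map and read doc N's value at offset N * 8. This is only a sketch of the idea MMapDirectory relies on; Lucene's real docValues formats are encoded quite differently, and the file layout and names here are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy column file: one 8-byte long per document, doc i at offset i * 8.
// NOT Lucene's docValues format; it only illustrates the idea.
public class MappedColumn {
    // "Index time": serialize the per-document values to disk once.
    static void write(Path file, long[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    // "Query time": map the file instead of copying it onto the heap. The OS
    // pages data in and caches it as it is touched. The mapping stays valid
    // after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // "For this document, what is the value?" A single positional read.
    static long valueOf(MappedByteBuffer column, int doc) {
        return column.getLong(doc * Long.BYTES);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("price", ".col");
        file.toFile().deleteOnExit();
        write(file, new long[] {199, 42, 1000});
        MappedByteBuffer column = map(file);
        System.out.println(valueOf(column, 1)); // 42
    }
}
```

Nothing is rebuilt per query: the only heap cost is the small buffer object, while the values themselves live in OS-managed pages.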

Best,
Erick


On Sun, Aug 12, 2018 at 8:56 AM, Shawn Heisey <apache@elyograg.org> wrote:
> On 8/12/2018 4:39 AM, Zahra Aminolroaya wrote:
>>
>> Could we say that the docValues technique is better for sorting and
>> faceting and the inverted index is better for searching?
>
>
> Yes.  That is how things work.
>
> If docValues do not exist, then an equivalent data structure must be built
> in heap memory *from* the inverted index in order for faceting or sorting to
> take place.  When docValues are present, Solr can just read the data
> directly instead of generating it.  If there is plenty of spare memory for
> the OS to cache data, this is faster.  It also uses less Java heap memory.
>
>> Will I lose anything if I only use docValues?
>>
>> Does the docValues technique have better performance?
>
>
> From what I understand, it actually is possible to search when docValues are
> present but the inverted index isn't, assuming that what you're searching
> for is the full value of the field, not an individual word.  I have been
> informed that the performance of such a search is absolutely terrible.
>
> Thanks,
> Shawn
>
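Shawn's point about building the equivalent structure in heap memory *from* the inverted index can be sketched in Java: walk every term and every posting of a single-valued field and fill a doc -> value array, which then makes faceting and sorting easy. This is roughly analogous to what Lucene's FieldCache did before docValues, heavily simplified; the class, method names, and data are hypothetical.

```java
import java.util.*;

// Toy "uninverting": rebuild a per-document value array on the heap from an
// inverted index of a single-valued field. Not Lucene's implementation.
public class Uninvert {
    // postings: term -> doc ids whose (single) value in the field is that term.
    static String[] uninvert(Map<String, int[]> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc]; // heap cost grows with maxDoc
        for (Map.Entry<String, int[]> e : postings.entrySet()) { // every term...
            for (int doc : e.getValue()) {                        // ...every posting
                valueByDoc[doc] = e.getKey();
            }
        }
        return valueByDoc;
    }

    // Faceting: count documents per value by walking the per-document array.
    static Map<String, Integer> facetCounts(String[] valueByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : valueByDoc) {
            if (v != null) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sorting: order doc ids by their value (docs without a value go last).
    static Integer[] sortDocs(String[] valueByDoc) {
        Integer[] docs = new Integer[valueByDoc.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparing((Integer d) -> valueByDoc[d],
                Comparator.nullsLast(Comparator.<String>naturalOrder())));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("blue", new int[] {1, 3});
        postings.put("green", new int[] {0});
        postings.put("red", new int[] {2, 4});
        String[] valueByDoc = uninvert(postings, 5);
        System.out.println(Arrays.toString(valueByDoc));           // [green, blue, red, blue, red]
        System.out.println(facetCounts(valueByDoc));               // {blue=2, green=1, red=2}
        System.out.println(Arrays.toString(sortDocs(valueByDoc))); // [1, 3, 0, 2, 4]
    }
}
```

The uninvert step is the work docValues move to index time: with docValues present, the equivalent of valueByDoc is read from disk instead of being rebuilt on the heap.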
