lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Delete Parents without any children...
Date Fri, 03 Jul 2015 12:16:42 GMT
We try to make the default postings format a good default for most
use cases, and it's unclear to me whether trading speed of multi-term
queries for compression of the terms dictionary would be a better
trade-off for most users. I think this idea needs more iterations. For
instance, on this issue I experimented with LZ4, which works on blocks
of data, so in order to read a single byte you need to decompress the
block that contains it. Robert suggested that we could use something
more fine-grained, like byte pair encoding[1]. I think this is a nice
idea, and it would be interesting to see how it would affect
multi-term queries compared to LZ4 blocks.

[1] https://en.wikipedia.org/wiki/Byte_pair_encoding
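
To make the "fine-grained" point concrete, here is a minimal sketch of
byte pair encoding in plain Java. It is not the LUCENE-4702 patch and
not Lucene code: the class and method names (BytePairSketch, encode,
expand, decode) are made up for illustration, and symbols are kept as
ints rather than packed back into bytes, which a real format would
have to do. The part that matters for the terms dictionary is
expand(): each symbol can be turned back into bytes on its own,
without decompressing its neighbours.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Lucene code.
final class BytePairSketch {
    // merges.get(k) holds the pair {left, right} that symbol (256 + k) stands for.
    final List<int[]> merges = new ArrayList<>();

    /** Repeatedly replaces the most frequent adjacent pair with a new symbol. */
    int[] encode(byte[] data, int maxMerges) {
        int[] seq = new int[data.length];
        for (int i = 0; i < data.length; i++) seq[i] = data[i] & 0xFF;
        for (int m = 0; m < maxMerges; m++) {
            // Count adjacent pairs; only a pair seen at least twice is worth a merge.
            Map<Long, Integer> counts = new HashMap<>();
            long best = -1;
            int bestCount = 1;
            for (int i = 0; i + 1 < seq.length; i++) {
                long key = ((long) seq[i] << 32) | seq[i + 1];
                int c = counts.merge(key, 1, Integer::sum);
                if (c > bestCount) { bestCount = c; best = key; }
            }
            if (best < 0) break; // no pair repeats, nothing left to gain
            int left = (int) (best >>> 32);
            int right = (int) best;
            int symbol = 256 + merges.size();
            merges.add(new int[] {left, right});
            // Rewrite the sequence left to right, replacing each occurrence of the pair.
            int[] out = new int[seq.length];
            int n = 0;
            for (int i = 0; i < seq.length; i++) {
                if (i + 1 < seq.length && seq[i] == left && seq[i + 1] == right) {
                    out[n++] = symbol;
                    i++; // skip the second half of the pair
                } else {
                    out[n++] = seq[i];
                }
            }
            seq = Arrays.copyOf(out, n);
        }
        return seq;
    }

    /** Expands ONE symbol back to bytes, without touching its neighbours. */
    void expand(int symbol, ByteArrayOutputStream sink) {
        if (symbol < 256) { sink.write(symbol); return; }
        int[] pair = merges.get(symbol - 256);
        expand(pair[0], sink);
        expand(pair[1], sink);
    }

    /** Expands every symbol in order to recover the original bytes. */
    byte[] decode(int[] seq) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (int s : seq) expand(s, sink);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        BytePairSketch bpe = new BytePairSketch();
        byte[] terms = "lucene lucid lucky luck".getBytes(StandardCharsets.US_ASCII);
        int[] packed = bpe.encode(terms, 16);
        System.out.println(terms.length + " bytes -> " + packed.length + " symbols");
        System.out.println("round trip ok: " + Arrays.equals(terms, bpe.decode(packed)));
    }
}
```

With block compression the unit of decoding is the block; here it is a
single symbol, which is why this kind of scheme might be friendlier to
queries that seek around in the terms dictionary. Whether that holds up
in practice would need benchmarking.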

On Fri, Jul 3, 2015 at 12:09 PM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> An unrelated question…
>
> I came across a JIRA issue where you tried compressing the terms
> dictionary just before writing and achieved a reduction in storage space…
>
> https://issues.apache.org/jira/browse/LUCENE-4702
>
> Was it abandoned because Terms-Dict-intensive queries like Fuzzy etc.
> didn't behave well?
>
> Currently we don't plan to provide queries like Fuzzy/Re-spell etc.,
> so we thought we could benefit from it.
>
>
> On Thu, Jul 2, 2015 at 6:02 PM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
>> Thanks Adrien…
>>
>> Works like a charm!!!
>>
>> On Wed, Jul 1, 2015 at 10:22 PM, Adrien Grand <jpountz@gmail.com> wrote:
>>
>>> Hi Ravikumar,
>>>
>>> You need to run a BooleanQuery with two clauses:
>>>  - a must clause that matches all parent documents
>>>  - a must_not clause that matches all parents that have children
>>>
>>> Building this second clause can be done easily with a
>>> ToParentBlockJoinQuery around a child query that matches all your
>>> children documents.
>>>
>>>
>>> --
>>> Adrien
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
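
The BooleanQuery recipe quoted above boils down to a set difference:
all parents, minus the parents that have at least one child. The
sketch below models that in plain Java with java.util.BitSet. It is
not Lucene API; the class name ChildlessParentsSketch and the boolean[]
input are made up for illustration. It assumes the block-join index
layout, where the children of a block are indexed immediately before
their parent, and it treats every non-parent document as a child, which
corresponds to a child query that matches all children.

```java
import java.util.BitSet;

// Hypothetical model of the query logic; not Lucene code.
final class ChildlessParentsSketch {
    /**
     * isParent[doc] is true for parent documents. Assumes block-join layout:
     * children are stored directly before their parent.
     */
    static BitSet childlessParents(boolean[] isParent) {
        BitSet allParents = new BitSet(isParent.length);          // the "must" clause
        BitSet parentsWithChildren = new BitSet(isParent.length); // the "must_not" clause
        for (int doc = 0; doc < isParent.length; doc++) {
            if (!isParent[doc]) continue;
            allParents.set(doc);
            // A parent has children iff the document right before it is a child.
            if (doc > 0 && !isParent[doc - 1]) parentsWithChildren.set(doc);
        }
        BitSet result = (BitSet) allParents.clone();
        result.andNot(parentsWithChildren); // must AND NOT must_not
        return result;
    }

    public static void main(String[] args) {
        // Blocks: [child, child, PARENT] [PARENT] [child, PARENT] [PARENT]
        boolean[] isParent = {false, false, true, true, false, true, true};
        System.out.println(childlessParents(isParent)); // prints {3, 6}
    }
}
```

In Lucene itself the two bit sets correspond to the MUST and MUST_NOT
clauses of the BooleanQuery, with ToParentBlockJoinQuery producing the
second one, and the combined query can then be handed to
IndexWriter.deleteDocuments to remove the childless parents.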



-- 
Adrien
