lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Delete Parents without any children...
Date Fri, 10 Jul 2015 11:43:01 GMT
Yes, as you suggested, simply wrapping postings with LZ4 may not be the
best fit for all cases. Byte-pair encoding looks very promising.
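
For readers unfamiliar with byte pair encoding, here is a minimal sketch of the general technique in plain Java. It is not code from LUCENE-4702 or from Lucene itself; the class name `BytePairSketch`, its methods and the `maxMerges` parameter are hypothetical and exist only for illustration. Each round replaces the most frequent adjacent pair of symbols with a fresh symbol and records the rule so the data can be expanded again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of Lucene.
final class BytePairSketch {

    /** Converts raw bytes to symbols in the range 0..255. */
    static List<Integer> toSymbols(byte[] data) {
        List<Integer> out = new ArrayList<>(data.length);
        for (byte b : data) {
            out.add(b & 0xFF);
        }
        return out;
    }

    /**
     * Repeatedly replaces the most frequent adjacent pair with a fresh symbol
     * (256, 257, ...). Each applied rule is appended to merges as {left, right, symbol}.
     */
    static List<Integer> encode(List<Integer> input, List<int[]> merges, int maxMerges) {
        List<Integer> seq = new ArrayList<>(input);
        int nextSymbol = 256;
        for (int round = 0; round < maxMerges; round++) {
            // Count adjacent pairs, packing both symbols into one long key.
            Map<Long, Integer> counts = new HashMap<>();
            for (int i = 0; i + 1 < seq.size(); i++) {
                int left = seq.get(i);
                int right = seq.get(i + 1);
                counts.merge(((long) left << 32) | right, 1, Integer::sum);
            }
            // Pick the most frequent pair; ties go to the smaller key.
            long bestKey = -1;
            int bestCount = 1;
            for (Map.Entry<Long, Integer> e : counts.entrySet()) {
                long key = e.getKey();
                int count = e.getValue();
                if (count > bestCount || (count == bestCount && bestKey >= 0 && key < bestKey)) {
                    bestKey = key;
                    bestCount = count;
                }
            }
            if (bestKey < 0) {
                break; // no pair occurs at least twice, nothing left to gain
            }
            int bestLeft = (int) (bestKey >>> 32);
            int bestRight = (int) bestKey;
            List<Integer> merged = new ArrayList<>(seq.size());
            for (int i = 0; i < seq.size(); i++) {
                if (i + 1 < seq.size() && seq.get(i) == bestLeft && seq.get(i + 1) == bestRight) {
                    merged.add(nextSymbol);
                    i++; // skip the second half of the pair
                } else {
                    merged.add(seq.get(i));
                }
            }
            merges.add(new int[] {bestLeft, bestRight, nextSymbol});
            nextSymbol++;
            seq = merged;
        }
        return seq;
    }

    /** Undoes the merges in reverse order, expanding each new symbol into its pair. */
    static List<Integer> decode(List<Integer> encoded, List<int[]> merges) {
        List<Integer> seq = new ArrayList<>(encoded);
        for (int m = merges.size() - 1; m >= 0; m--) {
            int[] rule = merges.get(m);
            List<Integer> expanded = new ArrayList<>(seq.size());
            for (int s : seq) {
                if (s == rule[2]) {
                    expanded.add(rule[0]);
                    expanded.add(rule[1]);
                } else {
                    expanded.add(s);
                }
            }
            seq = expanded;
        }
        return seq;
    }
}
```

Because every symbol can be expanded on its own from the rule table, a reader does not have to decompress a whole block to get at one value, which is the fine-grained property discussed in the quoted message below. How this would perform inside a terms dictionary is not something this sketch measures.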

I accidentally stumbled upon this JIRA and found it was abandoned mid-way.

Thanks for sharing the details

--
Ravi

On Fri, Jul 3, 2015 at 5:46 PM, Adrien Grand <jpountz@gmail.com> wrote:

> We try to make the default postings format a good default for most
> use-cases and it's unclear to me whether trading speed of multi-term
> queries for compression of the terms dictionary would be a better
> trade-off for most users. I think this idea needs more iterations. For
> instance, on this issue I experimented with LZ4, which works on blocks
> of data, so in order to read a single byte you need to decompress the
> whole block. Robert suggested that we could use something more
> fine-grained like byte pair encoding[1]. I think this is a nice idea
> and it would be interesting to see how it would affect multi-term
> queries compared to lz4 blocks.
>
> [1] https://en.wikipedia.org/wiki/Byte_pair_encoding
>
> On Fri, Jul 3, 2015 at 12:09 PM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > An unrelated question…
> >
> > I came across a JIRA issue where you tried compressing the terms dictionary
> > just before writing and achieved a reduction in storage space…
> >
> > https://issues.apache.org/jira/browse/LUCENE-4702
> >
> > Was it abandoned because terms-dictionary-intensive queries such as
> > Fuzzy didn't behave well?
> >
> > We currently have no plans to provide queries like Fuzzy/Re-spell,
> > so I thought we could benefit from it.
> >
> >
> > On Thu, Jul 2, 2015 at 6:02 PM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> >> Thanks Adrien…
> >>
> >> Works like a charm!!!
> >>
> >> On Wed, Jul 1, 2015 at 10:22 PM, Adrien Grand <jpountz@gmail.com> wrote:
> >>
> >>> Hi Ravikumar,
> >>>
> >>> You need to run a BooleanQuery with two clauses:
> >>>  - a must clause that matches all parent documents
> >>>  - a must_not clause that matches all parents that have children
> >>>
> >>> Building this second clause can be done easily with a
> >>> ToParentBlockJoinQuery around a child query that matches all your
> >>> children documents.
> >>>
> >>>
> >>> --
> >>> Adrien
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>
>
>
>
> --
> Adrien
>
>
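
The two-clause BooleanQuery described at the start of the thread can be sketched roughly as follows. This is only an illustration under several assumptions: the `type` field with values `parent` and `child` is hypothetical, parents and children are assumed to have been indexed as blocks with the parent as the last document of each block, and the code uses `BooleanQuery.Builder` and `QueryBitSetProducer`, which come from Lucene releases newer than some that were current when this thread was written. The class and method names are made up for the example.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.BitSetProducer;
import org.apache.lucene.search.join.QueryBitSetProducer;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;

// Illustrative only: hypothetical class, field name and field values.
final class ChildlessParents {

    /** Builds a query matching parent documents that have no children. */
    static Query childlessParentsQuery() {
        // Assumption: every document carries a "type" field marking its role.
        Query allParents = new TermQuery(new Term("type", "parent"));
        Query allChildren = new TermQuery(new Term("type", "child"));

        // Identifies the parent document that closes each block.
        BitSetProducer parentsFilter = new QueryBitSetProducer(allParents);

        // Matches every parent that has at least one matching child.
        Query parentsWithChildren =
            new ToParentBlockJoinQuery(allChildren, parentsFilter, ScoreMode.None);

        return new BooleanQuery.Builder()
            .add(allParents, Occur.MUST)              // must clause: all parents
            .add(parentsWithChildren, Occur.MUST_NOT) // must_not clause: parents with children
            .build();
    }

    /** Deletes the childless parents through the given writer and commits. */
    static void deleteChildlessParents(IndexWriter writer) throws IOException {
        writer.deleteDocuments(childlessParentsQuery());
        writer.commit();
    }
}
```

Running the same query through an IndexSearcher first is a cheap way to check which parents would be removed before calling `deleteChildlessParents`.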
