lucene-dev mailing list archives

From "jian chen" <chenjian1...@gmail.com>
Subject Re: Large scale sorting
Date Mon, 09 Apr 2007 22:41:37 GMT
Hi, Paul,

Thanks for your reply. Regarding your previous email about the need for a
disk-based sorting solution, I largely agree with your points. One advantage
of your approach is that we would no longer need to warm up the index when
it is huge.

In our application, we have to sync up the index quite frequently, and
warming up the index each time is killing performance.

To address your concern about a single sort locale, what about creating a
sort field for each sort locale? So, if you have, say, 10 locales, you will
have 10 sort fields, each built with the same mechanism used to construct
the norms.
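
As a minimal sketch of the per-locale idea, the integer value of each sort
field could be the rank of the document's value under that locale's collation
order. The class and method names below are hypothetical and are not part of
Lucene; the sketch also assumes all values are available at once, so it says
nothing about how ranks would be maintained during incremental indexing.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper, not Lucene API: one int sort value per doc, per locale. */
class LocaleSortOrdinals {

    /**
     * Returns an array where result[docId] is the rank of that document's
     * value in the given locale's collation order. Values that the collator
     * considers equal share the same rank.
     */
    static int[] ordinalsFor(String[] valueByDoc, Locale locale) {
        Collator collator = Collator.getInstance(locale);

        // Sort doc ids by their field value using locale-aware comparison.
        Integer[] docIds = new Integer[valueByDoc.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds, (a, b) -> collator.compare(valueByDoc[a], valueByDoc[b]));

        // Walk the sorted order and assign increasing ranks.
        int[] ordinals = new int[valueByDoc.length];
        int rank = 0;
        for (int i = 0; i < docIds.length; i++) {
            if (i > 0 && collator.compare(valueByDoc[docIds[i - 1]], valueByDoc[docIds[i]]) != 0) {
                rank++;
            }
            ordinals[docIds[i]] = rank;
        }
        return ordinals;
    }
}
```

With 10 locales, this would be called 10 times, producing 10 integer arrays
that could each be written out as a separate sort field.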

At query time, in the HitCollector, you can load the field value (an
integer) through the IndexReader for each matching doc id. (The IndexReader
would need to be enhanced to load the sort field values.) You can then use
that value to accept or reject the doc, or factor it into the score.
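
A rough sketch of the disk lookup itself, assuming the sort values are stored
as fixed-width 4-byte integers in doc id order, so the value for a document
sits at a base offset plus docNo * 4. SortValueFile, its methods, and the
baseOffset parameter are hypothetical names for illustration; nothing here is
existing IndexReader API.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical on-disk store of one 4-byte sort value per document. */
class SortValueFile implements AutoCloseable {
    private static final int BYTES_PER_VALUE = 4;

    private final RandomAccessFile file;
    private final long baseOffset; // bytes to skip before the first value (0 if no header)

    SortValueFile(String path, long baseOffset) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.baseOffset = baseOffset;
    }

    /** Seeks to baseOffset + docId * 4 and reads one big-endian int. */
    int valueFor(int docId) throws IOException {
        file.seek(baseOffset + (long) docId * BYTES_PER_VALUE);
        return file.readInt();
    }

    /** Writes the values in doc id order with no header (baseOffset 0). */
    static void write(String path, int[] values) throws IOException {
        try (DataOutputStream out =
                 new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path)))) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A collector would call valueFor(doc) for each matching doc id and compare the
result against a threshold or combine it with the score. Each call is one seek
plus a 4-byte read, so the cost depends heavily on how well the OS file cache
covers the file.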

What do you think?

Jian



On 4/9/07, Paul Smith <psmith@aconex.com> wrote:
>
> >
> > Now, if we could use integers to represent the sort field values, which
> > is typically the case for most applications, maybe we can afford to have
> > the sort field values stored on disk and do a disk lookup for each
> > document matched? The lookup of the sort field value will be as simple
> > as seeking to offset + docNo * 4.
> >
> > This way, we use the same approach as constructing the norms (proper
> > merging for incremental indexing), but, at search time, we don't load
> > the sort field values into memory; instead, we just keep them on disk.
> >
> > Will this approach be good enough?
>
> While a nifty idea, I think this only works for a single sort
> locale.  I initially came up with a similar idea: the terms are
> already stored in 'sorted' order, so one might be able to use a
> term's position for sorting. It's just that the term ordering
> is different in different locales.
>
> Paul
>
