incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Index Warmup in Blur
Date Tue, 01 Oct 2013 14:28:34 GMT
Take a look at this package.

https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-store/src/main/java/org/apache/blur/lucene/warmup;h=f4239b1947965dc7fe8218eaa16e3f39ecffdda0;hb=apache-blur-0.2

Basically, when the warmup process starts (it runs asynchronously to the
rest of the application), it flips a thread-local switch that enables
tracing of file accesses.  The sampler then samples each field in each
segment and creates a sample file that attempts to detect the boundaries of
each field within each file of the segment.  It stores that sample info in
the directory beside each segment, so the segment does not have to be
re-sampled.  Once the sampling is complete (or an existing sample has been
loaded), the warmup simply reads the binary data from each file.  The act of
reading brings the data into the block cache, and the result is that the
index is "hot".

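To make the steps above concrete, here is a minimal sketch in Java. It is
not the actual Blur code: the class WarmupSketch and the names TRACING,
TRACED_OFFSETS, onRead, endOffset and warm are all hypothetical. It also
assumes that a field's data ends where the next sampled field starts (or at
the end of the file), which is only one way the boundaries could be derived
from start offsets.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class WarmupSketch {
    // Hypothetical per-thread switch: only a thread that turned tracing on records accesses.
    static final ThreadLocal<Boolean> TRACING = ThreadLocal.withInitial(() -> Boolean.FALSE);
    static final ThreadLocal<List<Long>> TRACED_OFFSETS = ThreadLocal.withInitial(ArrayList::new);

    // Called from a (hypothetical) read path; does nothing unless this thread is tracing.
    static void onRead(long offset) {
        if (TRACING.get()) {
            TRACED_OFFSETS.get().add(offset);
        }
    }

    // Assumption: field i ends where the next sampled field starts, or at end of file.
    static long endOffset(long[] sortedStarts, int i, long fileLength) {
        return (i + 1 < sortedStarts.length) ? sortedStarts[i + 1] : fileLength;
    }

    // Reads bytes [start, end) and discards them; returns the number of bytes read.
    // In a cached directory, the read itself is what would pull blocks into the cache.
    static long warm(Path file, long start, long end) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            long remaining = end - start;
            while (remaining > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // reached end of file before the requested end offset
                }
                total += n;
                remaining -= n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a segment file holding two fields starting at offsets 0 and 40.
        Path file = Files.createTempFile("segment", ".tim");
        Files.write(file, new byte[100]);
        long[] starts = {0L, 40L};
        for (int i = 0; i < starts.length; i++) {
            long end = endOffset(starts, i, Files.size(file));
            System.out.println("field " + i + ": warmed " + warm(file, starts[i], end) + " bytes");
        }
        Files.delete(file);
    }
}
```

The sketch reads plain files, so nothing is really cached here; the point is
only the shape of the flow: trace on one thread, derive a byte range per
field, then read that range.
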
Hope this helps.

Aaron




On Tue, Oct 1, 2013 at 10:09 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> As I understand it,
>
> Lucene stores the files in the following way per segment:
>
> TIM file
>      Field1 ---> Some byte[]
>      Field2 ---> Some byte[]
>
> TIP file
>      Field1 ---> Some byte[]
>      Field2 ---> Some byte[]
>
>
> Blur will "sample" these Lucene files in the following way:
>
> Field1 --> <TIM, start-offset>, <TIP, start-offset>, ...
>
> Field2 --> <TIM, start-offset>, <TIP, start-offset>, ...
>
> Is my understanding correct?
>
> How does Blur warm up the fields when it does not know the "end-offset" or
> the "length" of each field to warm?
>
> Will it by default read all Terms of a field?
>
> --
> Ravi
>
