incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Index Warmup in Blur
Date Tue, 01 Oct 2013 16:45:23 GMT
You can control which fields are warmed up via the preCacheCols field of
TableDescriptor:

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_TableDescriptor

The comment on that field is wrong, however, so I will create a task to
correct it.  Each entry takes the form "family.column", just as you would
write it in a search.
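
As a rough sketch of the naming convention only: the helper class and the
"msg" family below are hypothetical and not part of Blur, and I am assuming
preCacheCols is a list of strings that is set on the table descriptor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Blur: builds "family.column" entries
// in the form described above for TableDescriptor.preCacheCols.
public class PreCacheColsSketch {

  // Joins a column family and a column name the way a search field is written.
  public static String col(String family, String column) {
    if (family == null || family.isEmpty() || column == null || column.isEmpty()) {
      throw new IllegalArgumentException("family and column must be non-empty");
    }
    return family + "." + column;
  }

  public static void main(String[] args) {
    List<String> preCacheCols = new ArrayList<>();
    preCacheCols.add(col("msg", "subject"));
    preCacheCols.add(col("msg", "sender"));
    // A large, rarely queried field (attachment data, say) is simply left out.
    System.out.println(preCacheCols); // prints [msg.subject, msg.sender]
    // Assumption: the list would then be handed to the table descriptor,
    // e.g. tableDescriptor.setPreCacheCols(preCacheCols);
  }
}
```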

Aaron


On Tue, Oct 1, 2013 at 12:41 PM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Thanks Aaron
>
> General sampling and warming is fine and the code is really concise and
> clear.
>
> > The act of reading
> > brings the data into the block cache and the result is that the index is
> > "hot".
>
> Will all the terms of a field be read and brought into the cache? If so,
> the obvious implication is that fields such as attachment-data should be
> excluded from warmup, provided queries don't often include them.
>
>
> On Tue, Oct 1, 2013 at 7:58 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > Take a look at this package.
> >
> >
> > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-store/src/main/java/org/apache/blur/lucene/warmup;h=f4239b1947965dc7fe8218eaa16e3f39ecffdda0;hb=apache-blur-0.2
> >
> > Basically when the warmup process starts (which is asynchronous to the
> > rest of the application) it flips a thread local switch to allow for
> > tracing of the file accesses.  The sampler will sample each of the fields
> > in each segment and create a sample file that attempts to detect the
> > boundaries of each field within each file within each segment.  Then it
> > stores the sample info into the directory beside each segment (so that
> > way it doesn't have to re-sample the segment).  After the sampling is
> > complete or loaded, the warmup just reads the binary data from each file.
> > The act of reading brings the data into the block cache and the result is
> > that the index is "hot".
> >
> > Hope this helps.
> >
> > Aaron
> >
> >
> >
> >
> > On Tue, Oct 1, 2013 at 10:09 AM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > As I understand,
> > >
> > > Lucene will store the files in the following way, per segment
> > >
> > > TIM file
> > >      Field1 ---> Some byte[]
> > >      Field2 ---> Some byte[]
> > >
> > > TIP file
> > >      Field1 ---> Some byte[]
> > >      Field2 ---> Some byte[]
> > >
> > >
> > > Blur will "sample" these Lucene files in the following way
> > >
> > > Field1 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > >
> > > Field2 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > >
> > > Is my understanding correct?
> > >
> > > How does Blur warm up the fields when it does not know the "end-offset"
> > > or the "length" of each field to warm?
> > >
> > > Will it by default read all Terms of a field?
> > >
> > > --
> > > Ravi
> > >
> >
>
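
The sampling and warmup steps described in the quoted messages can be
illustrated with a small sketch.  This is not Blur's code: every name below
is made up, a plain file read stands in for reading through Blur's directory
and block cache, and the rule that a field ends where the next field starts
is an assumption about how boundaries might be derived from start offsets.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; none of these names come from the Blur code base.
public class WarmupSketch {

  // Turns sampled start offsets (field -> start) into [start, end) ranges.
  // Assumption: a field's data ends where the next field's data starts,
  // and the last field runs to the end of the file.
  public static Map<String, long[]> toRanges(Map<String, Long> starts, long fileLength) {
    TreeMap<Long, String> byOffset = new TreeMap<>();
    for (Map.Entry<String, Long> e : starts.entrySet()) {
      byOffset.put(e.getValue(), e.getKey());
    }
    Map<String, long[]> ranges = new LinkedHashMap<>();
    for (Map.Entry<Long, String> e : byOffset.entrySet()) {
      Long next = byOffset.higherKey(e.getKey());
      long end = (next == null) ? fileLength : next;
      ranges.put(e.getValue(), new long[] {e.getKey(), end});
    }
    return ranges;
  }

  // "Warms" a range by reading and discarding its bytes; returns bytes read.
  public static long warm(Path file, long start, long end) throws IOException {
    byte[] buf = new byte[8192];
    long total = 0;
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      raf.seek(start);
      long remaining = end - start;
      while (remaining > 0) {
        int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) {
          break; // reached end of file early
        }
        total += n;
        remaining -= n;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a segment's TIM file: 1000 bytes, two sampled fields.
    Path tim = Files.createTempFile("segment", ".tim");
    Files.write(tim, new byte[1000]);
    Map<String, Long> starts = new LinkedHashMap<>();
    starts.put("fam.field1", 0L);
    starts.put("fam.field2", 400L);
    Map<String, long[]> ranges = toRanges(starts, Files.size(tim));
    long[] r = ranges.get("fam.field1");
    System.out.println(warm(tim, r[0], r[1])); // prints 400
    Files.delete(tim);
  }
}
```

Under that assumption, warming a field costs as many bytes of I/O and cache
as the field occupies in each file, which is why leaving a large, rarely
queried field out of the warmup list matters.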
