incubator-blur-user mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Index Warmup in Blur
Date Sat, 12 Oct 2013 12:24:01 GMT
On Fri, Oct 11, 2013 at 10:10 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Done.
>
> I missed one more point:
>
> > > Also, it would be awesome if Blur supports a per-row auto-complete
> > > feature.
> >
> > Not sure what you mean.  Are you talking about in the shell?
>
> I was referring to auto-complete/fill during search, like in Google/Gmail,
> but in our case we may need to tailor it per-row instead of offering global
> suggestions. It has to be exposed as a Thrift API.
>

Auto-complete is a really interesting feature; it may work for some use
cases and not for others.  Perhaps a pluggable auto-complete module?  Or a
way to make a custom RPC call so that you could create one for yourself?
Not sure.  We should make a separate issue for it and discuss.

Aaron


>
> --
> Ravi
>
>
> On Fri, Oct 11, 2013 at 6:21 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > I think that's a good idea.  I like the plan to make it an option.  Could
> > you go to issue https://issues.apache.org/jira/browse/BLUR-220 and either
> > link to this thread or add a comment to the issue with your thoughts?
> > Thanks!
> >
> > Aaron
> >
> >
> > On Fri, Oct 11, 2013 at 2:47 AM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > > Ah, that explains it I guess. This block indexing of all records of a
> > > > row should be an option. It will have big costs for online indexing.
> > > >
> > > > Let's take the case of Gmail itself. A user will have hundreds of
> > > > thousands of e-mails, and every day 10-15 mails will be added to the
> > > > corpus at different times.
> > > >
> > > > Scattering records across segments and taking a minor hit during
> > > > search will be the preferred choice, right?
> > > >
> > > > As compensation, we can use a SortingMergePolicy as documented at
> > > > https://issues.apache.org/jira/browse/LUCENE-4752
> > > >
> > > > We can co-locate all records of a given row during merge across
> > > > participating segments. This will offset the performance loss to a good
> > > > extent.
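The co-location idea above can be illustrated with a minimal plain-Java sketch. It does not use Lucene's SortingMergePolicy or any Blur class; the `Rec` type and `merge` method are hypothetical names introduced only for illustration. Records of the same row start out scattered across two segments, and the merge orders them by row id so that each row's records end up adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only: no Lucene or Blur classes are used.
final class RowColocatingMerge {

    // Hypothetical record holding only what the sketch needs.
    static final class Rec {
        final String rowId;
        final String recordId;

        Rec(String rowId, String recordId) {
            this.rowId = rowId;
            this.recordId = recordId;
        }

        @Override
        public String toString() {
            return rowId + "/" + recordId;
        }
    }

    // Merge several segments into one, ordered by rowId so that all records
    // of a row become adjacent. List.sort is stable, so records of the same
    // row keep their original relative order.
    static List<Rec> merge(List<List<Rec>> segments) {
        List<Rec> merged = new ArrayList<>();
        for (List<Rec> segment : segments) {
            merged.addAll(segment);
        }
        merged.sort(Comparator.comparing((Rec r) -> r.rowId));
        return merged;
    }

    public static void main(String[] args) {
        List<Rec> seg1 = Arrays.asList(new Rec("rowA", "1"), new Rec("rowB", "1"));
        List<Rec> seg2 = Arrays.asList(new Rec("rowA", "2"), new Rec("rowB", "2"));
        // prints [rowA/1, rowA/2, rowB/1, rowB/2]
        System.out.println(merge(Arrays.asList(seg1, seg2)));
    }
}
```

Until such a merge runs, a search for one row still has to visit every segment that holds some of its records, which is the "minor hit" mentioned above.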
> > >
> > > What do you think?
> > >
> > > --
> > > Ravi
> > >
> > >
> > > > On Fri, Oct 11, 2013 at 6:12 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > >
> > > > On Thu, Oct 10, 2013 at 6:47 AM, Ravikumar Govindarajan <
> > > > ravikumar.govindarajan@gmail.com> wrote:
> > > >
> > > > > > I saw this JIRA on humungous rows and got quite confused about the
> > > > > > UPDATE_ROW operation.
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/BLUR-220
> > > > > >
> > > > > > Let's say I add 2 records to a row whose existing records number in
> > > > > > the hundreds of thousands.
> > > > > >
> > > > > > Will Blur attempt to first read all these records before adding the
> > > > > > incoming 2 records?
> > > > >
> > > >
> > > > It has to right now.
> > > >
> > > >
> > > > >
> > > > > > What if we just expose simple record add/delete on a row, without
> > > > > > fetching the row at all?
> > > > >
> > > >
> > > > > The problem is that the internal query class is built to only support
> > > > > records (documents) that are indexed together as a single block
> > > > > within a single segment.  It is very performant for reads and
> > > > > searches, but as the row grows in size, updating it becomes very
> > > > > costly.
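The cost described above can be made concrete with a small plain-Java model. This is only a sketch under the assumption that every update re-indexes the whole row as one new block; `BlockRowIndex` and its methods are hypothetical names and are not Blur's internal classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "a row is one contiguous block of records".
final class BlockRowIndex {
    private final Map<String, List<String>> blocks = new HashMap<>();
    private long recordsReindexed = 0;

    // Adding records reads the existing block and writes the whole row
    // again as a new block, so the cost grows with the size of the row.
    void addRecords(String rowId, List<String> newRecords) {
        List<String> existing = blocks.getOrDefault(rowId, Collections.<String>emptyList());
        List<String> block = new ArrayList<>(existing); // read every existing record
        block.addAll(newRecords);
        recordsReindexed += block.size();               // whole row is written again
        blocks.put(rowId, block);
    }

    long recordsReindexed() {
        return recordsReindexed;
    }

    public static void main(String[] args) {
        BlockRowIndex index = new BlockRowIndex();
        List<String> initial = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            initial.add("mail-" + i);
        }
        index.addRecords("user-1", initial);
        index.addRecords("user-1", Arrays.asList("mail-1000", "mail-1001"));
        // 1000 for the first write plus 1002 for the two-record update
        System.out.println(index.recordsReindexed()); // prints 2002
    }
}
```

In this model, adding 2 records to a 1000-record row touches 1002 records, which is why a row with hundreds of thousands of records is expensive to update.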
> > > >
> > > > > One idea I had was to detect when rows are hot (being updated a lot)
> > > > > or too large and move them into their own indexes.  For the hot
> > > > > rows, once they cool off they could be merged back in with the
> > > > > regular rows in the main index.
> > > >
> > > >
> > > > >
> > > > > > It should be quite quick and highly useful, at least for apps
> > > > > > already using Lucene.
> > > > >
> > > >
> > > > Agreed, that's what that issue is meant to solve.
> > > >
> > > >
> > > > >
> > > > > --
> > > > > Ravi
> > > > >
> > > > >
> > > > > On Wed, Oct 9, 2013 at 11:27 AM, Ravikumar Govindarajan <
> > > > > ravikumar.govindarajan@gmail.com> wrote:
> > > > >
> > > > > > > Yes, I think bringing a mutable file into the Lucene index brings
> > > > > > > its own set of problems to handle. Filters, caches, scoring,
> > > > > > > snapshots/commits, etc. will all be affected.
> > > > > > >
> > > > > > > There is a JIRA on writing generations of updatable files, just
> > > > > > > like doc-deletes, instead of overwriting a single file
> > > > > > > [https://issues.apache.org/jira/browse/LUCENE-4258]. But that is
> > > > > > > still in progress and, from what I understand, it could slow
> > > > > > > searches considerably.
> > > > > > >
> > > > > > > BTW, is it possible to extend BlurPartitioner and load it during
> > > > > > > start-up?
> > > > > > >
> > > > > > > Also, it would be awesome if Blur supports a per-row auto-complete
> > > > > > > feature.
> > > > > >
> > > > > > --
> > > > > > Ravi
> > > > > >
> > > > > >
> > > > > > > On Sat, Oct 5, 2013 at 2:01 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > > >
> > > > > > >> I have thought of one possible problem with this approach.  To
> > > > > > >> date, the mindset I have used in all of the Blur internals is that
> > > > > > >> segments are immutable.  This is a fundamental principle that Blur
> > > > > > >> uses, and I don't really have any ideas on where to begin checking
> > > > > > >> for when this is a problem.  I know filters are going to be an
> > > > > > >> issue; not sure where else.
> > > > > > >>
> > > > > > >> Not saying that it can't be done, it's just not going to be as
> > > > > > >> clean as I originally thought.
> > > > > >>
> > > > > >> Aaron
> > > > > >>
> > > > > >>
> > > > > > >> On Fri, Oct 4, 2013 at 4:26 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > > > >>
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Fri, Oct 4, 2013 at 7:15 AM, Ravikumar Govindarajan <
> > > > > > >> > ravikumar.govindarajan@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> >> On a related note, do you think such an approach will fit in
> > > > > > >> >> Blur?
> > > > > > >> >>
> > > > > > >> >> 1. Store the BDB file in the shard-server itself.
> > > > > > >> >>
> > > > > > >> >
> > > > > > >> > Probably not; this would pin the BDB (or whatever the solution
> > > > > > >> > would be) to a specific server.  We will have to sync to HDFS.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >>
> > > > > > >> >> 2. Apply all incoming partial doc-updates to the local BDB file
> > > > > > >> >>    as well as an update-transaction log.
> > > > > > >> >>
> > > > > > >> >
> > > > > > >> > Blur already has a write-ahead log as a part of its internals.
> > > > > > >> > It's written and synced to HDFS.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >>
> > > > > > >> >> 3. Periodically sync dirty BDB files to HDFS and roll over the
> > > > > > >> >>    update-transaction log.
> > > > > > >> >>
> > > > > > >> >> Whenever a shard-server goes down, the take-over server can
> > > > > > >> >> initially sync the BDB file from HDFS to local, replay the
> > > > > > >> >> update-transaction log, and then start serving data.
> > > > > > >> >>
> > > > > > >> >
> > > > > > >> > Blur already does this internally; it records the mutates and
> > > > > > >> > replays them if a failure happens before a commit.
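The record-then-replay behavior described above follows the general write-ahead-log pattern, which can be sketched in a few lines of plain Java. This is not Blur's implementation: `WalSketch` and its methods are hypothetical, and the in-memory `log` list merely stands in for a log that would be written and synced to HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal write-ahead-log pattern: log first, apply second, replay on recovery.
final class WalSketch {
    private final List<String> log = new ArrayList<>();       // stands in for the durable log
    private final List<String> committed = new ArrayList<>(); // stands in for the committed index
    private final List<String> pending = new ArrayList<>();   // in-memory, uncommitted state

    void mutate(String mutation) {
        log.add(mutation);     // record the mutate before applying it
        pending.add(mutation); // then apply it in memory
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
        log.clear();           // roll the log over once the data is committed
    }

    // Simulates a server failure: uncommitted in-memory state is lost.
    void crash() {
        pending.clear();
    }

    // The take-over server replays whatever was logged but never committed.
    void recover() {
        pending.clear();
        pending.addAll(log);
    }

    List<String> visible() {
        List<String> all = new ArrayList<>(committed);
        all.addAll(pending);
        return all;
    }

    public static void main(String[] args) {
        WalSketch shard = new WalSketch();
        shard.mutate("add record 1");
        shard.commit();
        shard.mutate("add record 2");
        shard.crash();
        shard.recover();
        System.out.println(shard.visible()); // prints [add record 1, add record 2]
    }
}
```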
> > > > > >> >
> > > > > >> > Aaron
> > > > > >> >
> > > > > >> >
> > > > > >> >>
> > > > > >> >> --
> > > > > >> >> Ravi
> > > > > >> >>
> > > > > >> >>
> > > > > > >> >> On Thu, Oct 3, 2013 at 11:14 PM, Ravikumar Govindarajan <
> > > > > > >> >> ravikumar.govindarajan@gmail.com> wrote:
> > > > > > >> >>
> > > > > > >> >> > The mutate APIs are a good fit for individual column updates.
> > > > > > >> >> > BlurCodec will be cool and solve a lot of problems.
> > > > > > >> >> >
> > > > > > >> >> > There are 3 caveats for such a codec:
> > > > > > >> >> >
> > > > > > >> >> > 1. Scores for affected queries will be wrong until segment-merge.
> > > > > > >> >> >
> > > > > > >> >> > 2. Responsibility for ordering updates must be on the client.
> > > > > > >> >> >
> > > > > > >> >> > 3. Repeated updates for the same document can either take a
> > > > > > >> >> >    generational approach [LUCENE-4258] or use a single version
> > > > > > >> >> >    of storage [Redis/TC etc.], pushing the onus to the client,
> > > > > > >> >> >    depending on how the codec shapes up.
> > > > > > >> >> >
> > > > > > >> >> > The former will be semantically correct but really sluggish,
> > > > > > >> >> > while the latter will be faster during search.
> > > > > >> >> >
> > > > > >> >> >
> > > > > > >> >> > On Thu, Oct 3, 2013 at 8:53 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > > > >> >> >
> > > > > > >> >> >> On Thu, Oct 3, 2013 at 11:08 AM, Ravikumar Govindarajan <
> > > > > > >> >> >> ravikumar.govindarajan@gmail.com> wrote:
> > > > > > >> >> >>
> > > > > > >> >> >> > Yeah, you are correct. A BDB file will probably never be ported to
> > > > > > >> >> >> > HDFS.
> > > > > > >> >> >> >
> > > > > > >> >> >> > Our daily update frequency comes to about 20% of the insertion rate.
> > > > > > >> >> >> >
> > > > > > >> >> >> > Let's say "UPDATE <TABLE> SET COL2=1 WHERE COL1=X".
> > > > > > >> >> >> >
> > > > > > >> >> >> > This update could potentially span tens of thousands of SQL rows in
> > > > > > >> >> >> > our case, where COL2 is just a boolean flip.
> > > > > > >> >> >> >
> > > > > > >> >> >> > The problem is not with Lucene's ability to handle load. Instead, it
> > > > > > >> >> >> > is with the consistent load it puts on our content servers to read
> > > > > > >> >> >> > and re-tokenize such huge rows just for a boolean flip. Another big
> > > > > > >> >> >> > win is that none of our updatable fields are involved in scoring at
> > > > > > >> >> >> > all. Just matching will do.
> > > > > > >> >> >> >
> > > > > > >> >> >> > The changes also sit in BDB only till the next segment merge, after
> > > > > > >> >> >> > which they are cleaned out. There is very little perf hit here for
> > > > > > >> >> >> > us, as users don't immediately search after a change.
> > > > > > >> >> >> >
> > > > > > >> >> >> > I am afraid there is no documentation/code/numbers on this currently
> > > > > > >> >> >> > in public, as it is still proprietary, but it is remarkably similar
> > > > > > >> >> >> > to the popular RedisCodec.
> > > > > > >> >> >> >
> > > > > > >> >> >> > "If you really need partial document updates, there would need to be
> > > > > > >> >> >> > changes throughout the entire stack"
> > > > > > >> >> >> >
> > > > > > >> >> >> > You mean the entire stack of Blur? In case this is possible, can you
> > > > > > >> >> >> > give me a 10000-ft overview of what you have in mind?
> > > > > >> >> >> >
> > > > > >> >> >>
> > > > > > >> >> >> Interesting, now that I think about it.  The situation that you
> > > > > > >> >> >> describe is very interesting.  I'm wondering whether, if we came up
> > > > > > >> >> >> with something like this in Blur, it would fix our large Row issue.
> > > > > > >> >> >> Or at the very least help the problem.
> > > > > > >> >> >>
> > > > > > >> >> >> https://issues.apache.org/jira/browse/BLUR-220
> > > > > > >> >> >>
> > > > > > >> >> >> Plus, the more I think about it, the mutate methods are probably the
> > > > > > >> >> >> right implementation for modifying single columns.  So the API of
> > > > > > >> >> >> Blur probably wouldn't need to be changed.  Maybe just the way it
> > > > > > >> >> >> goes about dealing with changes.  I'm thinking maybe we need our own
> > > > > > >> >> >> BlurCodec to handle large Rows as well as Record (Document) updates.
> > > > > > >> >> >>
> > > > > > >> >> >> As an aside, I constantly am having to refer to Records as
> > > > > > >> >> >> Documents; this is why I think we need a rename.
> > > > > >> >> >>
> > > > > >> >> >> Aaron
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >> >
> > > > > >> >> >> > --
> > > > > >> >> >> > Ravi
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > > >> >> >> > On Thu, Oct 3, 2013 at 5:36 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > > > >> >> >> >
> > > > > > >> >> >> > > The biggest issue with this is that the shards (the indexes) inside
> > > > > > >> >> >> > > of Blur actually move from one server to another.  So to support
> > > > > > >> >> >> > > this behavior, all the indexes are stored in HDFS.  Due to the
> > > > > > >> >> >> > > differences between HDFS and a normal POSIX file system, I highly
> > > > > > >> >> >> > > doubt that the BDB file form in TokyoCabinet can ever be supported.
> > > > > > >> >> >> > >
> > > > > > >> >> >> > > If you really need partial document updates, there would need to be
> > > > > > >> >> >> > > changes throughout the entire stack.  I am curious why you need
> > > > > > >> >> >> > > this feature?  Do you have that many updates to the index?  What is
> > > > > > >> >> >> > > the update frequency?  I'm just curious what kind of performance
> > > > > > >> >> >> > > you get out of a setup like that?  Since I haven't ever run such a
> > > > > > >> >> >> > > setup, I have no idea how to compare that kind of system to a base
> > > > > > >> >> >> > > Lucene setup.
> > > > > > >> >> >> > >
> > > > > > >> >> >> > > Could you point me to some code or documentation?  I would like to
> > > > > > >> >> >> > > go and take a look.
> > > > > > >> >> >> > >
> > > > > > >> >> >> > > Thanks,
> > > > > > >> >> >> > > Aaron
> > > > > > >> >> >> > >
> > > > > > >> >> >> > >
> > > > > > >> >> >> > >
> > > > > > >> >> >> > > On Thu, Oct 3, 2013 at 7:00 AM, Ravikumar Govindarajan <
> > > > > > >> >> >> > > ravikumar.govindarajan@gmail.com> wrote:
> > > > > >> >> >> > >
> > > > > > >> >> >> > > > One more help.
> > > > > > >> >> >> > > >
> > > > > > >> >> >> > > > We also maintain a file by name "BDB", just like the "Sample" file
> > > > > > >> >> >> > > > for tracing used by Blur.
> > > > > > >> >> >> > > >
> > > > > > >> >> >> > > > This "BDB" file pertains to TokyoCabinet and is used purely for
> > > > > > >> >> >> > > > supporting partial updates to a document.
> > > > > > >> >> >> > > > All operations on this file rely on local file-paths only, through
> > > > > > >> >> >> > > > the use of native code.
> > > > > > >> >> >> > > > Currently, all update requests are local to the index files and it
> > > > > > >> >> >> > > > becomes trivial to support.
> > > > > > >> >> >> > > >
> > > > > > >> >> >> > > > Any pointers on how to take this forward in a Blur set-up of
> > > > > > >> >> >> > > > shard-servers & controllers?
> > > > > > >> >> >> > > >
> > > > > > >> >> >> > > > --
> > > > > > >> >> >> > > > Ravi
> > > > > > >> >> >> > > >
> > > > > > >> >> >> > > >
> > > > > > >> >> >> > > > On Tue, Oct 1, 2013 at 10:15 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > > >> >> >> > > >
> > > > > > >> >> >> > > > > You can control the fields to warm up via:
> > > > > > >> >> >> > > > >
> > > > > > >> >> >> > > > > http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_TableDescriptor
> > > > > > >> >> >> > > > >
> > > > > > >> >> >> > > > > The preCacheCols field.  The comment is wrong, however, so I will
> > > > > > >> >> >> > > > > create a task to correct it.  The use of the field is
> > > > > > >> >> >> > > > > "family.column", just like you would search.
> > > > > > >> >> >> > > > >
> > > > > > >> >> >> > > > > Aaron
> > > > > > >> >> >> > > > >
> > > > > > >> >> >> > > > >
> > > > > > >> >> >> > > > > On Tue, Oct 1, 2013 at 12:41 PM, Ravikumar Govindarajan <
> > > > > > >> >> >> > > > > ravikumar.govindarajan@gmail.com> wrote:
> > > > > >> >> >> > > > >
> > > > > > >> >> >> > > > > > Thanks Aaron
> > > > > > >> >> >> > > > > >
> > > > > > >> >> >> > > > > > General sampling and warming is fine and the code is really
> > > > > > >> >> >> > > > > > concise and clear.
> > > > > > >> >> >> > > > > >
> > > > > > >> >> >> > > > > >   The act of reading brings the data into the block cache and the
> > > > > > >> >> >> > > > > >   result is that the index is "hot".
> > > > > > >> >> >> > > > > >
> > > > > > >> >> >> > > > > > Will all the terms of a field be read and brought into the cache?
> > > > > > >> >> >> > > > > > If so, then it has an obvious implication: avoid warming up fields
> > > > > > >> >> >> > > > > > like, say, attachment-data, provided queries don't often include
> > > > > > >> >> >> > > > > > such fields.
> > > > > > >> >> >> > > > > >
> > > > > > >> >> >> > > > > >
> > > > > > >> >> >> > > > > > On Tue, Oct 1, 2013 at 7:58 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > > > > >> >> >> > > > > >
> > > > > > >> >> >> > > > > > > Take a look at this package.
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-store/src/main/java/org/apache/blur/lucene/warmup;h=f4239b1947965dc7fe8218eaa16e3f39ecffdda0;hb=apache-blur-0.2
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > > Basically, when the warmup process starts (which is asynchronous
> > > > > > >> >> >> > > > > > > to the rest of the application), it flips a thread-local switch
> > > > > > >> >> >> > > > > > > to allow for tracing of the file accesses.  The sampler will
> > > > > > >> >> >> > > > > > > sample each of the fields in each segment and create a sample
> > > > > > >> >> >> > > > > > > file that attempts to detect the boundaries of each field within
> > > > > > >> >> >> > > > > > > each file within each segment.  Then it stores the sample info
> > > > > > >> >> >> > > > > > > into the directory beside each segment (so that way it doesn't
> > > > > > >> >> >> > > > > > > have to re-sample the segment).  After the sampling is complete
> > > > > > >> >> >> > > > > > > or loaded, the warmup just reads the binary data from each file.
> > > > > > >> >> >> > > > > > > The act of reading brings the data into the block cache and the
> > > > > > >> >> >> > > > > > > result is that the index is "hot".
> > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > > Hope this helps.
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > > Aaron
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > > On Tue, Oct 1, 2013 at 10:09 AM, Ravikumar Govindarajan <
> > > > > > >> >> >> > > > > > > ravikumar.govindarajan@gmail.com> wrote:
> > > > > > >> >> >> > > > > > >
> > > > > > >> >> >> > > > > > > > As I understand,
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > Lucene will store the files in the following way per-segment:
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > TIM file
> > > > > > >> >> >> > > > > > > >      Field1 ---> Some byte[]
> > > > > > >> >> >> > > > > > > >      Field2 ---> Some byte[]
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > TIP file
> > > > > > >> >> >> > > > > > > >      Field1 ---> Some byte[]
> > > > > > >> >> >> > > > > > > >      Field2 ---> Some byte[]
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > Blur will "sample" this lucene-file in the following way:
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > Field1 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > Field2 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > Is my understanding correct?
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > How does Blur warm up the fields when it does not know the
> > > > > > >> >> >> > > > > > > > "end-offset" or the "length" for each field to warm?
> > > > > > >> >> >> > > > > > > >
> > > > > > >> >> >> > > > > > > > Will it by default read all terms of a field?
> > > > > >> >> >> > > > > > > >
> > > > > >> >> >> > > > > > > > --
> > > > > >> >> >> > > > > > > > Ravi
> > > > > >> >> >> > > > > > > >
> > > > > >> >> >> > > > > > >
> > > > > >> >> >> > > > > >
> > > > > >> >> >> > > > >
> > > > > >> >> >> > > >
> > > > > >> >> >> > >
> > > > > >> >> >> >
> > > > > >> >> >>
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
