lucene-java-user mailing list archives

From John Cwikla <Cwi...@Biz360.com>
Subject RE: Abusing lucene and way too many files open
Date Mon, 16 Sep 2002 18:11:30 GMT

Hmm, I didn't know this was possible, thanks for the info on this one.
That would take me down to only 20000 files open :) But that is doable.
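
For anyone following along, here is a rough sketch of the arithmetic behind these numbers. It takes the per-segment counts quoted further down the thread (7 files per segment plus 12 per-field files, up to 10 segments) at face value and assumes the worst case where every file of every index is open at once. The class and method names are made up for illustration, and reading the 20000 figure as "one index per account, languages merged" is only one interpretation.

```java
// Back-of-the-envelope estimate of simultaneously open files.
// Assumption: every file of every segment of every index is open at once.
class OpenFileEstimate {

    // (files per segment + per-field files) * segments per index * number of indices
    static long estimate(int filesPerSegment, int fieldFiles, int segments, int indices) {
        return (long) (filesPerSegment + fieldFiles) * segments * indices;
    }

    public static void main(String[] args) {
        int accounts = 100;
        int languages = 10;

        // One index per (account, language) pair.
        System.out.println("per account and language: "
                + estimate(7, 12, 10, accounts * languages));

        // One index per account, all languages in the same index.
        System.out.println("per account only: " + estimate(7, 12, 10, accounts));

        // One index per language, account number stored in the record.
        System.out.println("per language only: " + estimate(7, 12, 10, languages));
    }
}
```

Under those assumptions the three layouts come out at 190000, 19000 and 1900 files, which is where the rounded 200,000 and 20000 figures would come from.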

-----Original Message-----
From: Alex Murzaku [mailto:murzaku@yahoo.com]
Sent: Sunday, September 15, 2002 12:29 PM
To: Lucene Users List
Subject: Re: Abusing lucene and way too many files open


I haven't played with this, but I was under the impression that you
only need to make sure that the same data are indexed and queried using
the same analyzer. If the language is always included in the query (and
in the indexed record), the data would be virtually partitioned by
language. As for the analyzer, you just need to know which language you
will be using before submitting the query or indexing, and call the
corresponding analyzer. Now, I don't want to get into too much detail
because I haven't tried this myself. It just seems logical...
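
To make the idea concrete, here is a toy sketch of "virtual partitioning" in plain Java. It deliberately does not use the Lucene API: the two "analyzers" are hypothetical stand-ins (a lowercasing word splitter, and the same splitter with a crude suffix strip), and the single in-memory map stands in for one shared index. The point it tries to show is only that the same language key picks the analyzer at both index and query time, and that the key is part of every stored term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Toy single index that holds several languages side by side.
class LanguagePartitionSketch {

    // Hypothetical stand-ins for per-language analyzers (not Lucene classes).
    static final Map<String, Function<String, List<String>>> ANALYZERS = new HashMap<>();
    static {
        ANALYZERS.put("en", LanguagePartitionSketch::words);
        ANALYZERS.put("de", text -> {
            List<String> out = new ArrayList<>();
            for (String w : words(text)) {
                // Crude toy stemming: drop a trailing "en" from longer words.
                out.add(w.length() > 4 && w.endsWith("en") ? w.substring(0, w.length() - 2) : w);
            }
            return out;
        });
    }

    // Lowercase and split on non-word characters, skipping empty tokens.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    static List<String> analyze(String lang, String text) {
        Function<String, List<String>> analyzer = ANALYZERS.get(lang);
        if (analyzer == null) throw new IllegalArgumentException("no analyzer for " + lang);
        return analyzer.apply(text);
    }

    // Postings keyed by "lang:token", so languages never mix inside the one index.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String lang, String text) {
        for (String token : analyze(lang, text)) {
            postings.computeIfAbsent(lang + ":" + token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents in the given language that contain every query token.
    Set<Integer> search(String lang, String query) {
        Set<Integer> result = null;
        for (String token : analyze(lang, query)) {
            Set<Integer> hits = postings.getOrDefault(lang + ":" + token, Collections.emptySet());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        LanguagePartitionSketch index = new LanguagePartitionSketch();
        index.add(1, "en", "too many open files");
        index.add(2, "de", "zu viele offene Dateien");
        System.out.println(index.search("de", "Datei"));
        System.out.println(index.search("en", "Datei"));
    }
}
```

In a real setup the language would presumably be stored as a field on each record and added as a required clause to every query, with the matching analyzer passed to the writer and the query parser; that part is left out here because I have not tried it.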

--- John L Cwikla <cwikla@biz360.com> wrote:
> Interestingly, it's really not about the size of the dataset, it's a
> question of partitioning the data in the dataset.
>
> There are two reasons for not including the account and language in
> the record. For the language, we have different analyzers for each
> language, so they require their own index. For the accounts, it's a
> question of stability and ease of flushing/adding/removing some or all
> of the records, constantly. Putting the account number in the record
> instead of having an index per account is doable, and I am leaning
> toward that solution in the interim, having 1 index per language. But
> I worry about the downtime to other accounts when I need to do some
> major deletion to one (or some) of the accounts, as they get locked
> out during any merging/optimizing phase. Also, if something goes wrong
> I'd still like to limit it to 1 account (hopefully) instead of having
> to reindex 100.
> 
> 
> ----- Original Message -----
> From: "Alex Murzaku" <murzaku@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Sunday, September 15, 2002 6:18 AM
> Subject: Re: Abusing lucene and way too many files open
> 
> 
> > > So after roughly crunching some numbers, with a mergeFactor set to
> > > the default of 10:
> > >
> > > (7 files per segment + 12 for fields) * (up to 10 segments) * 100
> > > accounts * 10 languages
> > >
> > > = 200,000 open files at once. Ouch.
> >
> > I have read accounts of people on this list dealing with data sets
> > larger than what you describe. There is one thing that you said you
> > were considering: why not include both account and language in the
> > record? Unless you are planning to distribute them over several
> > machines, I wouldn't artificially create 100*10 indices. In your
> > case, when indexing, I would instead have 14 fields * 7 files * 10
> > segments open files... I guess you have already resolved the problem
> > of identifying the language and have built the corresponding
> > analyzers, which you call when indexing or querying in a given
> > language. Just a thought.
> >
> >
> > =====
> > __________________________________
> > alex@lissus.com -- http://www.lissus.com
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! News - Today's headlines
> > http://news.yahoo.com
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >
> 
> 
> 