lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From deminix <demi...@gmail.com>
Subject Re: Term Limit?
Date Sat, 04 Apr 2009 14:25:55 GMT
Thanks for the clarification.

I'm partitioning the document space, so I'm not really concerned about the
fact documents are ints.  Some fields have very unique value spaces though
(and many values per document), and they don't align to the same way the
documents are partitioned so may have a very high number of terms in a
single partition.  I could partition smaller as needed to keep the term
count below 4 billion if I must, but more importantly I'd like to be able to
monitor it.

AFAIK there isn't an api that returns the current number of terms, correct?



On Sat, Apr 4, 2009 at 5:22 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Correct, and, not that I know of.
>
> Mike
>
> On Sat, Apr 4, 2009 at 7:55 AM, Murat Yakici
> <Murat.Yakici@cis.strath.ac.uk> wrote:
> >
> > I assume the total number of documents that you can index is also limited
> > by Java max int. Is this correct? Is there any way to index documents
> > beyond this number in a single index?
> >
> > Murat
> >
> >
> >> I tentatively think you are correct: the file format itself does not
> >> impose this limitation.
> >>
> >> But in a least a couple places internally, Lucene uses a java int to
> >> hold the term number, which is actually a limit of 2,147,483,648
> >> terms.  I'll update fileformats.html for 2.9.
> >
> >
> >>
> >> Mike
> >>
> >> On Sat, Apr 4, 2009 at 2:56 AM, deminix <deminix@gmail.com> wrote:
> >>> http://lucene.apache.org/java/2_4_1/fileformats.html
> >>>
> >>> The file format page at the bottom cites that there is a 32 bit limit
> to
> >>> term numbers.  I fail to see where in the file formats documentation
> >>> that is
> >>> actually true.  Is the bottom of the page simply out of date?  I'm also
> >>> wondering whether the code may be a limiting factor even if the file
> >>> formats
> >>> are ok.
> >>>
> >>> Thanks.
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> >
> > Murat Yakici
> > Department of Computer & Information Sciences
> > University of Strathclyde
> > Glasgow, UK
> > -------------------------------------------
> > The University of Strathclyde is a charitable body, registered in
> Scotland,
> > with registration number SC015263.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message