lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: "shared fields"?
Date Wed, 09 Mar 2011 14:48:33 GMT
How large is (large)? What machines are you intending to run this on?

In general, though, don't worry about index size until you actually have some
numbers to deal with. Solr's resource usage generally depends on the number
of *unique* terms in an index, so repeating the same text in a bunch of
documents isn't as bad as you'd suppose.
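
To make the unique-terms point concrete, here is a toy sketch in Python. It is not Lucene's actual index format (real postings are compressed and also carry frequencies and positions), and the names build_toy_index and index_stats are made up for illustration. It only shows that duplicating a biography across documents leaves the number of unique terms unchanged, while the per-term document lists grow.

```python
from collections import defaultdict


def build_toy_index(docs):
    """Map each unique term to the list of document ids that contain it."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Deduplicate terms within a document so each doc id appears once per term.
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return postings


def index_stats(docs):
    """Return (number of unique terms, total number of posting entries)."""
    postings = build_toy_index(docs)
    unique_terms = len(postings)
    posting_entries = sum(len(ids) for ids in postings.values())
    return unique_terms, posting_entries


bio = "Brad Pitt was born in 1963 in Oklahoma"
one_movie = index_stats([bio])
hundred_movies = index_stats([bio] * 100)
print(one_movie, hundred_movies)
```

In this toy model the term dictionary has the same size for one movie as for a hundred; only the posting lists get longer.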

If you *store* fields, certain files in your index will grow linearly,
but these aren't the ones that are used for searching. The stored-field
files (*.fdt and *.fdx) in particular will grow; *.fnm only lists the
field names, so it stays small.
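
One way to get actual numbers is to total the index directory's file sizes by extension before and after adding the duplicated data. The sketch below is a generic helper that uses only the Python standard library, not a Lucene or Solr API; the function name size_by_extension and the index_dir argument are hypothetical, and it assumes the index sits in a single flat directory.

```python
import os
from collections import Counter


def size_by_extension(index_dir):
    """Return total bytes per file extension for the files in index_dir."""
    totals = Counter()
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension are counted under their full name.
            ext = os.path.splitext(name)[1] or name
            totals[ext] += os.path.getsize(path)
    return dict(totals)
```

Calling size_by_extension on your index directory and comparing the ".fdt" and ".fdx" totals against the rest shows how much of the growth comes from stored fields rather than from the files used for searching.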

So I'd go ahead and just replicate the data, then monitor your system,
cache issues in particular (see the admin/stats page). Normalizing your data
is tricky in Solr, so don't do it unless it proves necessary, IMO...

Best
Erick

On Wed, Mar 9, 2011 at 9:38 AM, sol myr <solmyr72@yahoo.com> wrote:
> Hi,
>
> I have several documents that share the same (large) searchable data.
> For example, say my Documents represent movies, and 2 movies share the same
> actorBiography of Brad Pitt (assuming I want to search movies by
> actorBiography words, far-fetched as it might seem):
>
> Document1:
> - movieName="Benjamin Button"
> - actorBiography="Brad Pitt was born in 1963 in Oklahoma and raised in..."
> Document2:
> - movieName="Ocean 11"
> - actorBiography="Brad Pitt was born in 1963 in Oklahoma and raised in..."
>
> My question: I'm afraid my index files will become very large, due to the
> duplication of information. Is there any trick that would keep my index files
> at a reasonable size, while still allowing the functionality of "search movie
> by actorBiography"?
> Thanks :)


