lucene-general mailing list archives

From Niclas Rothman <n...@lechill.com>
Subject Subqueries / multivalued fields / huge documents
Date Fri, 05 Feb 2010 10:40:21 GMT
Hi there, I'm facing a problem that I'm having difficulty solving, and I'm wondering if any
of you could point me in the right direction.

I have an index storing information about media like videos.
Every video object has information about which browsers it is compatible with, e.g. video
1 is compatible with Firefox, Internet Explorer and so forth.
An example of such a document can be visualized like this:

<doc>
    <media>
        <id>12345</id>
        <title>A title</title>
        <description>My description</description>
        <useragents>
            <!-- The useragents element can contain up to 15000 entries!!!! -->

            Mozilla/5.0 (Linux; U; Android 1.5; en-gb; HTC Magic Build/CRB43) AppleWebKit/528.5+ (KHTML, like Gecko) Version/3.1.2 Mobile Safari/525.20.1
            Mozilla/4.0 SonyEricssonW910iv/R1CA Browser/NetFront/3.4 Profile/MIDP-2.1 Configuration/CLDC-1.1 UP.Link/6.3.1.20.0
            NokiaN81
            .
            .
            .n
        </useragents>
    </media>
</doc>

Apart from finding media objects that are relevant to a user's search, e.g. where title equals
"A title", I may only serve / display items that are compatible with the user's browser useragent,
e.g. find all media objects that are compatible with Firefox.

Problems:

1.       One media object can contain up to 15000 useragent entries. Can I index this with
decent performance?

2.       The useragent values can be "partial" or wildcarded, e.g. "*NokiaN81*" meaning that
the media object is compatible with all browsers having a useragent containing "NokiaN81".

3.       Can I get any performance out of this, e.g. with FilterQueries or RangeQueries, and if so, how?
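
To make the matching rule in point 2 concrete, here is a minimal sketch in plain Java (standard library only, not Lucene). It only illustrates the intended semantics: a stored entry such as "*NokiaN81*" matches any request useragent containing "NokiaN81". The class name UserAgentMatcher, its method names, and the rule that an entry without "*" must match the whole useragent exactly are assumptions made for illustration; they are not stated above. It is a linear scan over the entries, so it says nothing about indexing performance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not a Lucene API).
class UserAgentMatcher {

    // Convert a stored entry into a regex: each '*' becomes ".*",
    // every other character is matched literally.
    static Pattern toPattern(String entry) {
        StringBuilder regex = new StringBuilder();
        String[] parts = entry.split("\\*", -1); // keep empty leading/trailing parts
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True if at least one stored entry matches the whole request useragent.
    // Assumption: entries without '*' require an exact match.
    static boolean isCompatible(List<String> entries, String userAgent) {
        for (String entry : entries) {
            if (toPattern(entry).matcher(userAgent).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "*NokiaN81*",
                "Mozilla/4.0 SonyEricssonW910iv/R1CA Browser/NetFront/3.4 "
                        + "Profile/MIDP-2.1 Configuration/CLDC-1.1 UP.Link/6.3.1.20.0");
        // The request useragent below is a made-up example value.
        System.out.println(isCompatible(entries, "NokiaN81-1/10.0.032 Profile/MIDP-2.0"));
    }
}
```

With up to 15000 entries per media object, compiling a pattern per entry on every request would be slow; the sketch is meant to pin down what "compatible" means, not to suggest how to query the index.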

Any help very much appreciated!!!!

Regards
Magnus
