lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: Advice on 3NF Data Structures and Lucene Please
Date Thu, 14 Dec 2006 06:08:51 GMT
I think the last structure is good. An index should be structured
according to how you want to search it. If your needs change, simply
build another index; a single index for every purpose rarely works
well. An index trades space for time, so duplication is not really a
concern.
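
For the case where you do not know which token belongs to which field,
one option that follows from "structure the index for the search" is a
catch-all field: every value for a person is concatenated into one
searchable text. Below is a minimal sketch in plain Java, not the Lucene
API. The names CatchAllSketch, catchAll and matchesAll are hypothetical,
and exact lowercase token matching stands in for real analysis, so a
fuzzy term such as "~Klinton" would not match here.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: plain Java, not the Lucene API.
final class CatchAllSketch {

    // Concatenates all field values of one person into a single bag of
    // lowercase tokens, so a query need not say which field it targets.
    static Set<String> catchAll(String... fieldValues) {
        Set<String> tokens = new HashSet<>();
        for (String value : fieldValues) {
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    // True if every token of the query occurs somewhere in the document.
    static boolean matchesAll(Set<String> docTokens, String query) {
        return docTokens.containsAll(catchAll(query));
    }

    public static void main(String[] args) {
        Set<String> doc = catchAll("United States of America", "USA",
                "Bill Clinton", "Retired", "Smoking Cigars");
        System.out.println(matchesAll(doc, "clinton smoking usa"));
        System.out.println(matchesAll(doc, "clinton dancing"));
    }
}
```

The place name and its aliases are copied into every person's text,
which is the kind of duplication the space-for-time trade accepts.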

The first structure omits some hobby data, and the second structure
will return duplicate people that need to be pruned.
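
To make the last structure concrete, here is a minimal sketch in plain
Java, not the Lucene API: one document per person, with all of that
person's hobbies in a single multi-valued field. The names
PersonDocSketch, PersonDoc, sampleDocs and findByHobby are hypothetical,
and matching is a naive case-insensitive substring test rather than
real text analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: plain Java, not the Lucene API.
final class PersonDocSketch {

    // One flattened document per person. The place is copied onto each
    // person and all hobbies share one multi-valued field.
    static final class PersonDoc {
        final String place;
        final String name;
        final String occupation;
        final List<String> hobbies;

        PersonDoc(String place, String name, String occupation, String... hobbies) {
            this.place = place;
            this.name = name;
            this.occupation = occupation;
            this.hobbies = Arrays.asList(hobbies);
        }
    }

    // Sample data taken from the XML in the quoted message.
    static List<PersonDoc> sampleDocs() {
        return Arrays.asList(
            new PersonDoc("USA", "George W Bush", "Demolition", "Comedy", "Mime", "Ant Farms"),
            new PersonDoc("USA", "Bill Clinton", "Retired", "Smoking Cigars"),
            new PersonDoc("Kazakhstan", "Borat", "TV Reporter", "Dancing", "Football", "Swimming"));
    }

    // Returns the names of people with at least one hobby containing the
    // term. Each person is one document, so nobody is returned twice.
    static List<String> findByHobby(List<PersonDoc> docs, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> names = new ArrayList<>();
        for (PersonDoc doc : docs) {
            for (String hobby : doc.hobbies) {
                if (hobby.toLowerCase(Locale.ROOT).contains(needle)) {
                    names.add(doc.name);
                    break; // one hit per person is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findByHobby(sampleDocs(), "mime"));
    }
}
```

With one row per hobby (the second structure), the same search could
return a person once per matching row, which is the pruning problem
mentioned above.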

-- 
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com


On 12/13/06, Andrew Hughes <azza@lisasoft.com> wrote:
> Thanks Erick,
>
> I'll give a representation (in XML) of the data structure that I am
> trying to index. It represents a relational data structure in which all
> Persons for a Place (e.g. Kazakhstan) are grouped together, and so on.
>
>     <Example>
>         <Place name="United States of America">
>            <PlaceAlias>USA</PlaceAlias>
>            <PlaceAlias>U.S.A</PlaceAlias>
>            <PlaceAlias>US</PlaceAlias>
>            <Person>
>               <Name>George W Bush</Name>
>               <Occupation>Demolition</Occupation>
>               <Hobby alias="Funny">Comedy</Hobby>
>               <Hobby alias="Pretend">Mime</Hobby>
>               <Hobby>Ant Farms</Hobby>
>            </Person>
>            <Person>
>               <Name>Bill Clinton</Name>
>               <Occupation>Retired</Occupation>
>               <Hobby>Smoking Cigars</Hobby>
>            </Person>
>            <!-- many more persons here -->
>         </Place>
>         <Place name="kazakhstan">
>            <PlaceAlias>kazak</PlaceAlias>
>            <PlaceAlias>kazzi</PlaceAlias>
>            <PlaceAlias>kzh</PlaceAlias>
>            <Person>
>               <Name>Borat</Name>
>               <Occupation>TV Reporter</Occupation>
>               <Hobby alias="Boogie">Dancing</Hobby>
>               <Hobby alias="Soccer">Football</Hobby>
>               <Hobby>Swimming</Hobby>
>               <!-- many more hobbies here (or none), with or without aliases -->
>            </Person>
>            <!-- many more persons here -->
>         </Place>
>         <!-- many more places, persons and hobbies here -->
>     </Example>
>
>
> I am expecting someone to say that this relational/3NF structure should
> simply be placed into a flat index: the index replaces the one-to-many
> relational approach by grouping all "documents" with the same "Place"
> together, or at least makes searching fast enough to be a usable
> solution.
>
>     Place       Person_Name    Person_Occupation  Hobby
>     ==========================================================
>     USA         George W Bush  Demolition         Comedy
>     USA         Bill Clinton   Retired            Smoking Cigars
>     Kazakhstan  Borat          TV Presenter       Dancing
>
>
>
> I do however ask: how should one handle repeated fields, such as the
> "Hobbies" below? Should these simply be a single, tokenized field in the
> Lucene index? Or should everything be *duplicated*, like this? (I have
> ignored aliases for simplicity.)
>
>
>     Place       Person_Name    Person_Occupation  Hobby
>     ==========================================================
>     USA         George W Bush  Demolition         Comedy
>     USA         George W Bush  Demolition         Mime
>     USA         George W Bush  Demolition         Ant Farms
>     USA         Bill Clinton   Retired            Smoking Cigars
>     Kazakhstan  Borat          TV Presenter       Dancing
>     Kazakhstan  Borat          TV Presenter       Football
>     Kazakhstan  Borat          TV Presenter       Swimming
>
>     OR
>
>     Place       Person_Name    Person_Occupation  Hobby
>     ========================================================================
>     USA         George W Bush  Demolition         Comedy + Mime + Ant Farms
>     USA         Bill Clinton   Retired            Smoking Cigars
>     Kazakhstan  Borat          TV Presenter       Dancing + Football + Swimming
>
>
> I guess my final question, which is really what I am trying to achieve,
> is this: I want to search for all "Persons" in the "~United States
> of America" whose name is like "~Klinton" and who enjoy "~Smoking" as a
> hobby. An important part of this is that I won't know which token is
> to be matched to which field, as when you use an internet search
> engine. So do I tokenize and put all fields from the XML into a
> single field in the index and query that with tokens?
>
>
> I realize that I'm posting lots of complicated questions, and I am
> probably just looking at the equivalent of an HTML indexing/search
> implementation.
>
>
>
> Many Thanks....
>
> --AH
>
>
>
>
> Erick Erickson wrote:
> > Tell us more about the problem you are trying to solve. Lucene is designed
> > for large text searching, not relations. Trying to "index a data structure"
> > seems like mis-application of Lucene. Without some idea of what you are
> > trying to accomplish, any advice you get is irrelevant at best...
> >
> >
> > Best
> > Erick
> >
> > On 12/13/06, Andrew Hughes <azza@lisasoft.com> wrote:
> >>
> >> Hey All,
> >>
> >> I am very interested in indexing a 3NF data structure. Is there any
> >> advice that someone can provide on this? From what I have seen, a
> >> Lucene index is typically a flat, "First Normal Form" data structure.
> >> The only way I can see to combine the relational links between
> >> multiple indexes is to compare documents.
> >>
> >>
> >> Any Help is Appreciated.
> >>
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
