lucene-java-user mailing list archives

From Andrew Hughes <a...@lisasoft.com>
Subject Re: Advice on 3NF Data Structures and Lucene Please
Date Thu, 14 Dec 2006 00:24:51 GMT
Thanks Erick,

I'll give a representation (in XML) of the data structure that I am trying
to index. It represents a relational data structure, because all Persons
belonging to a Place (e.g. Kazakhstan) are grouped together, etc.

    <Example>
        <Place name="United States of America">
           <PlaceAlias>USA</PlaceAlias>
           <PlaceAlias>U.S.A</PlaceAlias>
           <PlaceAlias>US</PlaceAlias>
           <Person>
              <Name>George W Bush</Name>
              <Occupation>Demolition</Occupation>
              <Hobby alias="Funny">Comedy</Hobby>
              <Hobby alias="Pretend">Mime</Hobby>
              <Hobby>Ant Farms</Hobby>
           </Person>
           <Person>
              <Name>Bill Clinton</Name>
              <Occupation>Retired</Occupation>
              <Hobby>Smoking Cigars</Hobby>
           </Person>
           <!-- many more Persons here... -->
        </Place>
        <Place name="kazakhstan">
           <PlaceAlias>kazak</PlaceAlias>
           <PlaceAlias>kazzi</PlaceAlias>
           <PlaceAlias>kzh</PlaceAlias>
           <Person>
              <Name>Borat</Name>
              <Occupation>TV Reporter</Occupation>
              <Hobby alias="Boogie">Dancing</Hobby>
              <Hobby alias="Soccer">Football</Hobby>
              <Hobby>Swimming</Hobby>
              <!-- many more Hobbies here (or even none), with or without aliases -->
           </Person>
           <!-- many more Persons here... -->
        </Place>
        <!-- many more Places, Persons and Hobbies here... -->
    </Example>


I am expecting someone to say that this relational/3NF structure should
simply be placed into a flat index: the index replaces the one-to-many
relational approach by grouping/indexing all "documents" with the same
"Place" together, or at least makes searching fast enough to be a usable
solution.

    Place        Person_Name    Person_Occupation  Hobby
    ===========================================================================
    USA          George W Bush  Demolition         Comedy
    USA          Bill Clinton   Retired            Smoking Cigars
    Kazakhstan   Borat          TV Reporter        Dancing
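
To make the flattening concrete, here is a minimal sketch in Python (standard library only, not Lucene itself) that walks a cut-down copy of the XML above and emits one flat record per Person, copying the parent Place name and its aliases onto each record. The `flatten` function and the record keys are hypothetical names chosen for illustration; how the hobbies should be laid out in the index is left open here.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample data from the message above.
SAMPLE = """
<Example>
  <Place name="United States of America">
    <PlaceAlias>USA</PlaceAlias>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
      <Hobby>Smoking Cigars</Hobby>
    </Person>
  </Place>
  <Place name="kazakhstan">
    <PlaceAlias>kazak</PlaceAlias>
    <Person>
      <Name>Borat</Name>
      <Occupation>TV Reporter</Occupation>
      <Hobby alias="Boogie">Dancing</Hobby>
      <Hobby alias="Soccer">Football</Hobby>
      <Hobby>Swimming</Hobby>
    </Person>
  </Place>
</Example>
"""


def flatten(xml_text):
    """Return one flat dict per Person, with the parent Place copied onto it."""
    records = []
    root = ET.fromstring(xml_text)
    for place in root.findall("Place"):
        # The Place name and all of its aliases travel with every Person.
        place_names = [place.get("name")]
        place_names += [alias.text for alias in place.findall("PlaceAlias")]
        for person in place.findall("Person"):
            hobbies = []
            for hobby in person.findall("Hobby"):
                hobbies.append(hobby.text)
                if hobby.get("alias"):  # keep the alias searchable too
                    hobbies.append(hobby.get("alias"))
            records.append({
                "place": place_names,
                "name": person.findtext("Name"),
                "occupation": person.findtext("Occupation"),
                "hobbies": hobbies,
            })
    return records


for record in flatten(SAMPLE):
    print(record)
```

Each record is self-contained, so the one-to-many link from Place to Person no longer has to be followed at search time; the cost is that the Place data is repeated on every Person.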



I do however ask: how should one handle repeated fields, such as the
"Hobby" values below? Should these simply be a single tokenized field in
the Lucene index? Or should everything be *duplicated*, like this? (I have
ignored aliases for simplicity.)


    Place        Person_Name    Person_Occupation  Hobby
    ===========================================================================
    USA          George W Bush  Demolition         Comedy
    USA          George W Bush  Demolition         Mime
    USA          George W Bush  Demolition         Ant Farms
    USA          Bill Clinton   Retired            Smoking Cigars
    Kazakhstan   Borat          TV Reporter        Dancing
    Kazakhstan   Borat          TV Reporter        Football
    Kazakhstan   Borat          TV Reporter        Swimming

    OR

    Place        Person_Name    Person_Occupation  Hobby
    ===========================================================================
    USA          George W Bush  Demolition         Comedy + Mime + Ant Farms
    USA          Bill Clinton   Retired            Smoking Cigars
    Kazakhstan   Borat          TV Reporter        Dancing + Football + Swimming
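
The two layouts above can be sketched side by side. This is a plain-Python illustration with hypothetical function names, not Lucene code: the first function produces one document per (Person, Hobby) pair, the second produces one document per Person with a multi-valued hobby field.

```python
# One normalized person, as it might come out of the XML.
person = {
    "place": "USA",
    "name": "George W Bush",
    "occupation": "Demolition",
    "hobbies": ["Comedy", "Mime", "Ant Farms"],
}


def one_document_per_hobby(person):
    """Layout 1: repeat the person once for every hobby."""
    base = {key: value for key, value in person.items() if key != "hobbies"}
    return [dict(base, hobby=hobby) for hobby in person["hobbies"]]


def one_document_per_person(person):
    """Layout 2: keep a single document and store all hobbies in one field."""
    doc = {key: value for key, value in person.items() if key != "hobbies"}
    doc["hobby"] = list(person["hobbies"])
    return doc


print(len(one_document_per_hobby(person)))       # three documents for one person
print(one_document_per_person(person)["hobby"])  # one document, three hobby values
```

With the first layout a query that matches only on name or place returns the same person several times, so results need de-duplicating afterwards. With the second layout each person is a single hit, which is usually what a "find all Persons" query wants.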


I guess my final question, which is really what I am trying to achieve,
is this: I want to search for all Persons in "~United States of America"
whose name is like "~Klinton" and who enjoy "~Smoking" as a Hobby. An
important part of this is that I won't know which token is to be matched
to which field, as when you use an internet search engine. So do I
tokenize all fields from the XML, put them into a single Field in the
index, and query that with tokens?
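
As a rough sketch of that "single catch-all field" idea, the Python below lumps every value of a person record into one bag of lowercase tokens and scores each record by how many query tokens have an approximate match in it. Everything here is an assumption made for illustration: the `DOCS` records, the helper names, and the 0.8 threshold are hypothetical, and `difflib` string similarity is only a stand-in for a real search engine's fuzzy term matching.

```python
from difflib import SequenceMatcher

# Hypothetical flattened records; every value goes into one catch-all field.
DOCS = [
    {"name": "George W Bush",
     "text": "United States of America USA U.S.A US George W Bush "
             "Demolition Comedy Mime Ant Farms"},
    {"name": "Bill Clinton",
     "text": "United States of America USA U.S.A US Bill Clinton "
             "Retired Smoking Cigars"},
    {"name": "Borat",
     "text": "kazakhstan kazak kazzi kzh Borat TV Reporter "
             "Dancing Boogie Football Soccer Swimming"},
]


def tokenize(text):
    """Lowercase and split on whitespace; a real analyzer would do more."""
    return text.lower().split()


def fuzzy_hit(term, tokens, threshold=0.8):
    """True if any token is similar enough to the query term."""
    return any(SequenceMatcher(None, term, token).ratio() >= threshold
               for token in tokens)


def search(query, docs):
    """Rank docs by the number of query terms found in the catch-all field."""
    terms = tokenize(query)
    scored = []
    for doc in docs:
        tokens = tokenize(doc["text"])
        score = sum(1 for term in terms if fuzzy_hit(term, tokens))
        scored.append((score, doc["name"]))
    # Highest score first; ties keep their original order.
    return sorted(scored, key=lambda pair: -pair[0])


for score, name in search("USA Klinton Smoking", DOCS):
    print(score, name)
```

Because the query never names a field, the misspelled "Klinton" can still find "Clinton" wherever it sits in the record. The trade-off is that a catch-all field cannot tell a place called "Smoking" from a hobby called "Smoking"; keeping the separate fields alongside the catch-all one leaves both kinds of query possible.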


I realize that I'm posting lots of complicated questions, and I am
probably just looking at the equivalent of an HTML indexing/search
implementation.



Many Thanks....

--AH




Erick Erickson wrote:
> Tell us more about the problem you are trying to solve. Lucene is designed
> for large text searching, not relations. Trying to "index a data structure"
> seems like mis-application of Lucene. Without some idea of what you are
> trying to accomplish, any advice you get is irrelevant at best...
>
>
> Best
> Erick
>
> On 12/13/06, Andrew Hughes <azza@lisasoft.com> wrote:
>>
>> Hey All,
>>
>> I am very interested in indexing a 3NF Data Structure. Is there any
>> advice that someone can provide with this? From what I have seen Lucene
>> is typically a flat "First Normal Form" (Flat) data structure.... The
>> only way I can see to combine the relational links between multiple
>> indexes is to compare documents.
>>
>>
>> Any Help is Appreciated.
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>


