lucene-general mailing list archives

From "Srikant Jakilinki" <sriks6...@gmail.com>
Subject Re: Full-Text Search in a Relational Model
Date Mon, 28 Jan 2008 07:57:58 GMT
My first impression is that you need a proper DB with a separate search layer
on top of it (rather than searching through the DB/SQL). Perhaps you could try these:
1) http://www.opensymphony.com/compass/content/about.html
2) http://kasparov.skife.org/blog/2004/09/11/#lucene-ojb
3) http://www.dbsight.net/

Please let us know if you find any other useful information in your search.

- SJ
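To make "a search on top of it (but not using the DB/SQL)" concrete, here is a minimal in-memory inverted index in plain Java. It only illustrates the idea behind engines such as Lucene and is not Lucene's API; the class name `TinyInvertedIndex` and the tokenizing rule are hypothetical, and the sample rows are taken from the example in the quoted message.

```java
import java.util.*;

// Hypothetical illustration of an inverted index: term -> IDs of the rows containing it.
// Real engines (e.g. Lucene) add analyzers, positions, scoring and on-disk segments.
class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Split text into lowercase tokens on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Index one row of text (e.g. a value read from the database) under its ID.
    void add(int id, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
    }

    // Look up a term directly instead of scanning every row, as LIKE '%term%' typically must.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "epidermal growth factor receptor");  // gene 1 description
        index.add(2, "S-adenosylhomocysteine hydrolase");   // gene 2 description
        System.out.println(index.search("Epidermal"));      // prints [1]
    }
}
```

The index is built once from the database rows and then queried on its own, which is the separation suggested above; keeping it in sync with the DB is left out of the sketch.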

On Jan 24, 2008 5:59 PM, yarongolan <yarong@xennexinc.com> wrote:
>
> Hi,
>
> (Warning, not for the faint-hearted)
>
> I'm currently working on a project where we have a large and complex data
> model, related to Genomics. We are trying to build a search engine that
> provides "full text" and "field-based text" searches for our customer base
> (mostly academic research), and are evaluating different tools for this
> purpose.
>
> As a starting point, here is an example set of objects (stored in
> tables as a relational model):
> Gene [ID, Symbol, Description]
> Article - M:M with Gene [ID, Title]
> Disease - M:M with Gene [ID, Name]
> Author - M:M with Article [ID, Name]
> (Note: the M:M link tables exist and just hold pairs of IDs.)
>
> An example model would be (hierarchical, with relations represented as
> duplications):
>
>   Gene [ID=1, Symbol=EGFR, Description=epidermal growth factor receptor]
>     Article [ID=1, Title=EGFR mutations in lung cancer: correlation with
> clinical response to gefitinib therapy]
>       Author [ID=1, Name=H. Michaelson]
>       Author [ID=2, Name=J. Watson]
>     Article [ID=2, Title=Proteomics analysis of epidermal protein kinases by
> target class-selective prefractionation and tandem mass spectrometry]
>       Author [ID=1, Name=H. Michaelson]
>       Author [ID=3, Name=M. Roberts]
>     Disease [ID=1, Name=Epidermal sluffing]
>
>   Gene [ID=2, Symbol=AHCY, Description=S-adenosylhomocysteine hydrolase]
>     Article [ID=3, Title=Limited proteolysis of S-adenosylhomocysteine
> hydrolase: implications for the three-dimensional structure]
>       Author [ID=4, Name=B. Cohen]
>       Author [ID=5, Name=L. Alexander]
>     Article [ID=2, Title=Proteomics analysis of epidermal protein kinases by
> target class-selective prefractionation and tandem mass spectrometry]
>       Author [ID=1, Name=H. Michaelson]
>       Author [ID=3, Name=M. Roberts]
>
> Note the IDs in the objects above, as they convey the relations in the
> hierarchical model.
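One way to get from the tables to the hierarchical view above is to denormalize at indexing time: walk the M:M link tables and emit one flat "document" per gene whose field names record where each value came from. This is a sketch under the assumption that the tables have already been loaded into memory; `GeneDenormalizer`, `Field`, and the path format (`Article#2.Author#1.Name`) are hypothetical, and the data is the example from the message.

```java
import java.util.*;

// Hypothetical sketch: flatten the relational tables into one document per gene.
class GeneDenormalizer {
    // One indexed value plus the path that says where in the hierarchy it came from.
    record Field(String path, String text) {}

    static Map<Integer, List<Field>> denormalize(
            Map<Integer, String[]> genes,        // geneId -> {symbol, description}
            Map<Integer, String> articleTitles,  // articleId -> title
            Map<Integer, String> diseaseNames,   // diseaseId -> name
            Map<Integer, String> authorNames,    // authorId -> name
            int[][] geneArticle,                 // M:M rows {geneId, articleId}
            int[][] geneDisease,                 // M:M rows {geneId, diseaseId}
            int[][] articleAuthor) {             // M:M rows {articleId, authorId}
        Map<Integer, List<Field>> docs = new TreeMap<>();
        genes.forEach((id, g) -> docs.put(id, new ArrayList<>(List.of(
                new Field("Symbol", g[0]), new Field("Description", g[1])))));
        for (int[] ga : geneArticle) {
            int articleId = ga[1];
            List<Field> doc = docs.get(ga[0]);
            // The article (and its authors) is copied into every gene it links to.
            doc.add(new Field("Article#" + articleId + ".Title", articleTitles.get(articleId)));
            for (int[] aa : articleAuthor) {
                if (aa[0] == articleId) {
                    doc.add(new Field("Article#" + articleId + ".Author#" + aa[1] + ".Name",
                            authorNames.get(aa[1])));
                }
            }
        }
        for (int[] gd : geneDisease) {
            docs.get(gd[0]).add(new Field("Disease#" + gd[1] + ".Name", diseaseNames.get(gd[1])));
        }
        return docs;
    }

    // The example data from the message.
    static Map<Integer, List<Field>> example() {
        Map<Integer, String[]> genes = new TreeMap<>();
        genes.put(1, new String[] {"EGFR", "epidermal growth factor receptor"});
        genes.put(2, new String[] {"AHCY", "S-adenosylhomocysteine hydrolase"});
        Map<Integer, String> articles = Map.of(
                1, "EGFR mutations in lung cancer: correlation with clinical response"
                        + " to gefitinib therapy",
                2, "Proteomics analysis of epidermal protein kinases by target"
                        + " class-selective prefractionation and tandem mass spectrometry",
                3, "Limited proteolysis of S-adenosylhomocysteine hydrolase:"
                        + " implications for the three-dimensional structure");
        Map<Integer, String> diseases = Map.of(1, "Epidermal sluffing");
        Map<Integer, String> authors = Map.of(1, "H. Michaelson", 2, "J. Watson",
                3, "M. Roberts", 4, "B. Cohen", 5, "L. Alexander");
        return denormalize(genes, articles, diseases, authors,
                new int[][] {{1, 1}, {1, 2}, {2, 3}, {2, 2}},
                new int[][] {{1, 1}},
                new int[][] {{1, 1}, {1, 2}, {2, 1}, {2, 3}, {3, 4}, {3, 5}});
    }

    public static void main(String[] args) {
        example().forEach((geneId, fields) -> {
            System.out.println("Gene " + geneId);
            fields.forEach(f -> System.out.println("  " + f.path() + ": " + f.text()));
        });
    }
}
```

Article 2 ends up in both gene documents, which is the duplication the hierarchical model describes; the cost is that an article linked to many genes is indexed many times.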
>
> In our full-text search, we would like to allow users to search ANY textual
> field for any string (for instance, the term "epidermal") and display the
> list of genes that have any associated data containing that term (ranked, of
> course).
> Our list of results would be something like:
>
> EGFR
>   Found in Description (epidermal growth factor receptor)
>   Found in Article ID#2, in Title (proteomics analysis of epidermal protein
> kinases by target class-selective prefractionation and tandem mass
> spectrometry)
>   Found in Disease ID#1, in Name (Epidermal sluffing)
>
> AHCY
>   Found in Article ID#2, in Title (proteomics analysis of epidermal protein
> kinases by target class-selective prefractionation and tandem mass
> spectrometry)
>
> Note that the results retain a hierarchical view of our genes (being
> gene-centric, we are essentially asking "find this term in the information
> related to these genes"). Also note that Article ID #2 has an M:M relation
> with both Gene ID 2 (AHCY) and Gene ID 1 (EGFR), and it is only because of
> that relation that AHCY is considered a gene with "epidermal" in its
> annotations.
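Given one denormalized document per gene, the "Found in" listing above amounts to checking each field of each gene document for the term and keeping the field paths that match. The sketch below assumes the documents are already built as maps from a field path to its text; `FoundInReport`, the path names, and the tokenizing rule are hypothetical, and author fields are left out for brevity.

```java
import java.util.*;

// Hypothetical sketch: report, per gene, which denormalized fields contain a term.
class FoundInReport {
    // True if the lowercase tokens of text include the (lowercased) term.
    static boolean containsTerm(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(needle)) return true;
        }
        return false;
    }

    // docs: gene symbol -> (field path -> text); result keeps only genes with a hit.
    static Map<String, List<String>> search(Map<String, Map<String, String>> docs, String term) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        docs.forEach((gene, fields) -> fields.forEach((path, text) -> {
            if (containsTerm(text, term)) {
                hits.computeIfAbsent(gene, g -> new ArrayList<>()).add(path);
            }
        }));
        return hits;
    }

    static Map<String, Map<String, String>> example() {
        String article2 = "Proteomics analysis of epidermal protein kinases by target "
                + "class-selective prefractionation and tandem mass spectrometry";
        Map<String, String> egfr = new LinkedHashMap<>();
        egfr.put("Symbol", "EGFR");
        egfr.put("Description", "epidermal growth factor receptor");
        egfr.put("Article#1.Title", "EGFR mutations in lung cancer: correlation with "
                + "clinical response to gefitinib therapy");
        egfr.put("Article#2.Title", article2);
        egfr.put("Disease#1.Name", "Epidermal sluffing");
        Map<String, String> ahcy = new LinkedHashMap<>();
        ahcy.put("Symbol", "AHCY");
        ahcy.put("Description", "S-adenosylhomocysteine hydrolase");
        ahcy.put("Article#3.Title", "Limited proteolysis of S-adenosylhomocysteine hydrolase: "
                + "implications for the three-dimensional structure");
        ahcy.put("Article#2.Title", article2);  // present only via the Gene-Article M:M link
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        docs.put("EGFR", egfr);
        docs.put("AHCY", ahcy);
        return docs;
    }

    public static void main(String[] args) {
        search(example(), "epidermal").forEach((gene, paths) -> {
            System.out.println(gene);
            paths.forEach(p -> System.out.println("  Found in " + p));
        });
    }
}
```

Because the shared article was copied into the AHCY document, AHCY shows up with a single hit on `Article#2.Title`, matching the listing in the message.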
>
> Obviously, we'd like to weight matches by their location in the hierarchy
> (a term in a gene name scores higher than a term in the name of an author of
> an article related to that gene) and by number of hits (the number of times
> the term is found in data related to that gene: 3 in the case of EGFR above).
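Both ranking criteria can be folded into one score by giving each hit a weight that shrinks with its depth in the hierarchy and summing over hits, so more hits raise the score too. The weighting `1 / (1 + depth)` and the class name `HierarchyRanker` are assumptions chosen for illustration, not a recommended formula; the input is a list of matched field paths per gene.

```java
import java.util.*;

// Hypothetical sketch: score each gene by where its hits sit in the hierarchy and how many there are.
class HierarchyRanker {
    // Depth = number of related entities traversed, e.g. "Description" -> 0,
    // "Article#2.Title" -> 1, "Article#2.Author#1.Name" -> 2.
    static int depth(String path) {
        return (int) path.chars().filter(c -> c == '#').count();
    }

    // Assumed weighting: each hit adds 1 / (1 + depth), so gene-level fields count most.
    static double score(List<String> hitPaths) {
        double total = 0.0;
        for (String path : hitPaths) {
            total += 1.0 / (1 + depth(path));
        }
        return total;
    }

    // Return gene symbols ordered by descending score.
    static List<String> rank(Map<String, List<String>> hitsByGene) {
        List<String> genes = new ArrayList<>(hitsByGene.keySet());
        genes.sort(Comparator.comparingDouble((String g) -> score(hitsByGene.get(g))).reversed());
        return genes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        hits.put("AHCY", List.of("Article#2.Title"));
        hits.put("EGFR", List.of("Description", "Article#2.Title", "Disease#1.Name"));
        for (String gene : rank(hits)) {
            System.out.println(gene + " " + score(hits.get(gene)));
        }
    }
}
```

With these assumed weights EGFR scores 1 + 0.5 + 0.5 = 2.0 from its three hits and AHCY scores 0.5, so EGFR ranks first. In Lucene itself, a comparable effect is usually obtained with query-time field boosts on a denormalized index (for example `Description:epidermal^3`) rather than hand-written scoring.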
>
> Ideas for how to take on this challenge? Implementation? Tools?
>
> Thanks!
> Yaron Golan
