lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Solr defining Schema structure trouble.
Date Wed, 14 Nov 2012 23:22:15 GMT
You can break your books into individual pages, each a separate Solr 
"document", with the full page text as one tokenized text field value. Solr 
(Lucene) will take care of indexing the individual terms on each page. Then 
when you query on terms, Solr will find all pages that have the specified 
terms, ranking them by frequency and number of terms that match on each 
page.
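As a rough sketch of the page-per-document idea, the snippet below splits a book into one document per page. The field names (id, docname, pagenumber, pagecontent) are borrowed from the example in the original question, and the "bookname#page" id convention is only an assumption for illustration; the resulting JSON list is the kind of payload you would then send to Solr's update handler.

```python
import json


def pages_to_solr_docs(book_name, pages):
    """Return one Solr-style document (a dict) per page of a book."""
    docs = []
    for page_number, text in enumerate(pages, start=1):
        docs.append({
            # Assumed id convention: book name plus page number.
            "id": f"{book_name}#{page_number}",
            "docname": book_name,
            "pagenumber": page_number,
            # Should map to a tokenized text field in the schema.
            "pagecontent": text,
        })
    return docs


docs = pages_to_solr_docs("test.pdf", ["text of page one", "text of page two"])
payload = json.dumps(docs, indent=2)
print(payload)
```

The book-level fields are repeated on every page document here; that duplication is the price of getting page-level hits from a flat index.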

You can also use grouping (field collapsing) to group the pages by book 
(another field, or the id, would hold the book name).
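A grouped query could look roughly like the sketch below, which only builds the request URL. The host, port and handler path are assumptions, as are the field names pagecontent and docname (taken from the example in the original question); group, group.field and group.limit are Solr's result grouping parameters, and the grouping field is expected to be single-valued.

```python
from urllib.parse import urlencode


def grouped_query_url(terms, base_url="http://localhost:8983/solr/select"):
    """Build a select URL that searches page text and groups hits by book."""
    params = {
        # Search the tokenized page text for the given terms.
        "q": "pagecontent:(" + " ".join(terms) + ")",
        # Collapse matching pages into one group per book.
        "group": "true",
        "group.field": "docname",
        # Return at most three matching pages per book.
        "group.limit": 3,
        "fl": "id,docname,pagenumber,score",
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


url = grouped_query_url(["whale", "harpoon"])
print(url)
```

Each group in the response then corresponds to one book, with its best-matching pages listed inside it.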

-- Jack Krupansky

-----Original Message----- 
From: denl0
Sent: Wednesday, November 14, 2012 8:26 AM
To: solr-user@lucene.apache.org
Subject: Solr defining Schema structure trouble.

I'm having trouble putting somewhat related data in my solr schema.
I know solr isn't a database but I need some data to be put in solr.

Problem.
I have plenty of books to index.
The user wants page-level hit results: "The terms you were looking for are
found on page X."
To do this I was told to make a Solr document of each separate page's content
and pass that to Solr.

The problems I have in my structure are:

-*Data related to the document is stored once per page (x times), while it
should only be stored once.* (In this case only the name, but I have a lot
more fields.)

<solrDoc>
<id>1</id>
<docname>test.pdf</docname>
<pagenumber>1</pagenumber>
<pagecontent>blablabla</pagecontent>
</solrDoc>

<solrDoc>
<id>2</id>
<docname>test.pdf</docname>
<pagenumber>2</pagenumber>
<pagecontent>blablabla</pagecontent>
</solrDoc>

-*Some data related to a document is related to each other.*
Let's say these combinations are possible:

-ac
-ad
-be

<solrDoc>
<id>2</id>
<docname>test.pdf</docname>
<pagenumber>2</pagenumber>
<pagecontent>blablabla</pagecontent>

<model>a</model> //multivalue field model
<model>b</model>
<extra>c</extra>  //multivalue field extra
<extra>d</extra>
<extra>e</extra>
</solrDoc>

I was wondering how I could solve these problems in the design of my
schema, and how to query it.

An option would be to create a document for each possible combination of
the model and extra fields, but I don't know how I would query that.
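To make the one-document-per-combination option concrete, here is a small sketch. It assumes single-valued model and extra fields and a made-up id suffix, and the filter string only shows what a query restricted to one combination might look like; it is an illustration of the option, not a recommendation.

```python
def expand_combinations(page_doc, combinations):
    """Copy a page document once per allowed (model, extra) pair."""
    docs = []
    for model, extra in combinations:
        doc = dict(page_doc)
        # Hypothetical id scheme: page id plus the combination.
        doc["id"] = f"{page_doc['id']}-{model}{extra}"
        doc["model"] = model
        doc["extra"] = extra
        docs.append(doc)
    return docs


def combination_filter(model, extra):
    """Build a filter query that matches only one exact combination."""
    return f"model:{model} AND extra:{extra}"


page = {"id": "2", "docname": "test.pdf", "pagenumber": 2,
        "pagecontent": "blablabla"}
docs = expand_combinations(page, [("a", "c"), ("a", "d"), ("b", "e")])

print([d["id"] for d in docs])
print(combination_filter("a", "c"))
```

Because every document carries exactly one model and one extra value, a filter such as model:a AND extra:e cannot match, whereas it would match the single multivalued document shown earlier.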



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-defining-Schema-structure-trouble-tp4020305.html
Sent from the Solr - User mailing list archive at Nabble.com. 

