lucene-solr-user mailing list archives

From Paul Libbrecht <p...@hoplahup.net>
Subject Re: Book text with chapter line number
Date Wed, 24 Apr 2013 14:28:41 GMT
It's easy to then store a map of "term position" to line number and page number along with
each paragraph, isn't it?

Paul
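A minimal Python sketch of Paul's idea, under stated assumptions. The input shape `(page, line_number, text)` and the helper names `build_position_map` and `locate` are hypothetical. The sketch counts positions by splitting on whitespace, so its positions only agree with Solr's term positions if the field's analyzer emits exactly one position per whitespace-separated word. For each line it records the position of that line's first token, and `bisect` maps a matched position back to its page and line. The map is small enough to serialize as JSON into a stored-only field next to the paragraph text.

```python
import bisect
import json
from typing import List, Sequence, Tuple

# Hypothetical input shape: one paragraph as (page, line_number, text) rows.
Row = Tuple[int, int, str]


def build_position_map(lines: Sequence[Row]) -> List[Tuple[int, int, int]]:
    """Return (first_token_position, page, line) for each line of a paragraph.

    Positions are counted by splitting on whitespace; this only matches
    Solr's term positions if the analyzer emits one position per word
    (an assumption, not something Solr guarantees).
    """
    entries = []
    position = 0
    for page, line_no, text in lines:
        entries.append((position, page, line_no))
        position += len(text.split())
    return entries


def locate(entries: Sequence[Sequence[int]], position: int) -> Tuple[int, int]:
    """Map a matched term position back to the (page, line) it came from."""
    if position < 0:
        raise ValueError("term positions start at 0")
    starts = [entry[0] for entry in entries]
    # The last line whose first position is <= the matched position.
    idx = bisect.bisect_right(starts, position) - 1
    _, page, line_no = entries[idx]
    return page, line_no


if __name__ == "__main__":
    paragraph = [
        (12, 30, "It was the best of times,"),
        (12, 31, "it was the worst of times,"),
        (13, 1, "it was the age of wisdom"),
    ]
    # Serialized form that could live in a stored-only field.
    stored = json.dumps(build_position_map(paragraph))
    print(locate(json.loads(stored), 7))  # → (12, 31)
```

Storing one entry per line rather than one per token keeps the stored map short. The trade-off is a binary search at display time, which is cheap for paragraph-sized lists.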


On 24 Apr 2013, at 16:24, Timothy Potter wrote:

> Chapter seems too broad and line seems too narrow -- have you thought
> about paragraph level? Something like:
> 
> docID, book fields (title, author, publisher, etc), chapter fields (#,
> title, pages, etc), section fields (title, #, etc), sub-sectionN
> fields, paragraph text, lines
> 
> Seems like line #'s would only be useful for display so just store the
> lines the paragraph covers.
> 
> 
> 
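A rough Python sketch of Timothy's paragraph-level layout, assuming paragraphs within a chapter are separated by blank lines. Every field name (`book_title`, `chapter_number`, `paragraph_text`, `line_start`, `line_end`, and so on) is hypothetical, and section and sub-section fields are left out for brevity. Each paragraph becomes one document. The range of lines it covers is kept only for display, as Timothy suggests.

```python
import json
from typing import Dict, Iterator, List, Tuple


def split_paragraphs(lines: List[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first_line, last_line, text) for each blank-line-separated paragraph.

    Line numbers are 1-based within the chapter.
    """
    start = None
    buf: List[str] = []
    for line_no, text in enumerate(lines, start=1):
        if text.strip():
            if start is None:
                start = line_no
            buf.append(text.strip())
        elif buf:
            # A blank line closes the current paragraph.
            yield start, line_no - 1, " ".join(buf)
            start, buf = None, []
    if buf:
        yield start, start + len(buf) - 1, " ".join(buf)


def paragraph_docs(book: Dict[str, str], chapter: Dict[str, object],
                   lines: List[str]) -> List[Dict[str, object]]:
    """Build one Solr-style document per paragraph (all field names hypothetical)."""
    docs = []
    for n, (first, last, text) in enumerate(split_paragraphs(lines), start=1):
        docs.append({
            "id": f"{book['id']}-ch{chapter['number']}-p{n}",
            "book_title": book["title"],
            "book_author": book["author"],
            "chapter_number": chapter["number"],
            "chapter_title": chapter["title"],
            "paragraph_text": text,
            # Only needed for display, so these could be stored, not indexed.
            "line_start": first,
            "line_end": last,
        })
    return docs


if __name__ == "__main__":
    book = {"id": "b1", "title": "Example Book", "author": "A. Writer"}
    chapter = {"number": 1, "title": "Opening"}
    lines = ["First line of paragraph one,", "second line of paragraph one.", "",
             "Paragraph two has one line."]
    print(json.dumps(paragraph_docs(book, chapter, lines), indent=2))
```

The resulting JSON array should be in a form Solr's JSON update handler accepts as a list of documents. A query phrase that crosses a line break still falls inside one paragraph document, which is the main advantage over indexing lines individually.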
> On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood <wunder@wunderwood.org> wrote:
>> If you can represent your books in XML, then MarkLogic could do the job very cleanly.
>> It isn't free, but it is very good.
>> 
>> wunder
>> 
>> On Apr 23, 2013, at 6:47 PM, Jason Funk wrote:
>> 
>>> Is there a better tool than Solr to use for my situation?
>>> 
>>> 
>>> On Apr 23, 2013, at 5:04 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>>> 
>>>> There is no simple, obvious, and direct approach, right out of the box. Sure,
>>>> you can highlight passages of raw text, right out of the box, but that won't give you
>>>> chapters, pages, and line numbers. To do all of that, you would have to either:
>>>> 
>>>> 1. Add chapter, page, and line number as part of the payload for each word.
>>>>    And add some custom document transformers to access the information.
>>>> or
>>>> 2. Index each line as a separate Solr document, with fields for book, chapter,
>>>>    page, and line number.
>>>> 
>>>> -- Jack Krupansky
>>>> 
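A Python sketch of the index-side data for both of Jack's options. It rests on these assumptions:

- The input shape `(chapter, page, line, text)` and all field names are hypothetical.
- For option 1, the target field type is assumed to tokenize on whitespace and then split each token on `|` into term and payload, for example with `DelimitedPayloadTokenFilterFactory` and `encoder="identity"`.

Reading the payloads back at query time still needs the custom code Jack mentions, and that part is not shown here. `decode_payload` only illustrates how such code might unpack the value.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (chapter, page, line, text) for every printed line.
Row = Tuple[int, int, int, str]


def payload_field(rows: List[Row]) -> str:
    """Option 1: tag every word with its location as a '|'-delimited payload.

    Assumes a whitespace tokenizer followed by a payload filter splitting
    on '|'. Words that themselves contain '|' would need escaping first.
    """
    tokens = []
    for chapter, page, line, text in rows:
        for word in text.split():
            tokens.append(f"{word}|c{chapter}p{page}l{line}")
    return " ".join(tokens)


def decode_payload(payload: str) -> Tuple[int, int, int]:
    """Turn a payload such as 'c3p12l7' back into (chapter, page, line)."""
    chapter, rest = payload[1:].split("p")
    page, line = rest.split("l")
    return int(chapter), int(page), int(line)


def line_docs(book_id: str, rows: List[Row]) -> List[Dict[str, object]]:
    """Option 2: one Solr-style document per line (field names hypothetical)."""
    return [
        {
            "id": f"{book_id}-c{chapter}-p{page}-l{line}",
            "book_id": book_id,
            "chapter": chapter,
            "page": page,
            "line": line,
            "text": text,
        }
        for chapter, page, line, text in rows
    ]


if __name__ == "__main__":
    rows = [(1, 5, 1, "alpha beta"), (1, 5, 2, "gamma")]
    print(payload_field(rows))  # → alpha|c1p5l1 beta|c1p5l1 gamma|c1p5l2
    print(decode_payload("c1p5l2"))  # → (1, 5, 2)
    print(line_docs("b1", rows)[1]["id"])  # → b1-c1-p5-l2
```

Option 1 keeps one document per book or chapter and puts location data on every token. Option 2 keeps the data model trivial but, as Jason points out below, splits phrases that cross line breaks.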
>>>> -----Original Message----- From: Jason Funk
>>>> Sent: Tuesday, April 23, 2013 5:02 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Book text with chapter line number
>>>> 
>>>> Hello.
>>>> 
>>>> I'm trying to figure out if Solr is going to work for a new project that
>>>> I want to build. At its heart, it's a book text searching application. Each book is
>>>> broken into chapters and each chapter is broken into lines. I want to be able to search these
>>>> books, return relevant sections of the book, and display the results with chapter and line
>>>> number. I'm not sure how I would structure my data so that it's efficient and functional.
>>>> I could simply treat each line of text as a document, which would provide some of the functionality,
>>>> but what if the search query spanned two lines? Then it seems the passage the user was searching
>>>> for wouldn't be returned. I could treat each book as a document and use highlighting to find
>>>> the context, but that seems to limit the weighting/ranking of the best matches, and it makes
>>>> finding chapter/line numbers difficult. What is the best way to do this with Solr?
>>>> 
>>>> Is there a better tool to use to solve my problem?
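One way to address Jason's concern about queries spanning two lines is overlapping line windows. This technique is not proposed anywhere in the thread. Each document holds a line plus the line after it, so a phrase that crosses one line break lies entirely inside some document.

The Python sketch below uses a hypothetical input shape `(chapter, line, text)` and hypothetical helper names. `phrase_hits` is only a naive substring stand-in for a real Solr phrase query.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (chapter, line, text), in reading order.
Line = Tuple[int, int, str]


def window_docs(book_id: str, lines: List[Line], size: int = 2) -> List[Dict[str, object]]:
    """Emit one document per run of `size` consecutive lines within a chapter.

    Consecutive windows overlap by size - 1 lines, so a phrase crossing a
    single line break sits entirely inside at least one document.
    """
    docs = []
    for i, (chapter, line, _) in enumerate(lines):
        # Keep only the following lines that belong to the same chapter.
        window = [row for row in lines[i:i + size] if row[0] == chapter]
        docs.append({
            "id": f"{book_id}-c{chapter}-l{line}",
            "chapter": chapter,
            "line_start": line,
            "line_end": window[-1][1],
            "text": " ".join(row[2] for row in window),
        })
    return docs


def phrase_hits(docs: List[Dict[str, object]], phrase: str) -> List[Tuple[object, object]]:
    """Naive stand-in for a Solr phrase query: case-insensitive substring match."""
    needle = phrase.lower()
    return [(d["chapter"], d["line_start"]) for d in docs if needle in str(d["text"]).lower()]


if __name__ == "__main__":
    lines = [(1, 1, "the quick brown"), (1, 2, "fox jumps over"), (2, 1, "the lazy dog")]
    print(phrase_hits(window_docs("b1", lines, size=1), "brown fox"))  # → []
    print(phrase_hits(window_docs("b1", lines, size=2), "brown fox"))  # → [(1, 1)]
```

This approach has two costs. The index grows roughly `size` times. A match that falls entirely inside one line can also be reported by the neighbouring window, so results would need de-duplicating by line before display.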
>>> 
>> 
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>> 
>> 
>> 

