lucene-java-user mailing list archives

From Brad Harper <>
Subject Investigating Lucene's Applicability to [Unusual?] Use Case
Date Wed, 13 Jun 2007 19:39:02 GMT


[This is not an intentional cross-posting. I posted this same question to
the 'general' lucene list; replies there suggested that I'd have
better/quicker responses using this list instead.]

I'm investigating Lucene as a replacement for a special-purpose search
technology that was developed long before Lucene (or any of the current IR
libraries) became available. 

The use case involves so-called print streams. Imagine 20,000 statements
concatenated into one large file suitable for delivery to a print system.
The document formats vary, but include AFP (an IBM printer format), PCL (an
HP format), PostScript, PDF, and even "plain-text".

The indexing application must track the total page count of the embedded
statements. On a hit, the search application must extract and return the
[possibly multi-page] statement embedded within the larger print stream.
How would the search application learn (i.e., be informed by Lucene or the
indexer) the extent of each embedded document within the print stream?
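One possible answer, sketched under the assumption that each hit's Lucene document stores the statement's byte offset and length within the print-stream file: the search application reads those stored values from the hit and seeks directly to that byte range. `StatementExtractor` and `extract` are hypothetical names; the only library API used is the standard `java.io.RandomAccessFile`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class StatementExtractor {

    /**
     * Reads one embedded statement from the print-stream file.
     * offset and length are assumed to have been stored with the
     * statement's Lucene document at index time.
     */
    public static byte[] extract(String printStreamPath, long offset, int length)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(printStreamPath, "r")) {
            // Reject extents that do not fit inside the file.
            if (offset < 0 || length < 0 || offset + length > file.length()) {
                throw new IllegalArgumentException("extent lies outside the file");
            }
            byte[] buf = new byte[length];
            file.seek(offset);   // jump straight to the statement
            file.readFully(buf); // read exactly `length` bytes
            return buf;
        }
    }
}
```

Storing byte extents lets a hit be served without re-parsing the whole stream. This fits streams whose statements occupy contiguous byte ranges; in a PDF, for example, a page's content and resources can live in objects elsewhere in the file, so a byte range alone may not yield a standalone statement.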

I'm not seeing this scenario discussed in forums or books. Does anyone have
comments or thoughts on Lucene's applicability as a solution? 

