lucene-java-user mailing list archives

From Pedro Ferreira
Subject Indexing 100Gb of readonly numeric data
Date Wed, 15 Feb 2012 18:04:40 GMT
Hi guys,

I hope I'm sending this to the right place.

I have a rough idea in mind (still fuzzy, but clear enough to
describe), and I was wondering whether Lucene or Solr could help with
it. I've implemented a Lucene index on custom enterprise data before
and have it running on Azure as well, so I know the basics.

For this idea, these are the premises:

- about 100Gb of data
- data is expected to be in one gigantic table. Conceptually, it is like
a spreadsheet table: rows are objects and columns are properties.
- values are mostly floating point numbers, and I expect them to be,
let's say, unique, or almost randomly distributed (1.89868776E+50,
- The data is read-only. It will never change.

Now I need to query this data, mostly with range queries on the
columns. Something like:

"SELECT * FROM Table WHERE (Col1 > 1.2E2 AND Col1 < 1.8E2) OR (Col3 == 0)"

which is basically "give me all the rows that satisfy these criteria".
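
To make the query concrete, here is a minimal sketch in plain Python
(deliberately not Lucene or Solr code) of how a per-column sorted index
can answer it. The toy rows, the helper names (build_column_index,
open_range, equal_to) and the in-memory layout are all assumptions made
for illustration; a real engine would store and search its index quite
differently.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy table: each row is an object, each column a property.
rows = [
    {"Col1": 100.0, "Col3": 0.0},  # row 0: matches Col3 == 0
    {"Col1": 150.0, "Col3": 7.5},  # row 1: matches the Col1 range
    {"Col1": 180.0, "Col3": 2.0},  # row 2: no match (Col1 < 1.8E2 is strict)
    {"Col1": 130.0, "Col3": 0.0},  # row 3: matches both clauses
    {"Col1": 500.0, "Col3": 9.0},  # row 4: no match
]


def build_column_index(rows, column):
    """Return sorted (value, row_id) pairs for one column.

    The data is read-only, so this is built once and never updated.
    """
    return sorted((row[column], row_id) for row_id, row in enumerate(rows))


def open_range(index, low, high):
    """Row ids whose value satisfies low < value < high (binary search)."""
    start = bisect_right(index, (low, float("inf")))  # skip values <= low
    end = bisect_left(index, (high, -1))              # stop before values >= high
    return {row_id for _, row_id in index[start:end]}


def equal_to(index, value):
    """Row ids whose value is exactly equal to `value`."""
    start = bisect_left(index, (value, -1))
    end = bisect_right(index, (value, float("inf")))
    return {row_id for _, row_id in index[start:end]}


col1 = build_column_index(rows, "Col1")
col3 = build_column_index(rows, "Col3")

# (Col1 > 1.2E2 AND Col1 < 1.8E2) OR (Col3 == 0)
matches = sorted(open_range(col1, 1.2e2, 1.8e2) | equal_to(col3, 0.0))
print(matches)
```

Each clause becomes a lookup in one column's index, and the OR is a set
union of row ids; the full rows are then fetched by id.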

I believe this could be easily done with a standard RDBMS, but I would
like to avoid that route.

So, is this something doable with Lucene or Solr? And if so, how much
can be done with a stock, out-of-the-box Lucene implementation?

While thinking about this, and assuming this could work well with
Lucene, I had 2 major questions:

- Won't I get an index that is pretty much the same size as the
data source? I would have to index all columns from all rows, and as
there is not much "repetition" in the data source, wouldn't the index
almost mirror the data source?
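
As a rough sanity check on this worry, the arithmetic can be sketched
as below. Every number other than the 100 GB total is an assumption:
8-byte floating point values, a 4-byte row id stored next to each
indexed value, and no compression at all. This is not how Lucene
actually encodes numeric fields, so the result is only a naive
estimate, not a prediction of the real index size.

```python
DATA_BYTES = 100 * 10**9  # "about 100Gb" from the premises, read as 100 GB
VALUE_BYTES = 8           # assumption: every value is a 64-bit float
ROW_ID_BYTES = 4          # assumption: a 32-bit row id per indexed value


def naive_index_bytes(data_bytes, value_bytes, row_id_bytes):
    """Size of an uncompressed index holding one (value, row_id) pair per value."""
    num_values = data_bytes // value_bytes
    return num_values * (value_bytes + row_id_bytes)


estimate = naive_index_bytes(DATA_BYTES, VALUE_BYTES, ROW_ID_BYTES)
ratio = estimate / DATA_BYTES
print(f"{estimate / 10**9:.0f} GB index, {ratio:.1f}x the data")
```

Under these assumptions the index comes out larger than the data
itself, which matches the intuition that nearly unique values leave
little to share between entries. Measuring on a sample of the real data
would be the only reliable way to know.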

- If the data source is read-only, should I create the index once,
offline, and then replicate it to the search servers?

Or am I just being crazy and making a monster of a small problem? :)

Pedro Ferreira

mobile: 00 44 7712 557303
skype: pedrosilvaferreira
