lucene-solr-user mailing list archives

From Avi Steiner <astei...@varonis.com>
Subject RE: Positions files analysis
Date Tue, 28 Jun 2016 08:40:34 GMT
Thanks Erick.
I don't want to disable phrase searches.
I just wonder whether there is a way to find specific terms in the index, and thought that
analyzing the .pos file might be a direction.
I suspect that our index is full of long float numbers (e.g. 1234.4546786585899544), which
may be unnecessary.  Before I make any changes to our indexing process (such as dropping such
terms), I want to confirm my suspicion.
I can run a regex search to find how many _documents_ contain those terms,
but I would like to know how many such _terms_ (unique or total) are indexed. Is there a way
to do it? Maybe with Luke?
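
One possible way to count such terms is Solr's Terms component, which lists indexed terms
rather than documents. The sketch below is only an illustration under several assumptions:
that a /terms handler is enabled on the core, that the base URL and field name passed in are
placeholders for your own, that "long float" means at least 8 digits after the decimal point,
and that the analyzer keeps such numbers as single tokens. The names terms_url, summarize and
count_long_float_terms are made up for this example.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a "long float" has 8 or more digits after the decimal point.
LONG_FLOAT = r"[0-9]+\.[0-9]{8,}"


def is_long_float(term, regex=LONG_FLOAT):
    """Local sanity check of the pattern against a sample term."""
    return re.fullmatch(regex, term) is not None


def terms_url(base_url, field, regex=LONG_FLOAT):
    """Build a Terms component request for all terms in `field` matching `regex`.

    base_url is the core URL, e.g. http://localhost:8983/solr/mycore (placeholder).
    """
    params = {
        "terms": "true",
        "terms.fl": field,      # field whose terms are listed
        "terms.regex": regex,   # only return terms matching this pattern
        "terms.limit": -1,      # no cap on the number of terms returned
        "wt": "json",
        "json.nl": "flat",      # terms come back as [term, count, term, count, ...]
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def summarize(response, field):
    """Return (unique matching terms, sum of their document frequencies)."""
    flat = response["terms"][field]
    terms = flat[0::2]
    doc_freqs = flat[1::2]
    return len(terms), sum(doc_freqs)


def count_long_float_terms(base_url, field):
    """Query Solr and summarize the matching terms (needs a reachable Solr)."""
    with urlopen(terms_url(base_url, field)) as resp:
        return summarize(json.load(resp), field)
```

The first number returned is the count of unique matching terms. The second is the sum of
document frequencies, which is not the same as total occurrences (a term repeated within one
document is counted once); getting true totals would need per-term total term frequency,
which this request does not return. On a very large index an unlimited terms listing may be
slow, so it may be worth trying it with a small terms.limit first.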


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Tuesday, June 28, 2016 8:27 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Positions files analysis

Positions are necessary if you need to do "phrase searches".
If you don't need them, simply turn that option off in your schema for the fields where it's
unnecessary. See the reference guide for termVectors, termPositions, and termOffsets.

I'm really not sure what you're asking by:
"Is there a way I can read/analyze index files as .pos?"

The various file extensions are a result of the options you define on your fields; that's
just the way Lucene works...

Best,
Erick

On Mon, Jun 27, 2016 at 7:25 AM, asteiner <asteiner@varonis.com> wrote:
> Hi
>
> I have a very large index and I'd like to see how I can reduce it.
> Some of the largest files in the index are the .pos files (positions).
> There are many Excel files indexed with formulas, so I suspect that a
> large part of the index is used by junk terms such as very long numbers.
> Is there a way I can read/analyze index files such as .pos?