lucene-java-user mailing list archives

From Ramprakash Ramamoorthy <youngestachie...@gmail.com>
Subject Re: Strange behavior of term queries with StoredFields - 4.1
Date Wed, 13 Feb 2013 05:20:17 GMT
On Tue, Feb 12, 2013 at 9:17 PM, Ian Lea <ian.lea@gmail.com> wrote:

> I think you can store field "x" using byte[] as one Field and index it
> using String as another Field.  Or define your own FieldType and use
> the Field(String name, byte[] value, FieldType type) constructor.  Or
> is that where you're getting an Exception?
>


Yeah Ian, that is where I get this exception:

java.lang.IllegalArgumentException: Fields with BytesRef values cannot be indexed.

As you said, this is what I am doing now: one field stores the bytes and a
second field indexes the String,

doc.add(new Field("published-store", b.getPublished().getBytes("UTF-8"), stored));
doc.add(new Field("published", b.getPublished(), indexed));
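
To picture why the two-field approach works, here is a toy model in plain Java. It is not Lucene code and does not use any Lucene API; the class StoredVsIndexedDemo and all of its methods are made up for illustration. It only assumes what was said in this thread: a stored value can be fetched back but is invisible to term queries, an indexed term is searchable, and a match-all query counts every document regardless. The sample data (8 docs, 4 of them published in 2012) mirrors the hit counts from the original question.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (hypothetical, not Lucene): stored values vs. indexed terms. */
public class StoredVsIndexedDemo {
    // Stored values: one map per document, field name -> raw bytes.
    private final List<Map<String, byte[]>> stored = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    public int newDoc() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    // Store only: the value can be retrieved later but is not searchable.
    public void store(int doc, String field, byte[] value) {
        stored.get(doc).put(field, value);
    }

    // Index only: the whole value is one untokenized, searchable term.
    public void index(int doc, String field, String term) {
        postings.computeIfAbsent(field + ":" + term, k -> new ArrayList<>()).add(doc);
    }

    // A match-all query ignores terms and simply counts every document.
    public int matchAllHits() {
        return stored.size();
    }

    // A term query only consults the inverted index, never the stored values.
    public int termHits(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptyList()).size();
    }

    public String storedValue(int doc, String field) {
        return new String(stored.get(doc).get(field), StandardCharsets.UTF_8);
    }

    // Two fields per document: bytes stored under one name, String indexed under another.
    public static StoredVsIndexedDemo build(String[] published) {
        StoredVsIndexedDemo idx = new StoredVsIndexedDemo();
        for (String year : published) {
            int doc = idx.newDoc();
            idx.store(doc, "published-store", year.getBytes(StandardCharsets.UTF_8));
            idx.index(doc, "published", year);
        }
        return idx;
    }

    public static void main(String[] args) {
        StoredVsIndexedDemo idx = build(new String[] {
            "2012", "2012", "2012", "2012", "2011", "2011", "2010", "2010"});
        System.out.println(idx.matchAllHits());                       // 8
        System.out.println(idx.termHits("published", "2012"));        // 4
        System.out.println(idx.termHits("published-store", "2012"));  // 0
        System.out.println(idx.storedValue(0, "published-store"));    // 2012
    }
}
```

In this sketch a term query against "published-store" finds nothing, just like Case 2 in the original question, while the same term against "published" finds the 4 matching docs. Searches go to the indexed field and the stored field is only read when a value has to be returned.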


> How big is your index, or how small are your disks?
>

Well, let me explain: we currently use Lucene 2.3 in production, and we zip
the older indices. But when a search query spans the zipped period as well,
it takes a very long time to unzip the indices and read them. Now, with 4.1's
stored fields compression, that unzipping overhead is avoided. And for our
data, we found that an index with the values stored as byte[] occupies 20%
less space than one with them stored as Strings.

Thanks Ian, will update here, in case I come across another interesting
solution.

>
>
> --
> Ian.
>
>
> On Tue, Feb 12, 2013 at 10:56 AM, Ramprakash Ramamoorthy
> <youngestachiever@gmail.com> wrote:
> > Ian et al.,
> >
> >        Just a question. I now have to both index and store the field
> > (disk space is a constraint here). I have found that storing it as
> > byte[] helps save some disk space. But it isn't possible to index a
> > byte[]; I am getting an exception when the field to be indexed is a
> > byte[].
> >
> >        So how do I go about this?
> >
> >
> > On Mon, Feb 11, 2013 at 8:43 PM, Ian Lea <ian.lea@gmail.com> wrote:
> >
> >> Yes, that looks fine.  As far as I'm aware the compression is low
> >> level and transparent to user code.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Mon, Feb 11, 2013 at 2:59 PM, Ramprakash Ramamoorthy
> >> <youngestachiever@gmail.com> wrote:
> >> > On Mon, Feb 11, 2013 at 7:10 PM, Ian Lea <ian.lea@gmail.com> wrote:
> >> >
> >> >> StoredField does indeed only store the field, not index it.
> >> >> MatchAllDocs will find it because, by definition, it matches all docs.
> >> >> But other queries won't.
> >> >>
> >> >
> >> > That was pretty clear Ian. Thanks a lot.
> >> >
> >> >>
> >> >> Not sure what you mean when you say you are particular about stored
> >> >> fields.  If you need to get it back from the index, store it.  If you
> >> >> don't, don't.  Same for indexing - don't index fields you don't need
> >> >> for searching.
> >> >>
> >> >
> >> > All my fields are supposed to be searchable (indexed) and stored as
> >> > well. I was actually trying to leverage the new stored fields
> >> > compression in 4.1. So when I say,
> >> >
> >> > IndexWriterConfig indexWriterConfig =
> >> >     new IndexWriterConfig(Version.LUCENE_41, analyzer);
> >> > FieldType fieldType = new FieldType();
> >> > fieldType.setIndexed(true);    // searchable
> >> > fieldType.setStored(true);     // retrievable from the index
> >> > fieldType.setTokenized(false); // whole value is a single term
> >> > doc.add(new Field("published", b.getPublished(), fieldType));
> >> >
> >> > does this mean that my docs will be indexed and stored in the compressed
> >> > format? I hope I have it right this time. Thanks Ian.
> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Ian.
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >>
> >> >> On Mon, Feb 11, 2013 at 12:53 PM, Ramprakash Ramamoorthy
> >> >> <youngestachiever@gmail.com> wrote:
> >> >> > Team,
> >> >> >
> >> >> >            I am facing a strange issue with term queries and stored
> >> >> > fields. Here is how I index and fetch the query results,
> >> >> >
> >> >> > Case 1 :
> >> >> >   doc.add(new StoredField("published", b.getPublished()));
> >> >> >   Query query = new MatchAllDocsQuery();
> >> >> >
> >> >> >   Results : No of hits : 8(Expected)
> >> >> >
> >> >> > Case 2 :
> >> >> >   doc.add(new StoredField("published", b.getPublished()));
> >> >> >   Query query = new TermQuery(new Term("published", "2012"));
> >> >> >
> >> >> >   Result : No of hits : 0 (Expected - 4)
> >> >> >
> >> >> >  Case 3 :
> >> >> >    doc.add(new Field("published", b.getPublished(), fieldType));
> >> >> >    Query query = new TermQuery(new Term("published", "2012"));
> >> >> >
> >> >> >     Result : No of hits : 4(Expected)
> >> >> >
> >> >> > Does StoredField mean store only, with no indexing? But in that case,
> >> >> > how does the match all docs query work? I am puzzled.
> >> >> >
> >> >> > I am particular about stored fields because of the compressed size
> >> >> > of the index. How do I go about this? Or am I missing something that
> >> >> > is obviously basic? Please help.
> >> >> >
> >> >> > --
> >> >> > With Thanks and Regards,
> >> >> > Ramprakash Ramamoorthy,
> >> >> > India.
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > With Thanks and Regards,
> >> > Ramprakash Ramamoorthy,
> >> > India
> >> > +91 9626975420
> >>
> >>
> >>
> >
> >
> > --
> > With Thanks and Regards,
> > Ramprakash Ramamoorthy,
> > India.
> > +91 9626975420
>
>
>


-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
India.
+91 9626975420
