lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject Re: Fastest way to use solrj
Date Wed, 27 Jan 2010 11:30:17 GMT
The binary format just reduces overhead. In your case, almost all of the
data is in the big text field, which is not compressed. But overall,
parsing is a lot faster with the binary format, so you see a perf boost.
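A toy byte count shows why the two files come out about the same size. This is a sketch under stated assumptions: the class and method names are made up, the `<field name="...">` form is a simplified version of Solr's XML update message, and the 1-byte tag plus 4-byte length framing is a hypothetical stand-in, not the real javabin layout. All it demonstrates is that each encoding adds a small fixed cost per field, while the 5000-char text value is written out in full by both.

```java
import java.nio.charset.StandardCharsets;

// Toy model only: neither encoding below is Solr's real XML update format
// or the real javabin layout. Names are hypothetical.
final class PayloadVsOverhead {

    // Byte length of a string once encoded as UTF-8.
    static int utf8Len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Simplified XML field: <field name="NAME">VALUE</field> (no escaping).
    static int xmlFieldBytes(String name, String value) {
        return utf8Len("<field name=\"" + name + "\">" + value + "</field>");
    }

    // Hypothetical binary field: name and value each get a 1-byte type tag,
    // a 4-byte length, then their raw UTF-8 bytes.
    static int binaryFieldBytes(String name, String value) {
        return (1 + 4 + utf8Len(name)) + (1 + 4 + utf8Len(value));
    }

    // XML-like doc: <doc>...</doc> wrapped around its fields.
    static int xmlDocBytes(String[][] fields) {
        int total = utf8Len("<doc></doc>");
        for (String[] f : fields) {
            total += xmlFieldBytes(f[0], f[1]);
        }
        return total;
    }

    // Hypothetical binary doc: 1-byte type tag + 4-byte field count, then its fields.
    static int binaryDocBytes(String[][] fields) {
        int total = 1 + 4;
        for (String[] f : fields) {
            total += binaryFieldBytes(f[0], f[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // One short id field plus one 5000-char text field, which dominates the doc.
        String[][] doc = {
            {"id", "9-0"},
            {"text", "a".repeat(5000)} // String.repeat needs Java 11+
        };
        long numDocs = 3_000_000L;
        long xmlTotal = numDocs * xmlDocBytes(doc);   // long arithmetic avoids int overflow
        long binTotal = numDocs * binaryDocBytes(doc);
        System.out.printf("XML-like: %d bytes, binary: %d bytes, binary/XML = %.4f%n",
                xmlTotal, binTotal, (double) binTotal / xmlTotal);
    }
}
```

With these assumptions a doc costs 5066 bytes in the XML-like form and 5034 in the binary one, so 3 million docs come to 15,198,000,000 vs 15,102,000,000 bytes: roughly 15 GB either way. With many short fields instead, the fixed per-field cost becomes a much larger share of each doc, which is where a size difference would start to show.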

2010/1/27 Tim Terlegård <tim.terlegard@gmail.com>:
> I have 6 fields. The text field is the biggest, it contains almost all
> of the 5000 chars.
>
> /Tim
>
> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.paul@corp.aol.com>:
>> How many fields are there in each doc? The binary format just reduces
>> overhead. It does not touch/compress the payload.
>>
>> 2010/1/27 Tim Terlegård <tim.terlegard@gmail.com>:
>>> I have 3 million documents, each having 5000 chars. The xml file is
>>> about 15GB. The binary file is also about 15GB.
>>>
>>> I was a bit surprised about this. It doesn't bother me much though. At
>>> least it performs better.
>>>
>>> /Tim
>>>
>>> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.paul@corp.aol.com>:
>>>> If you write only a few docs, you may not observe much difference in
>>>> size. If you write a large number of docs, you may observe a big difference.
>>>>
>>>> 2010/1/27 Tim Terlegård <tim.terlegard@gmail.com>:
>>>>> I got the binary format to work perfectly now. Performance is better
>>>>> than with xml. Thanks!
>>>>>
>>>>> However, it doesn't look like the binary file is any smaller than
>>>>> the xml file?
>>>>>
>>>>> /Tim
>>>>>
>>>>> 2010/1/27 Noble Paul നോബിള്‍  नोब्ळ् <noble.paul@corp.aol.com>:
>>>>>> 2010/1/21 Tim Terlegård <tim.terlegard@gmail.com>:
>>>>>>> Yes, it worked! Thank you very much. But do I need to use curl or can
>>>>>>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't
>>>>>>> use BinaryWriter then I don't know how to do this.
>>>>>> If your data is serialized using JavaBinUpdateRequestCodec, you may
>>>>>> POST it using curl.
>>>>>> If you are writing directly, use CommonsHttpSolrServer.
>>>>>>>
>>>>>>> /Tim
>>>>>>>
>>>>>>> 2010/1/20 Noble Paul നോബിള്‍  नोब्ळ् <noble.paul@corp.aol.com>:
>>>>>>>> 2010/1/20 Tim Terlegård <tim.terlegard@gmail.com>:
>>>>>>>>>>>> BinaryRequestWriter does not read from a file and post it
>>>>>>>>>>>
>>>>>>>>>>> Is there any other way or is this use case not supported? I tried this:
>>>>>>>>>>>
>>>>>>>>>>> $ curl <host>/solr/update/javabin -F stream.file=/tmp/data.bin
>>>>>>>>>>> $ curl <host>/solr/update -F stream.body='<commit />'
>>>>>>>>>>>
>>>>>>>>>>> Solr did read the file, because solr complained when the file wasn't
>>>>>>>>>>> in the format the JavaBinUpdateRequestCodec expected. But no data is
>>>>>>>>>>> added to the index for some reason.
>>>>>>>>>
>>>>>>>>>> How did you create the file /tmp/data.bin? What is the format?
>>>>>>>>>
>>>>>>>>> I wrote this in the first email. It's in the javabin format (I think).
>>>>>>>>> I did like this (groovy code):
>>>>>>>>>
>>>>>>>>>   fieldId = new NamedList()
>>>>>>>>>   fieldId.add("name", "id")
>>>>>>>>>   fieldId.add("val", "9-0")
>>>>>>>>>   fieldId.add("boost", null)
>>>>>>>>>   fieldText = new NamedList()
>>>>>>>>>   fieldText.add("name", "text")
>>>>>>>>>   fieldText.add("val", "Some text")
>>>>>>>>>   fieldText.add("boost", null)
>>>>>>>>>   fieldNull = new NamedList()
>>>>>>>>>   fieldNull.add("boost", null)
>>>>>>>>>   doc = [fieldNull, fieldId, fieldText]
>>>>>>>>>   docs = [doc]
>>>>>>>>>   root = new NamedList()
>>>>>>>>>   root.add("docs", docs)
>>>>>>>>>   fos = new FileOutputStream("data.bin")
>>>>>>>>>   new JavaBinCodec().marshal(root, fos)
>>>>>>>>>
>>>>>>>>> /Tim
>>>>>>>>>
>>>>>>>> JavaBin is a format.
>>>>>>>> Use this method: JavaBinUpdateRequestCodec#marshal(UpdateRequest
>>>>>>>> updateRequest, OutputStream os)
>>>>>>>>
>>>>>>>> The output of this can be posted to Solr and it should work.



-- 
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
