lucene-solr-user mailing list archives

From Bruno Mannina <bmann...@free.fr>
Subject Re: Possible or not ?
Date Fri, 05 Jun 2015 16:40:46 GMT
OK, thanks for this information!

On 05/06/2015 17:37, Erick Erickson wrote:
> Picking up on Alessandro's point: while you can post all these docs
> and commit at the end, unless you do a hard commit along the way
> (openSearcher=true or false doesn't matter), if your server should
> abnormally terminate for _any_ reason, all these docs will be
> replayed on startup from the transaction log.
>
> I'll also echo Alessandro's point that I don't see the advantage of this.
> Personally I'd set my hard commit interval with openSearcher=false
> to something like 60000 (60 seconds; the value is in milliseconds) and
> forget about it. You're not imposing much extra load on the system, you're
> durably saving your progress, and you're avoiding really, really, really
> long restarts if your server should stop for some reason.
>
> If you don't want the docs to be _visible_ for searches, be sure your
> autocommit has openSearcher set to false and disable soft commits
> (set the interval to -1 or remove it from your solrconfig).
>
> Best,
> Erick
>
> On Fri, Jun 5, 2015 at 8:21 AM, Alessandro Benedetti
> <benedetti.alex85@gmail.com> wrote:
>> I cannot see any problem with that, but talking about commits, I would like
>> to distinguish between "hard" and "soft":
>>
>> Hard commit -> durability
>> Soft commit -> visibility
>>
>> I suggest this interesting read:
>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> It's an old but interesting post by Erick.
>>
>> It explains the differences between the commit types in more detail.
>>
>> I would put you in this scenario:
>>
>> Heavy (bulk) indexing
>>> The assumption here is that you’re interested in getting lots of data to
>>> the index as quickly as possible for search sometime in the future. I’m
>>> thinking original loads of a data source etc.
>>>
>>>     - Set your soft commit interval quite long. As in 10 minutes or even
>>>     longer (-1 for no soft commits at all). *Soft commit is about
>>>     visibility*, and my assumption here is that bulk indexing isn’t about
>>>     near real time searching so don’t do the extra work of opening any kind
>>>     of searcher.
>>>     - Set your hard commit intervals to 15 seconds, openSearcher=false.
>>>     Again the assumption is that you’re going to be just blasting data at Solr.
>>>     The worst case here is that you restart your system and have to replay 15
>>>     seconds or so of data from your tlog. If your system is bouncing up and
>>>     down more often than that, fix the reason for that first.
>>>     - Only after you’ve tried the simple things should you consider
>>>     refinements, they’re usually only required in unusual circumstances. But
>>>     they include:
>>>        - Turning off the tlog completely for the bulk-load operation
>>>        - Indexing offline with some kind of map-reduce process
>>>        - Only having a leader per shard, no replicas for the load, then
>>>        turning on replicas later and letting them do old-style replication to
>>>        catch up. Note that this is automatic, if the node discovers it is “too
>>>        far” out of sync with the leader, it initiates an old-style replication.
>>>        After it has caught up, it’ll get documents as they’re indexed to
>>>        the leader and keep its own tlog.
>>>        - etc.
>>>
>>>
>> Actually, you could do the commit only at the end, but I cannot see any
>> advantage in that.
>> I suggest you play with the auto hard/soft commit config to get a better
>> idea of the situation!
>>
>> Cheers
>>
>> 2015-06-05 16:08 GMT+01:00 Bruno Mannina <bmannina@free.fr>:
>>
>>> Hi Alessandro,
>>>
>>> I'm currently on my dev computer, so I would like to post 1 000 000 XML
>>> files (with a structure defined in my schema.xml).
>>>
>>> I have already imported 1 000 000 XML files by using
>>> bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
>>> where each /DATA0/X contains 20 000 XML files (I do it 20 times by just
>>> changing X from 1 to 50)
>>>
>>> Now I would like to do:
>>> bin/post -c mydb /DATA1
>>>
>>> I would like to know whether my Solr 5 will run fine and not throw a
>>> memory error because there are too many files in one post without a
>>> commit.
>>>
>>> The commit will be done at the end, after all 1 000 000 files.
>>>
>>> Is that OK?
>>>
>>>
>>>
>>> On 05/06/2015 16:59, Alessandro Benedetti wrote:
>>>
>>>> Hi Bruno,
>>>> I cannot see what your challenge is.
>>>> Of course you can index your data in whatever way you want and do a commit
>>>> whenever you want…
>>>> Are those XML files in Solr XML format?
>>>> If not, you would need to use the DIH, the extract update handler, or a
>>>> custom indexer application.
>>>> Maybe I missed your point…
>>>> Please give me more details!
>>>>
>>>> Cheers
>>>>
>>>> 2015-06-05 15:41 GMT+01:00 Bruno Mannina <bmannina@free.fr>:
>>>>
>>>>> Dear Solr Users,
>>>>> I would like to post 1 000 000 records (1 record = 1 file) in one
>>>>> shot and do the commit at the end.
>>>>>
>>>>> Is it possible to do that?
>>>>>
>>>>> I have several directories, each with 20 000 files inside.
>>>>> I would like to do:
>>>>> bin/post -c mydb /DATA
>>>>>
>>>>> under DATA I have
>>>>> /DATA/1/*.xml (20 000 files)
>>>>> /DATA/2/*.xml (20 000 files)
>>>>> /DATA/3/*.xml (20 000 files)
>>>>> ....
>>>>> /DATA/50/*.xml (20 000 files)
>>>>>
>>>>> Currently, I post 5 directories at a time (it takes around 1h30 for
>>>>> 100 000 records/files).
>>>>>
>>>>> But it's Friday and I would like it to run unattended over the weekend.
>>>>>
>>>>> Thanks for your comments,
>>>>>
>>>>> Bruno
>>>>>
>>>>> ---
>>>>> This email contains no viruses or malware because avast! Antivirus
>>>>> protection is active.
>>>>> https://www.avast.com/antivirus
>>>>>
>>>>>
>>>>>
>>
>> --
>> --------------------------
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>
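
The commit settings Erick recommends earlier in this thread (a hard commit every 60 seconds with openSearcher=false, and soft commits disabled) might look roughly like the fragment below in solrconfig.xml. This is only a sketch, not a tested configuration: the surrounding updateHandler and updateLog elements are assumed to match the stock Solr 5 example config, and the values are simply the ones quoted in the thread.

```xml
<!-- Sketch only: belongs inside <config> in solrconfig.xml; values taken from the thread -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log (assumed enabled, as in the stock config);
       uncommitted docs are replayed from it on startup after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit every 60 s (value in milliseconds): durable,
       but does not open a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commits disabled, so newly indexed docs stay invisible to searches -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```

For the bulk-indexing scenario in the blog post Alessandro quotes, the autoCommit maxTime would be 15000 instead. With these settings the documents should only become searchable after an explicit commit that opens a searcher, such as the one bin/post issues by default when it finishes.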

