[rancid] Improving Rancid's processing speed when having 1k+ devices
scott.granados at gmail.com
Thu Jul 25 17:16:43 UTC 2019
I would also recommend running multiple rancid servers, perhaps scattered geographically, so that it’s not a single machine pulling all the weight. Break the workload up among them.
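As a rough illustration of breaking the workload up, here is a minimal Python sketch that deals a device list out round-robin into a chosen number of shards, one per server (or per rancid group). The function name `split_round_robin` and the device names are hypothetical and are not part of rancid; how each shard is then fed into a server's own device list is left to your setup.

```python
from typing import List


def split_round_robin(devices: List[str], shards: int) -> List[List[str]]:
    """Deal devices out to `shards` buckets so each bucket is about the same size."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(shards)]
    for index, device in enumerate(devices):
        # Device 0 goes to bucket 0, device 1 to bucket 1, and so on, wrapping around.
        buckets[index % shards].append(device)
    return buckets


if __name__ == "__main__":
    # Hypothetical device names; a real list would come from your inventory.
    devices = [f"device{n:04d}" for n in range(1200)]
    for number, bucket in enumerate(split_round_robin(devices, 4), start=1):
        print(f"shard {number}: {len(bucket)} devices")
```

Round-robin is only one choice. If some device types are known to be much slower than others, it may be worth spreading those evenly across the shards rather than relying on list order.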
> On Jul 25, 2019, at 12:55 PM, john heasley <heas at shrubbery.net> wrote:
> Thu, Jul 25, 2019 at 02:29:37PM +0200, Florin Vlad Olariu:
>> Well, as per title, is there any way to improve rancid's speed with so many
>> devices? At the moment I set PAR_COUNT to 300, so it will connect in
>> parallel to 300 devices at a time, but the reality is that most time does
>> not seem to be taken by connecting and retrieving config but by what
>> happens next in the file processing and git-committing.
>> To give you some stats, with current settings it takes around 9 minutes to
>> do 1200 devices. I have only 1 group with all devices under the same group.
>> Any trick you might have, please let me know!
> Typically, the network and, more so, the devices are the slow part. Some
> devices are much slower than others. More parallelism helps a lot, as with
> your high PAR_COUNT. Other thoughts:
> - cvs is slow. use svn or git. svn is probably faster; but I have not
> benchmarked the two for the functions that rancid uses.
> - make sure that the rancid user is not process rlimited to less than ~605
> processes; or PAR_COUNT * 2 + 5 or so.
> - perl is a memory pig. if the host/vm has memory pressure, this would be
> something to address.
> - retrieving device output does not require much cpu, but processing does use
> some - don't starve it
> - use rancid.conf:NOPIPE=YES; I think this is faster because perl is a pig.
> - if you only need configs, then reduce what is collected to just show version
> and show running. or have one hourly group that collects that, and a daily
> group that collects everything. less processing, and esp many fewer regexes.
> - multiple groups might help, at least for the SCM part. split your one large
> group into a few. make sure to use a separate cron for each so that they run
> in parallel.
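The process-limit rule of thumb in the list above (PAR_COUNT * 2 + 5 or so) can be checked with a short script. This is a minimal sketch, assuming a Unix host where Python's `resource` module exposes `RLIMIT_NPROC`; the helper names `required_processes` and `nproc_limit_ok` are made up for illustration. It reports the limit of the process that runs it, so it would need to be run as the rancid user to be meaningful.

```python
import resource


def required_processes(par_count: int, headroom: int = 5) -> int:
    """Rule of thumb quoted above: PAR_COUNT * 2 + 5 or so."""
    return par_count * 2 + headroom


def nproc_limit_ok(par_count: int) -> bool:
    """Return True if this process's soft RLIMIT_NPROC covers the estimate."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        # No limit configured for this user.
        return True
    return soft >= required_processes(par_count)


if __name__ == "__main__":
    par_count = 300  # value from the original question
    print("processes needed (estimate):", required_processes(par_count))
    print("soft limit sufficient:", nproc_limit_ok(par_count))
```

With PAR_COUNT at 300 the estimate comes to 605, which matches the figure given above.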
> I haven't attempted to benchmark or optimize any parts for a while. There was
> a complaint about the start-up time for control_rancid, which seems to me to
> be inconsequential, but I do not know what the users were attempting to do
> with rancid that made this matter. There are other benefits to this, so I've
> started to re-write it; this is not ready yet.
> 9 minutes for 1200 devices seems reasonable to me. :)