[rancid] Re: High CPU Utilization on routers during Rancid capture

Justin Shore justin at justinshore.com
Mon Jan 28 00:31:35 UTC 2008


Frank,

No PPPoE here, but you're on the right track.  I have about 
1200 PVCs configured for RBE DSL termination on the 3660.  The best 
design I can think of would have been VTIs or some other template 
mechanism, one per speed package we offer.  Unfortunately this is what I 
inherited.  ADSL is being phased out and being replaced with FTTH and 
ADSL2+ on distributed IP DSLAMs instead of centralized routers in the 
core.  These routers will breathe easier when the DSL load is taken off 
of them.

Slightly off-topic but still related is a problem I first encountered a 
couple years ago.  RANCID can help alert you to a low memory problem if 
you know what signs to look for.  This same 3660 started generating 
RANCID diffs every day or two.  A PVC or 2 would disappear and then 
reappear the next time RANCID ran.  It was always there when I checked 
by hand (sh run int ATMa/b.xyz).  I figured it was a fluke, that perhaps 
RANCID couldn't handle configs this big.  I ignored the diffs for 
months, even setting up Outlook to mark diffs related to that router as 
read.  Over time the number of PVCs disappearing and reappearing grew 
larger, up to hundreds at a time.  The time between occurrences also 
shortened until it happened on every RANCID run.  The router was running 
fine so we never gave it a second thought.  One day the router was 
reported as down in RANCID.  I checked and the router was still up. 
However I could not do a sh run; it just returned me to the command 
prompt.  I figured out then what was going on.  The router was running 
out of RAM.  I tried all sorts of ways to get the config off the box 
(dumping it to TFTP, etc.) before our scheduled maintenance window, just 
in case.  Nothing worked.  About 4 hours before the window the router 
went offline.  Once onsite I consoled in and found that OSPF had died 
(not enough RAM).  I rebooted without writing the config, since I was 
sure writing it would wreck it.  It came up and ran OK.  I diffed the current 
config against one a few months back and found I was missing about 12k 
lines of config.  Woo!  I spent the rest of the morning pasting in 
config from a RANCID diff over a year old (before the problem first 
showed up).  It worked but seriously screwed up our carrier system.  The 
field techs spent most of the day driving around and resetting cards 
manually.
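
For anyone who wants to automate watching for that early warning sign 
(interfaces that drop out of one capture and come back in a later one), 
here is a minimal Python sketch.  It assumes you can hand it successive 
saved copies of the config as plain text, oldest first, and that each 
stanza begins with an IOS-style "interface <name>" line in column 0.  The 
function names are made up for illustration; nothing here is part of 
RANCID itself.

```python
import re

# Assumption: IOS-style configs where each stanza starts with "interface <name>"
IFACE_RE = re.compile(r"^interface\s+(\S+)", re.MULTILINE)


def interfaces(config_text):
    """Return the set of interface names declared in a config."""
    return set(IFACE_RE.findall(config_text))


def flapping_interfaces(snapshots):
    """Given config texts ordered oldest to newest, return the interface
    names that were absent from one snapshot and present again later."""
    seen = set()      # names present in any earlier snapshot
    missing = set()   # names seen earlier but absent from the latest snapshot
    flapped = set()
    for text in snapshots:
        current = interfaces(text)
        flapped |= missing & current            # came back after vanishing
        missing = (missing | seen) - current    # vanished as of this snapshot
        seen |= current
    return flapped
```

An interface that is removed for good is never reported; only names that 
vanish and then return are.  A non-empty result on a router whose config 
nobody touched is the kind of thing worth checking free memory over.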

I've since seen this exact problem come up twice now with 2 completely 
unrelated pieces of equipment.  Both had a memory leak.  I managed to 
reboot them without incident since I caught the problem so quickly.  So, 
to make a long story short, if you see anything like what I describe 
above, DO NOT WRITE THE CONFIG.  Schedule a maintenance window for a 
reboot ASAP.  Learn from my mistake.
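
A rough sanity check along the same lines, again only a sketch: before 
saving anything, compare the non-blank line count of the config the 
router is currently showing against the last known-good copy, and treat 
a sharp drop as a red flag.  The 10% threshold and the function name are 
my own assumptions, not anything from RANCID or IOS.

```python
def config_shrank(previous_text, current_text, max_loss=0.10):
    """Return True if the current config has lost more than max_loss
    (as a fraction) of the non-blank lines in the previous config."""
    prev = sum(1 for line in previous_text.splitlines() if line.strip())
    curr = sum(1 for line in current_text.splitlines() if line.strip())
    if prev == 0:
        return False  # nothing to compare against
    return (prev - curr) / prev > max_loss
```

If this returns True and nobody intentionally removed that much config, 
hold off on writing and go look at memory first.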

Justin

Frank Bulk - iNAME wrote:
> I'm guessing you're terminating PPPoX on there: have you looked into the range
> command to slim down the config a bit?  Or is that not possible with your
> requirements?
> 
> Frank
> 
> -----Original Message-----
> From: rancid-discuss-bounces at shrubbery.net
> [mailto:rancid-discuss-bounces at shrubbery.net] On Behalf Of Justin Shore
> Sent: Sunday, January 27, 2008 3:11 PM
> To: shane Haslem
> Cc: rancid-discuss at shrubbery.net
> Subject: [rancid] Re: High CPU Utilization on routers during Rancid capture
> 
> Of course.  I have 2 3660s and one 7206 (G1) that spike at 100% every
> hour on the hour.  It's not RANCID's fault.  It happens anytime I do a
> sh run.  The 7206 has about 13k lines in its config.  One 3660 has just
> under 6k lines.  The other 3660 has over 17k config lines.  That 3660's
> load stays at 100% for well over a minute.  A high load is expected
> given the sheer size of the config.  SSH has a higher load than telnet
> of course but that's no reason to not use SSH.
> 
> Justin
> 
> shane Haslem wrote:
>> Hi all,
>> Can anyone advise if they have experienced high CPU Utilization on
>> routers during config capture, I am using SSH to login, would this be a
>> factor?
>> Regards
> 
> _______________________________________________
> Rancid-discuss mailing list
> Rancid-discuss at shrubbery.net
> http://www.shrubbery.net/mailman/listinfo.cgi/rancid-discuss
> 

