ACR Lag / CPU use creeping up?

Posted: Thu Sep 13, 2007 10:06 pm
by AcadiusLost
One thing I've noticed hosting the OAS2: it starts off at a pretty low idle CPU usage, but every now and then I'll look over and find it running near 100% in the task manager. Without a script profiler I have a hard time working out exactly what the problem is, but I've been restarting the module several times a day to keep the CPU from becoming the bottleneck. Once it climbs way up, it seems to stay up unless you restart the module.

This is on an AMD Athlon XP 2800+ single-core processor.

Indio, do you see this with TSF? It may be an ACR quirk, it may be something more OAS2 specific (morale scripts, mob flocking, etc).

Has anyone played with the NWNx2 script profiler plugin, to see if it functions under NWNx4?

Posted: Thu Sep 13, 2007 10:32 pm
by indio
I've found 003 to remain low pretty much constantly.

The spec:
Core 2 Duo 4300 @ 2.4
2 GB RAM

For example, the server has been up for 115 hours currently, and the CPU is idling at 0%. With the old quest system in place it would idle (with someone logged on) at 40% or so.

Flocking is an intensive CPU activity, especially with the pathfinding as it is in NWN2 currently (although it's going to be improving). Other things that will keep the CPU high are players using WASD to navigate, NPC spellcasting and something else I can't remember.

But generally I've found, over almost 6 months, that CPU usage with NWNX4 and the server running, even with multiple players, to remain very low indeed.

Posted: Thu Sep 13, 2007 10:35 pm
by AcadiusLost
I may have a look at how the OAS2's spawn points are set up, generally ALFA spawns shouldn't be active and burning up CPU cycles if there is no one logged into the module, that part's all built-in. Maybe something is keeping them from de-spawning properly.

Glad to hear the "100% with 0/6 players on" phenomenon isn't widespread.

Posted: Thu Sep 13, 2007 11:14 pm
by indio
I don't know what the minimum GHz a CPU needs to be for a server to not see 100% usage, but even with multiple players and literally 50+ mobs spawned, the CPU remains below 5%, with heartbeat spikes to about 30%. So even a lower specced CPU than the one on 003 shouldn't be seeing over 50% even with players and mobs, unless that threshold is just above your current CPU (which I frankly don't believe).

It's odd.

By the way, mobs reliably despawn within a minute or two of a PC leaving an area on 003. So a despawning problem would be unusual. It's unlikely to be a heartbeat script, but I've not seen the mod.

One thing that used to screw up a few things was putting the wrong scripts in the OnEnter and OnClientEnter handlers. Unlikely, but possible.

Posted: Fri Sep 14, 2007 1:10 am
by ç i p h é r
If the profiler doesn't work (very doubtful that it will), check out the timer functions in the NWNx4 timer plugin.

Posted: Fri Sep 14, 2007 3:35 am
by AcadiusLost
Seems pretty reproducible for me: back up to 100% solid on the OAS2 after 4-5 hours of uptime. No PCs or DMs had logged on since it was reset, and there were no SQL errors or DB connection problems. The SQL logs show pretty much just the periodic update of server time to the DB.

Not familiar with the timer plugin, how would I use it to profile lag?

After restarting, it takes about 1 min 45 sec to load the directory, and then drops to essentially 0% CPU use. Over time it will develop heartbeat-style spikes that get larger and larger until they max it out- definitely seems like a "leaky script" building up CPU overhead. If it's not showing on TSF, hopefully that helps narrow it down to non-ACR systems. Might be worth trying a hosting of a hak-containing ACR module on my setup to confirm this though.

Posted: Fri Sep 14, 2007 4:07 am
by indio
Send us a copy of the OAS and I'll see if it does the same on my server if you like.

Posted: Fri Sep 14, 2007 7:09 am
by AcadiusLost
Just to further characterize the behaviour:

At 1hr30min after module load, baseline usage was ~4%, with spikes to 25-30% every heartbeat (or thereabouts)

At 3hr30min after module load, baseline was more like 11%, with spikes breaking 90%. Actually, by this point the spikes are more like the norm, with transitory drops to the 11%-ish level regularly.

I'd be happy to 7zip the module up and put it on your FTP for testing, I'll just post to get Teric's OK on it. In the meantime, I'll look for likely culprits in the toolset.

Posted: Fri Sep 14, 2007 8:22 am
by indio
Well, here's an update.

I'd assumed my performance was unchanged from pre-ACR days, but I hadn't stopped to really look at it.

After 12 hours it's doing a pyramid-like round, starting at 2%, then to 4 on the next second, then 8, 22, 28, 25, and then back down to 2 again.

Now it's a long way from uniform, but 3 beats are below 10, and 3 above 20. This is certainly different from pre-ACR.

Posted: Fri Sep 14, 2007 8:27 am
by Teric neDhalir
AcadiusLost wrote: I'd be happy to 7zip the module up and put it on your FTP for testing, I'll just post to get Teric's OK on it. In the meantime, I'll look for likely culprits in the toolset.
Yeah, no worries, go ahead. The only thing I can think of is the locals in Soubar who walk about using a non-Bioware walk waypoints script. But if no one's logged on why would they have started?
TnD

Posted: Wed Sep 19, 2007 7:00 pm
by AcadiusLost
Sorry I haven't uploaded that module yet, seems my password for your FTP isn't good anymore, Indio. Is the Tech FTP all right, or should I use the DM FTP? Not sure what you have access to currently.

I've been trying to narrow down the cause (unsuccessfully) - it's difficult since it's really only crippling after 3.5+ hours of uptime.

So far I've more or less ruled out:

NPC pathfinding via default walkwaypoints (killed the ones in Soubar, no effect- the fugue ones are small by comparison.)

Morale system. (tried commenting out the entire "meat" of this in case the CountNumberofAlliesAndEnemies() or something was wearing it down.)


Next I'll try disabling the flocking scripts to see if that helps.

Could module corruption of some sort cause the creeping CPU burden?

Posted: Thu Sep 20, 2007 12:01 am
by ç i p h é r
Perhaps there's a DelayCommand() that's stacking up.
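
To show what a stacking delayed call could look like, here's a rough Python simulation (not NWScript, and not taken from the OAS2 or ACR scripts). It assumes a hypothetical heartbeat that starts a new self-rescheduling delayed action on every tick without checking whether one is already running. All names are made up for the sketch.

```python
import heapq

def simulate(ticks, guard):
    """Count delayed actions executed per heartbeat tick.

    Each heartbeat (hypothetically) starts a loop that re-schedules itself
    one tick later. With guard=False a new loop is started on every
    heartbeat, so the loops pile up; with guard=True only one is started.
    """
    queue = []          # min-heap of due times for pending delayed actions
    started = False     # stands in for a "loop already running" local variable
    executed_per_tick = []
    for now in range(ticks):
        # Heartbeat: start a self-rescheduling loop unless the guard blocks it.
        if not (guard and started):
            heapq.heappush(queue, now)
            started = True
        # Run every action that is due; each one re-schedules itself.
        executed = 0
        while queue and queue[0] <= now:
            heapq.heappop(queue)
            heapq.heappush(queue, now + 1)
            executed += 1
        executed_per_tick.append(executed)
    return executed_per_tick

if __name__ == "__main__":
    print(simulate(5, guard=False))  # work per tick grows
    print(simulate(5, guard=True))   # work per tick stays flat
```

Without the guard, the work done per tick grows by one every heartbeat, which is the kind of slow climb described earlier in the thread. Whether anything like this is actually happening in the module is only a guess.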

I haven't looked at the timer functions myself, but the gist of it was a call to a "start" function at the top of the script followed by a call to a "stop" function at the end. I'm sure the scripts themselves are intuitive but you'll need to import the MySQL Time include script to see. There's also a demo module you can peek at for usage examples.

Sorry I can't be more help.

Posted: Thu Sep 20, 2007 6:10 pm
by AcadiusLost
Oddly enough, at 14.5 hours of uptime, the CPU use of the OAS2 is back down to nearly nothing again. There may be a cycle to it that I've not been noticing (either that, or I finally hit on the lagging scripts, though pretty sure CPU use was climbing when I went to bed last night).

Posted: Mon Sep 24, 2007 6:31 pm
by AcadiusLost
These tests are on indefinite hold: my test rig had a meltdown - not sure if the heavy and continuous CPU utilization was a factor or not- it had run like a champ for years until the last few days. Hoping it's just the PSU and not the mobo/CPU that's fried, we'll see by next weekend perhaps.

Posted: Wed Sep 26, 2007 6:45 pm
by AcadiusLost
Some headway on my server repair, but it's still nonfunctional. Replacement of the power supply left it without functional video somehow. (no signal from either of 2 working video cards) - may have to hunt down an ancient pre-AGP video card next.

On a brighter note, looks like someone has released a limited profiler plugin for NWNx4 - definitely worth a look.

http://www.nwnx.org/phpbbforum/viewtopic.php?t=884

I'd like to see what this has to say about the OAS2 for sure, would likely be informative for TSF as well at this point.

Anyone up to try it out?