Large module idle CPU use, ACR hearbeat concerns

Scripted ALFA systems & related tech discussions (ACR)

Moderators: ALFA Administrators, Staff - Technical

Locked
User avatar
AcadiusLost
Chosen of Forumamus, God of Forums
Posts: 5061
Joined: Tue Oct 19, 2004 8:38 am
Location: Montara, CA [GMT -8]
Contact:

Large module idle CPU use, ACR hearbeat concerns

Post by AcadiusLost »

This one is primarily directed to Cipher, but others are welcome to share thoughts- currently we instruct builders to include the acf_* handler functions in each of the events of the module (areas, placeables, triggers, doors, creatures, and the module itself). These include acf_*_onheartbeat scripts, which call their respective functions in the acr includes, to give us hooks into those events for core scripting.

So far, I haven't seen much done with the OnHeartbeat functions in the ACR- the only system that consistently hooks into them is the VFX system, which checks a few local strings, then tries to apply an effect based on them (which will do nothing if the strings aren't set). According to the NWNx4 script profiler, these heartbeat scripts don't use any significant CPU time individually. However, for something like the placeables heartbeat, it's potentially firing once per six seconds for every placeable in every area of the module. The script profiler was registering that the acf_plc_onheartbeat was being called very often- for example, a 2-hour run of the 003 module with the Silverymoon university attached generated 254,818 calls to acf_plc_onheartbeat.

While players are on, these heartbeats seem not to cause problems- but for whatever reason, while the server is idle/empty, they seem to eat up more and more CPU utilization over time- in the worst cases (all complex interiors present for 003), it would reach 100% CPU use in only 20 minutes or so of idle time. A player logging in would bring it back down, but the climb would begin again when the player logged back out again.

On a hunch, I tried deleting the acf_plc_onheartbeat.ns entirely, and now the same module shows a much slower CPU climb, only becoming noteable after 10 hours or so of idle time- and even then not nearing the max-CPU behaviour present with the placeable heartbeats running.

Do we have a real need for any of the heartbeat events in the ACR, or can we safely remove it from most of them? I suspect the bulk of the remaining climb comes from area, trigger, and door heartbeats, which are always present even when players are not logged in. It seems like, for whatever reason, when no one is on, running even simple local variable checks on these events is responsible for the CPU climb which, as a host, makes me hesitant to leave the server unattended for extended periods, lest it wear on the host machine's CPU unnecessarily.

I suppose we can also see if just removing the VFX system solves the problem, as it may be something specific to that (reading local strings, or trying to apply a null effect) causing the difficulty. Thoughts?
User avatar
indio
Ancient Red Dragon
Posts: 2810
Joined: Sat Jan 03, 2004 10:40 am

Post by indio »

I'd go one step further and remove all scripts from all placeables.
Thangorn
Haste Bear
Posts: 2081
Joined: Fri Oct 01, 2004 1:00 pm
Location: Queenstown, New Zealand

Post by Thangorn »

as long as the heartbeats have nothing to do with spawning/despawning, rp xp and corpse removal.. it shouldnt be a problem..
On indefinite real life hiatus

[22:52] <Veilan> obviously something sinister must be afoot if a DM does not have his social security number in his avatar name!
User avatar
AcadiusLost
Chosen of Forumamus, God of Forums
Posts: 5061
Joined: Tue Oct 19, 2004 8:38 am
Location: Montara, CA [GMT -8]
Contact:

Post by AcadiusLost »

indio wrote:I'd go one step further and remove all scripts from all placeables.
Certainly, some of the ACR functions on placeables are essential, for things like persistent storage and the death system. We can re-evaluate the need to have the handlers on all useable placeable objects, but they're certainly necessary on some- and having them on all opens up possibilities for a number of other systems to be added globally down the line.
User avatar
indio
Ancient Red Dragon
Posts: 2810
Joined: Sat Jan 03, 2004 10:40 am

Post by indio »

Obviously any placeable that needs scripts is an exception, for sure.

But retrofitting placeables with scripts in NWN2 is relatively easy is all, and given the threat of lag significantly outweights the inconvenience of a retrofit, remove any threat of lag first is my mindset.
User avatar
AcadiusLost
Chosen of Forumamus, God of Forums
Posts: 5061
Joined: Tue Oct 19, 2004 8:38 am
Location: Montara, CA [GMT -8]
Contact:

Post by AcadiusLost »

More testing on this indicates that it's the existence of a script, any script, in the placeable onheartbeat, that is causing the majority of the CPU climb. When I add acf_plc_onheartbeat back to the module as an empty script (ACR function call commented out, no code of any kind between the { } ), we still see a rapid CPU climb while idle. Seems like something wrong on a fundamental level with nwn2server.exe when idle (no players on). The good news is, we can temporarily work around the bug by deleting acf_plc_onheartbeat, and the only functionality lost is on the VFX system, which AFAIK isn't being used for anything yet.

I'm hosting a version that's got acf_trg_onheartbeat.nss (calls only VFX heartbeats) and acf_are_onheartbeat.nss (does nothing) also deleted- we'll see if idle performance is yet better than we saw from the acf_plc_onheartbeat.nss deletion.
User avatar
indio
Ancient Red Dragon
Posts: 2810
Joined: Sat Jan 03, 2004 10:40 am

Post by indio »

Well done AL. Good find.
paazin
Fionn In Disguise
Posts: 3544
Joined: Thu Apr 15, 2004 1:07 am
Location: UTC +2
Contact:

Post by paazin »

Er, perhaps it might be a good idea to report this to Obsidian?


It seems suspiciously like a bug - I'm confused as to how the CPU usage would just steadily grow over time, unless more things were being calculated every heartbeat. You mentioned it happened in the case without scripts on heartbeats firing and just having one acted as a catalyst - I wonder if there's a memory leak going on at the same time along with this CPU leak?

On the other hand, perhaps it's possible this is intended? Maybe it's tying on_heartbeat to some sort of pre-caching stuff to prepare itself for when players are on-line or doing some sort of other trivial task while players aren't present which could eat CPU? I dunno. :?
People talk of bestial cruelty, but that's a great injustice and insult to the beasts; a beast can never be so cruel as man, so artistically cruel.
User avatar
Rusty
Retired
Posts: 2847
Joined: Mon Feb 21, 2005 10:36 pm
Location: London
Contact:

Post by Rusty »

We've raised with in the Obsidian PW Admin group, but the only suggestion so far is that it may be related to 'npcs and monsters AI and behaviour, expecially when there were some difficult pathfinding issues for wandering monsters.'
paazin
Fionn In Disguise
Posts: 3544
Joined: Thu Apr 15, 2004 1:07 am
Location: UTC +2
Contact:

Post by paazin »

:F maybe I should keep quiet then, as I'm not familiar with the history of what you guys have done.
People talk of bestial cruelty, but that's a great injustice and insult to the beasts; a beast can never be so cruel as man, so artistically cruel.
User avatar
Rusty
Retired
Posts: 2847
Joined: Mon Feb 21, 2005 10:36 pm
Location: London
Contact:

Post by Rusty »

That's it. No other response and none from Obsidian themselves. Wynna made the initial report.
User avatar
ç i p h é r
Retired
Posts: 2904
Joined: Fri Oct 21, 2005 4:12 pm
Location: US Central (GMT - 6)

Post by ç i p h é r »

Whoops. Looks like I missed this thread before.

The behavior does seem to suggest that the events may be stacking with every heartbeat, for whatever reason. How else could the load on the CPU increase over time? I wonder if this has something to do with the dynamic area loading/unloading (to conserve memory) that OE said they were looking into but didn't develop. To test the stacking theory, we could make the heartbeat script log something to the database, or speak a string which could be logged by the chat plugin. A properly functioning heartbeat should produce a 6 second period between each log entry.

Regardless, Al's discovery is new so we should make sure to report his findings. I think the PW round table is not the place to report bugs. It's just functioning as a gathering place for PWs to talk about and work through these kinds of problems collectively. There's a more specific method for reporting bugs, which we should pursue. I don't know it off hand but I know there is one.

Al, do you know if this behavior can be circumvented any other way besides having a player logged in? Maybe having static NPCs or creatures placed in the game? And does "player" include DM?
User avatar
AcadiusLost
Chosen of Forumamus, God of Forums
Posts: 5061
Joined: Tue Oct 19, 2004 8:38 am
Location: Montara, CA [GMT -8]
Contact:

Post by AcadiusLost »

The module behaves correctly when there is at least one player or at least one DM ingame- it's only when the server is at 0 clients logged in that we see the CPU use climbing.

The climb is worse the more high-placeable-count interiors are present- NWNx4 profiler shows that many, many calls to the acf_plc_onheartbeat are being made, but does not register an appreciable amount of CPU time being consumed by them.

Deleting the script that placeables are set to call onheartbeat works around this CPU climb almost completely- the climb takes many hours instead of just a few minutes. Doing the same with area heartbeats and trigger heatbeats improves the performance even more. Changing the called scripts to just {return;} does not resolve the bug, they must be deleted completely to circumvent it. (note- didn't try commenting out the #includes, just all the function calls).

This definitely looks like some kind of unhandled behavior specific to nwn2server when its empty- perhaps some kind of automatic cleanup or other optimization, most noticeable when it's trying to call many scripts simultaneously (in the case of placeable heartbeats). Alternatively, when one or more clients are logged in, some optimization is active that isn't present when the server is empty- though I tend to think it's a bug in a system, rather than an inherent problem that's somehow compensated for by an engine optimization when clients are connected.

The CPU climb itself is very distinctive- approximately 6-second interval peaks that grow over time- definitely seems like it builds on itself like constructive interference on a waveform- occasionally out of phase, but always with a similar wavelength (~6 seconds, like the server heartbeat).

We could make a report to DM channel to track with our chat plugin, but it would flood with information- just a few minutes on generates thousands of calls to acf_plc_onheartbeat, since it's being called once per placeable per heartbeat. We have a few placed instances of creatures (nonwandering) in the module, so we can say that just having a creature in the mod doesn't alleviate the problem. Most of our creatures are spawned in only when players are in the areas, and despawned shortly afterwards using our own custom ALFA spawn system.

Hope this clarifies the symptoms.
Locked