Large module idle CPU use, ACR hearbeat concerns
Moderators: ALFA Administrators, Staff - Technical
- AcadiusLost
- Chosen of Forumamus, God of Forums
- Posts: 5061
- Joined: Tue Oct 19, 2004 8:38 am
- Location: Montara, CA [GMT -8]
- Contact:
Large module idle CPU use, ACR hearbeat concerns
This one is primarily directed to Cipher, but others are welcome to share thoughts- currently we instruct builders to include the acf_* handler functions in each of the events of the module (areas, placeables, triggers, doors, creatures, and the module itself). These include acf_*_onheartbeat scripts, which call their respective functions in the acr includes, to give us hooks into those events for core scripting.
So far, I haven't seen much done with the OnHeartbeat functions in the ACR- the only system that consistently hooks into them is the VFX system, which checks a few local strings, then tries to apply an effect based on them (which will do nothing if the strings aren't set). According to the NWNx4 script profiler, these heartbeat scripts don't use any significant CPU time individually. However, for something like the placeables heartbeat, it's potentially firing once per six seconds for every placeable in every area of the module. The script profiler was registering that the acf_plc_onheartbeat was being called very often- for example, a 2-hour run of the 003 module with the Silverymoon university attached generated 254,818 calls to acf_plc_onheartbeat.
While players are on, these heartbeats seem not to cause problems- but for whatever reason, while the server is idle/empty, they seem to eat up more and more CPU utilization over time- in the worst cases (all complex interiors present for 003), it would reach 100% CPU use in only 20 minutes or so of idle time. A player logging in would bring it back down, but the climb would begin again when the player logged back out again.
On a hunch, I tried deleting the acf_plc_onheartbeat.ns entirely, and now the same module shows a much slower CPU climb, only becoming noteable after 10 hours or so of idle time- and even then not nearing the max-CPU behaviour present with the placeable heartbeats running.
Do we have a real need for any of the heartbeat events in the ACR, or can we safely remove it from most of them? I suspect the bulk of the remaining climb comes from area, trigger, and door heartbeats, which are always present even when players are not logged in. It seems like, for whatever reason, when no one is on, running even simple local variable checks on these events is responsible for the CPU climb which, as a host, makes me hesitant to leave the server unattended for extended periods, lest it wear on the host machine's CPU unnecessarily.
I suppose we can also see if just removing the VFX system solves the problem, as it may be something specific to that (reading local strings, or trying to apply a null effect) causing the difficulty. Thoughts?
So far, I haven't seen much done with the OnHeartbeat functions in the ACR- the only system that consistently hooks into them is the VFX system, which checks a few local strings, then tries to apply an effect based on them (which will do nothing if the strings aren't set). According to the NWNx4 script profiler, these heartbeat scripts don't use any significant CPU time individually. However, for something like the placeables heartbeat, it's potentially firing once per six seconds for every placeable in every area of the module. The script profiler was registering that the acf_plc_onheartbeat was being called very often- for example, a 2-hour run of the 003 module with the Silverymoon university attached generated 254,818 calls to acf_plc_onheartbeat.
While players are on, these heartbeats seem not to cause problems- but for whatever reason, while the server is idle/empty, they seem to eat up more and more CPU utilization over time- in the worst cases (all complex interiors present for 003), it would reach 100% CPU use in only 20 minutes or so of idle time. A player logging in would bring it back down, but the climb would begin again when the player logged back out again.
On a hunch, I tried deleting the acf_plc_onheartbeat.ns entirely, and now the same module shows a much slower CPU climb, only becoming noteable after 10 hours or so of idle time- and even then not nearing the max-CPU behaviour present with the placeable heartbeats running.
Do we have a real need for any of the heartbeat events in the ACR, or can we safely remove it from most of them? I suspect the bulk of the remaining climb comes from area, trigger, and door heartbeats, which are always present even when players are not logged in. It seems like, for whatever reason, when no one is on, running even simple local variable checks on these events is responsible for the CPU climb which, as a host, makes me hesitant to leave the server unattended for extended periods, lest it wear on the host machine's CPU unnecessarily.
I suppose we can also see if just removing the VFX system solves the problem, as it may be something specific to that (reading local strings, or trying to apply a null effect) causing the difficulty. Thoughts?
- AcadiusLost
- Chosen of Forumamus, God of Forums
- Posts: 5061
- Joined: Tue Oct 19, 2004 8:38 am
- Location: Montara, CA [GMT -8]
- Contact:
Certainly, some of the ACR functions on placeables are essential, for things like persistent storage and the death system. We can re-evaluate the need to have the handlers on all useable placeable objects, but they're certainly necessary on some- and having them on all opens up possibilities for a number of other systems to be added globally down the line.indio wrote:I'd go one step further and remove all scripts from all placeables.
- AcadiusLost
- Chosen of Forumamus, God of Forums
- Posts: 5061
- Joined: Tue Oct 19, 2004 8:38 am
- Location: Montara, CA [GMT -8]
- Contact:
More testing on this indicates that it's the existence of a script, any script, in the placeable onheartbeat, that is causing the majority of the CPU climb. When I add acf_plc_onheartbeat back to the module as an empty script (ACR function call commented out, no code of any kind between the { } ), we still see a rapid CPU climb while idle. Seems like something wrong on a fundamental level with nwn2server.exe when idle (no players on). The good news is, we can temporarily work around the bug by deleting acf_plc_onheartbeat, and the only functionality lost is on the VFX system, which AFAIK isn't being used for anything yet.
I'm hosting a version that's got acf_trg_onheartbeat.nss (calls only VFX heartbeats) and acf_are_onheartbeat.nss (does nothing) also deleted- we'll see if idle performance is yet better than we saw from the acf_plc_onheartbeat.nss deletion.
I'm hosting a version that's got acf_trg_onheartbeat.nss (calls only VFX heartbeats) and acf_are_onheartbeat.nss (does nothing) also deleted- we'll see if idle performance is yet better than we saw from the acf_plc_onheartbeat.nss deletion.
Er, perhaps it might be a good idea to report this to Obsidian?
It seems suspiciously like a bug - I'm confused as to how the CPU usage would just steadily grow over time, unless more things were being calculated every heartbeat. You mentioned it happened in the case without scripts on heartbeats firing and just having one acted as a catalyst - I wonder if there's a memory leak going on at the same time along with this CPU leak?
On the other hand, perhaps it's possible this is intended? Maybe it's tying on_heartbeat to some sort of pre-caching stuff to prepare itself for when players are on-line or doing some sort of other trivial task while players aren't present which could eat CPU? I dunno.
It seems suspiciously like a bug - I'm confused as to how the CPU usage would just steadily grow over time, unless more things were being calculated every heartbeat. You mentioned it happened in the case without scripts on heartbeats firing and just having one acted as a catalyst - I wonder if there's a memory leak going on at the same time along with this CPU leak?
On the other hand, perhaps it's possible this is intended? Maybe it's tying on_heartbeat to some sort of pre-caching stuff to prepare itself for when players are on-line or doing some sort of other trivial task while players aren't present which could eat CPU? I dunno.

People talk of bestial cruelty, but that's a great injustice and insult to the beasts; a beast can never be so cruel as man, so artistically cruel.
- ç i p h é r
- Retired
- Posts: 2904
- Joined: Fri Oct 21, 2005 4:12 pm
- Location: US Central (GMT - 6)
Whoops. Looks like I missed this thread before.
The behavior does seem to suggest that the events may be stacking with every heartbeat, for whatever reason. How else could the load on the CPU increase over time? I wonder if this has something to do with the dynamic area loading/unloading (to conserve memory) that OE said they were looking into but didn't develop. To test the stacking theory, we could make the heartbeat script log something to the database, or speak a string which could be logged by the chat plugin. A properly functioning heartbeat should produce a 6 second period between each log entry.
Regardless, Al's discovery is new so we should make sure to report his findings. I think the PW round table is not the place to report bugs. It's just functioning as a gathering place for PWs to talk about and work through these kinds of problems collectively. There's a more specific method for reporting bugs, which we should pursue. I don't know it off hand but I know there is one.
Al, do you know if this behavior can be circumvented any other way besides having a player logged in? Maybe having static NPCs or creatures placed in the game? And does "player" include DM?
The behavior does seem to suggest that the events may be stacking with every heartbeat, for whatever reason. How else could the load on the CPU increase over time? I wonder if this has something to do with the dynamic area loading/unloading (to conserve memory) that OE said they were looking into but didn't develop. To test the stacking theory, we could make the heartbeat script log something to the database, or speak a string which could be logged by the chat plugin. A properly functioning heartbeat should produce a 6 second period between each log entry.
Regardless, Al's discovery is new so we should make sure to report his findings. I think the PW round table is not the place to report bugs. It's just functioning as a gathering place for PWs to talk about and work through these kinds of problems collectively. There's a more specific method for reporting bugs, which we should pursue. I don't know it off hand but I know there is one.
Al, do you know if this behavior can be circumvented any other way besides having a player logged in? Maybe having static NPCs or creatures placed in the game? And does "player" include DM?
- AcadiusLost
- Chosen of Forumamus, God of Forums
- Posts: 5061
- Joined: Tue Oct 19, 2004 8:38 am
- Location: Montara, CA [GMT -8]
- Contact:
The module behaves correctly when there is at least one player or at least one DM ingame- it's only when the server is at 0 clients logged in that we see the CPU use climbing.
The climb is worse the more high-placeable-count interiors are present- NWNx4 profiler shows that many, many calls to the acf_plc_onheartbeat are being made, but does not register an appreciable amount of CPU time being consumed by them.
Deleting the script that placeables are set to call onheartbeat works around this CPU climb almost completely- the climb takes many hours instead of just a few minutes. Doing the same with area heartbeats and trigger heatbeats improves the performance even more. Changing the called scripts to just {return;} does not resolve the bug, they must be deleted completely to circumvent it. (note- didn't try commenting out the #includes, just all the function calls).
This definitely looks like some kind of unhandled behavior specific to nwn2server when its empty- perhaps some kind of automatic cleanup or other optimization, most noticeable when it's trying to call many scripts simultaneously (in the case of placeable heartbeats). Alternatively, when one or more clients are logged in, some optimization is active that isn't present when the server is empty- though I tend to think it's a bug in a system, rather than an inherent problem that's somehow compensated for by an engine optimization when clients are connected.
The CPU climb itself is very distinctive- approximately 6-second interval peaks that grow over time- definitely seems like it builds on itself like constructive interference on a waveform- occasionally out of phase, but always with a similar wavelength (~6 seconds, like the server heartbeat).
We could make a report to DM channel to track with our chat plugin, but it would flood with information- just a few minutes on generates thousands of calls to acf_plc_onheartbeat, since it's being called once per placeable per heartbeat. We have a few placed instances of creatures (nonwandering) in the module, so we can say that just having a creature in the mod doesn't alleviate the problem. Most of our creatures are spawned in only when players are in the areas, and despawned shortly afterwards using our own custom ALFA spawn system.
Hope this clarifies the symptoms.
The climb is worse the more high-placeable-count interiors are present- NWNx4 profiler shows that many, many calls to the acf_plc_onheartbeat are being made, but does not register an appreciable amount of CPU time being consumed by them.
Deleting the script that placeables are set to call onheartbeat works around this CPU climb almost completely- the climb takes many hours instead of just a few minutes. Doing the same with area heartbeats and trigger heatbeats improves the performance even more. Changing the called scripts to just {return;} does not resolve the bug, they must be deleted completely to circumvent it. (note- didn't try commenting out the #includes, just all the function calls).
This definitely looks like some kind of unhandled behavior specific to nwn2server when its empty- perhaps some kind of automatic cleanup or other optimization, most noticeable when it's trying to call many scripts simultaneously (in the case of placeable heartbeats). Alternatively, when one or more clients are logged in, some optimization is active that isn't present when the server is empty- though I tend to think it's a bug in a system, rather than an inherent problem that's somehow compensated for by an engine optimization when clients are connected.
The CPU climb itself is very distinctive- approximately 6-second interval peaks that grow over time- definitely seems like it builds on itself like constructive interference on a waveform- occasionally out of phase, but always with a similar wavelength (~6 seconds, like the server heartbeat).
We could make a report to DM channel to track with our chat plugin, but it would flood with information- just a few minutes on generates thousands of calls to acf_plc_onheartbeat, since it's being called once per placeable per heartbeat. We have a few placed instances of creatures (nonwandering) in the module, so we can say that just having a creature in the mod doesn't alleviate the problem. Most of our creatures are spawned in only when players are in the areas, and despawned shortly afterwards using our own custom ALFA spawn system.
Hope this clarifies the symptoms.