View Issue Details
|ID||Project||Category||Date Submitted||Last Update|
|0019744||AI War 2||[All Projects] Note To Test||Jul 11, 2018 1:29 pm||Aug 29, 2018 7:55 pm|
|Product Version||0.749 Release The Hounds|
|Fixed in Version|
|Summary||0019744: Poor-performing savegame.|
|Description||Note: to test JUST the sim, stay on the galaxy map and unpause the game. You can then see what the sim performance is like with the vis layer removed. When you flip into the planet, then you have the sim and vis layers both affecting game speed.|
Key things to watch are what percentage the number in red down at the bottom shows (it should not be there, aka be at 100%). For me at the moment it's at 16% or so.
The other key thing to watch is what your overall processor load is like; if it's at 100%, then at least you're using all you have. But in my case, I'm rarely above 50% on a hyperthreaded 4-core processor (so 8 logical cores).
In an ideal world, the threads would flip over as quickly as they possibly can, with no timing delays caused by simply having multiple threads. This is not happening right now, I'm pretty sure.
Also in that ideal world, the threads would spread out and make use of all your processors as much as possible. This is also not happening right now, for sure.
And lastly, in that ideal world each thread would run as quickly as possible. That's mostly out of the scope of this issue report, but basically just speaks to us optimizing each thread's code as much as possible using profilers like ANTS and similar to help us identify problems using a "unit tester" program that we have. If the other problems get fixed up, then whatever the slowest thread is here will be the limiting factor, and that's what we want to be the case.
|Tags||No tags attached.|
Simply unzip the attached file into your PlayerData/Save folder in order to be able to access it. It should wind up having a structure like this: PlayerData/Save/Performance/PerformanceTest2.save
Performance.zip (419,141 bytes)
* Our newly open-sourced ArcenThread class has been improved by BearPerson in a variety of ways:
** ManualResetEventSlim is now used instead of bool for FromCallerToThread_YouShouldStart, and its Wait() and Reset() are now used instead of the dreaded Thread.Sleep().
** FromThreadToCaller_IHaveFinished is now a volatile bool.
** FromThreadToCaller_IHaveStarted is removed as unneeded.
** A clever new ThreadFriendlyLogging class has been added and is now used for _reliable_ timing dump information, which revealed a lot about our timings that we were not seeing in profilers properly.
* Now clearing the game-speed tracker (bottom left under the time counter) when switching from into or out of the galaxy map view, to keep that more accurate as you switch between lower-load and higher-load screens.
Summary, testing with a framelock of 60fps, which pretty reliably gets stuck at 25% on my machine
- A sim frame typically completes 50ms after the previous, on the dot
- Windows thread wakeup times are fairly respectable - 10 microseconds on the low end, 500 microseconds on the high end. Far better than sleep-polling.
(Though beware that initially starting a thread may take 1-3 milliseconds - waking up an existing one is a lot better)
- A sim frame spans three invocations of DoActualSimStep(), each spaced 17ms apart
(I may be jumping into this at an odd point, apologies for potential confusion)
- The first DASS() invocation passes through the CurrentFrameNumber++ path and kicks off an executionContext cycle
- Second invocation passes through if (MessageFromMainThreadToMe_ExecutionRequested) and realizes execution contexts are done.
Because the second block is an else if, it isn't considered.
- Third invocation passes through if (MessageFromMainThreadToMe_PlanningRequested) and kicks off short-term planning
- Fourth invocation observes short-term planning is done and passes CurrentFrameNumber++ again.
- Overall, we're spending ~12ms in executionContext, ~5ms on the parallel short-term planning, and the remainder of the 50ms is spent waiting for DoActualSimStep to come in again.
It seems to me that at a UI set to 60fps, 0000027:0000025% is the best we can do in this situation:
If we need 3 UI frames per sim frame, that's 20fps sim frames max, whereas a 7x speed multiplier wants 70fps sim frames, assuming 1x is 10 simfps.
Re-testing at 120fps changes things a little, but not that much - my machine gets up to 0000034:0000040% there -
though I'm still not seeing a clear culprit - mostly, the spacing between DoActualSimStep calls becomes inconsistent,
usually being 8ms apart but sometimes as much as 96ms(?!?). I'm also seeing some delays in the main thread,
for example it sometimes spends 6ms or more between "else if ( !ArcenShortTermPlanningContext.MessageFromMainThreadToMe_PlanningRequested" and "ArcenShortTermPlanningContext.RunAllContexts();",
perhaps the batch message processing?
Ultimately, to break higher sim FPS at high speed multipliers,
we'd need to break the dependency between DoActualSimStep() and the different background threads -
perhaps have a dedicated background thread responsible for switching between them,
if necessary pausing between frames if the main thread needs to come do things in person.
Or maybe have the background threads kick each other directly.
But getting up to a 1:1 multiplier between UI frames and sim frames would stretch the cap a bit more -
though a 60fps lock still wouldn't get more than 6x sim speed.
But it does look like parallelism and thread scheduling isn't the bottleneck, certainly with the ManualResetEventSlim triggering threads.
I'm unsure if there's still issues preventing full utilization of multi-core machines -
Even on the galaxy screen and uncapped framerate, my system maxes out at 0000271:0000140 UI fps, so I get nowhere close to filling all cores.
Besides, it looks like only one of the 3 simstep routes (the short-term planning) is actually spreading across multiple threads,
both executionContext and longterm planning don't have that much going on.
And then some example logging output from BearPerson:
What logging looks like for what I assume is a single sim frame
(I did occasionally see examples where the executionContext thread or some of the ShortTermPlanning threads overran past the next DoActualSimStep call,
but these were rather rare)
[1 ] @ 00:00:00.4969899: DoActualSimStep() begun, frame is 5679
[1 ] @ 00:00:00.4970007: GetContextsDone?
[1 ] @ 00:00:00.4970043: GetContextsDone!
[1 ] @ 00:00:00.4970090: DoActualSimStep() decided to poke ArcenShortTermPlanningContext
[1 ] @ 00:00:00.4970115: GetContextsDone?
[1 ] @ 00:00:00.4970140: GetContextsDone!
[1 ] @ 00:00:00.4970218: Kicked next cycle for outerThreadFor_ArcenExecutionContext
[1 ] @ 00:00:00.4970243: DoActualSimStep() done, frame is 5680
[26 outerThreadFor_ArcenExecutionContext] @ 00:00:00.4970575: Began running.
[26 outerThreadFor_ArcenExecutionContext] @ 00:00:00.4970694: Kicked next cycle for executionContext
[26 outerThreadFor_ArcenExecutionContext] @ 00:00:00.4970725: Done.
[27 executionContext] @ 00:00:00.4970764: Began running.
[27 executionContext] @ 00:00:00.5093115: Done.
[1 ] @ 00:00:00.5140053: DoActualSimStep() begun, frame is 5680
[1 ] @ 00:00:00.5140125: DoActualSimStep() decided to poke ArcenExecutionContext
[1 ] @ 00:00:00.5143747: Kicked next cycle for outerThreadFor_ArcenLongTermPlanningContext
[1 ] @ 00:00:00.5143805: DoActualSimStep() done, frame is 5680
[28 outerThreadFor_ArcenLongTermPlanningContext] @ 00:00:00.5144126: Began running.
[28 outerThreadFor_ArcenLongTermPlanningContext] @ 00:00:00.5144290: Kicked next cycle for specialFactionContext_2_AI
[28 outerThreadFor_ArcenLongTermPlanningContext] @ 00:00:00.5144328: Done.
[29 specialFactionContext_2_AI] @ 00:00:00.5144625: Began running.
[29 specialFactionContext_2_AI] @ 00:00:00.5155713: Done.
[1 ] @ 00:00:00.5300320: DoActualSimStep() begun, frame is 5680
[1 ] @ 00:00:00.5300447: DoActualSimStep() decided to poke ArcenShortTermPlanningContext
[1 ] @ 00:00:00.5300475: GetContextsDone?
[1 ] @ 00:00:00.5300741: Kicked next cycle for outerThreadFor_ArcenShortTermPlanningContext
[1 ] @ 00:00:00.5300768: DoActualSimStep() done, frame is 5680
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5301115: Began running.
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5317575: Kicked next cycle for strengthCountingContext
[16 strengthCountingContext] @ 00:00:00.5317713: Began running.
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5317866: Kicked next cycle for movePlanningContext
[17 movePlanningContext] @ 00:00:00.5317935: Began running.
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5317962: Kicked next cycle for targetingContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318071: Kicked next cycle for collisionContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318154: Kicked next cycle for metalFlowContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318242: Kicked next cycle for ProtectionContext
[19 collisionContext] @ 00:00:00.5318190: Began running.
[18 targetingContext] @ 00:00:00.5318442: Began running.
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318522: Kicked next cycle for tachyonContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318608: Kicked next cycle for tractorContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318677: Kicked next cycle for gravityContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318733: Kicked next cycle for aiwcGravityContext
[15 outerThreadFor_ArcenShortTermPlanningContext] @ 00:00:00.5318766: Done.
[20 metalFlowContext] @ 00:00:00.5318918: Began running.
[21 ProtectionContext] @ 00:00:00.5319436: Began running.
[24 gravityContext] @ 00:00:00.5320090: Began running.
[24 gravityContext] @ 00:00:00.5320165: Done.
[22 tachyonContext] @ 00:00:00.5320284: Began running.
[25 aiwcGravityContext] @ 00:00:00.5320813: Began running.
[22 tachyonContext] @ 00:00:00.5321952: Done.
[25 aiwcGravityContext] @ 00:00:00.5322104: Done.
[23 tractorContext] @ 00:00:00.5323362: Began running.
[23 tractorContext] @ 00:00:00.5329696: Done.
[21 ProtectionContext] @ 00:00:00.5329909: Done.
[20 metalFlowContext] @ 00:00:00.5332419: Done.
[19 collisionContext] @ 00:00:00.5335085: Done.
[17 movePlanningContext] @ 00:00:00.5342066: Done.
[18 targetingContext] @ 00:00:00.5345075: Done.
[16 strengthCountingContext] @ 00:00:00.5345347: Done.
[1 ] @ 00:00:00.5472518: DoActualSimStep() begun, frame is 5680
[1 ] @ 00:00:00.5472624: GetContextsDone?
[1 ] @ 00:00:00.5472668: GetContextsDone!
[1 ] @ 00:00:00.5472687: DoActualSimStep() decided to poke ArcenShortTermPlanningContext
[1 ] @ 00:00:00.5472710: GetContextsDone?
[1 ] @ 00:00:00.5472735: GetContextsDone!
[1 ] @ 00:00:00.5472807: Kicked next cycle for outerThreadFor_ArcenExecutionContext
[1 ] @ 00:00:00.5472834: DoActualSimStep() done, frame is 5681
At this point, I can’t figure out what is exactly causing the DoActualSimStep to trigger with such regularity. There is a complex system of messaging that takes into account multiplayer and so forth and that goes out of the open-source areas of the code. I think Keith will need to look at it.
What is interesting is that when the sim frame frequency multiplier (GetCurrentFrameFrequencyMulitplier) is reset to its default values, the sim runs at full speed in the savegame that I was showing as the test case above I’m not sure if that’s a problem with the frame frequency multiplier, or simply an indication that we’re somehow maxing out some element of the simulation and it just won’t go faster than that. Either seems reasonable, but the debugging above from BearPerson was at +7 frame frequency, which I believe is at a level that indicates the background threads are not what is holding things up, at least.
It may be that simply going through the loop fast enough to process the sync stuff needed to get to the next DoActualSimStep is too slow to be accelerated more than a certain amount. If that was true, then fast forwarding would have a definite cap to it, but the actual realtime simulation would not (because the actual thread-syncing is now fast, thanks to BearPerson, and because those threads seem to be not the limiting factor as he observed).
In the short term, waiting for a regular-speed savegame example that runs below 100% sim speed I think is the thing to do. On the galaxy map, that is -- there are various savegames, including the one above, that run below 100% sim speed when you also factor in the load from the vis layer while on the planet view, and that’s something I’m already in the process of addressing by lowering the load of the vis layer.
On the FLIP side of that, though -- if all my CPU cores aren’t really that busy (they aren’t) and the main thread is still running at 20-30fps because of the vis layer slowdown, then that would again argue that something in the calls to DoActualSimStep is too darn bloody slow on the main thread, or too timing-dependent, either way.
My assertion, on reflection: if the vis layer is bogged down enough that I’m running at 20-30fps, but my background cores are still only half used, then the sim speed should still be 100% or very close. Right now I’m seeing 60-70% instead in the savegame above. Which is a big step up from the 40-50% I was seeing before my various changes and BearPerson’s big changes.
||Keith, at this point I think I am needing to kick this one to you, for when you get a chance to look into it.|
Notes for Keith:
This is something that affects pretty much all large battles, so it's going to frustrate pretty much everyone by making the game feel sluggish. There's a lot in there, so let me summarize what we know right now:
1. There were a messload of suggestions on how to handle our multithreading optimally.
2. I wound up moving that code to the open-source section of our codebase.
3. BearPerson came in and made short work of the problem, getting our ArcenThread class updated to be faster and no longer using Thread.Sleep(). So all that bit is done, and it transitions between threads quite fast now.
4. He then added some profiling that is more efficient than what we had and that gives output showing that the background threads are really not the limiting factor, but rather DoActualSimStep is. The frequency of calls to that seems to be the limiting factor, aka that it's not calling it frequently enough.
5. I can't for the life of me figure out how to speed that up without adjusting the literal sim performance profile class, which seems like it would affect game timing in a negative way in general. But perhaps what we're simply seeing is that the game timing for clock time, and the trip-the-next-batch timing, are simply too close together and need to be decoupled in some manner.
Put another way, there's a 50ms or so span where the DoActualSimStep needs to happen, and it can happen anywhere in there. I'm concerned that perhaps it's winding up starting 25ms in or something, and thus running over by 10ms in high load cases, and thus running slow even though it was idle that first 25ms. Something like that.
And I think that perhaps this compounds if you're really trying to run "4 sim frames in 200ms," and it doesn't matter as much exactly that each sim step is within its appropriate 50ms slot so long as it hits 4 within that 200, and they are roughly evenly spaced as best we can. Understanding that some frames are simply going to be very long (30ms+) because of normal vis layer load.
In a lot of respects this would argue for having DoActualSimStep on its own background thread that I guess runs continuously (Thread.Sleep() is thoroughly out, because it isn't reliable in how long it sleeps for). But since the main thread is so unreliable in how long update frames are, it may be untenable for our purposes of trying to have a smooth sim and a non-smooth vis layer.
||This particular set of savegames is out of date, however I think that the notes in here are worth keeping alive in mantis for now, until Keith sees fit to close them.|
|Jul 11, 2018 1:29 pm||x4000Bughunter||New Issue|
|Jul 11, 2018 1:29 pm||x4000Bughunter||Status||new => assigned|
|Jul 11, 2018 1:29 pm||x4000Bughunter||Assigned To||=> x4000Bughunter|
|Jul 11, 2018 1:29 pm||x4000Bughunter||File Added: PerformanceTest2.savemet|
|Jul 11, 2018 1:30 pm||x4000Bughunter||File Deleted: PerformanceTest2.savemet|
|Jul 11, 2018 1:32 pm||x4000Bughunter||File Added: Performance.zip|
|Jul 11, 2018 1:32 pm||x4000Bughunter||Note Added: 0047814|
|Jul 13, 2018 11:39 am||x4000Bughunter||Note Added: 0047844|
|Jul 13, 2018 11:39 am||x4000Bughunter||Note Added: 0047845|
|Jul 13, 2018 11:39 am||x4000Bughunter||Note Added: 0047846|
|Jul 13, 2018 11:51 am||x4000Bughunter||Note Added: 0047847|
|Jul 13, 2018 11:51 am||x4000Bughunter||Assigned To||x4000Bughunter => keith.lamothe|
|Jul 13, 2018 11:52 am||x4000Bughunter||Note Added: 0047848|
|Jul 13, 2018 12:05 pm||x4000Bughunter||Note Added: 0047849|
|Aug 29, 2018 7:55 pm||x4000Bughunter||Status||assigned => acknowledged|
|Aug 29, 2018 7:55 pm||x4000Bughunter||Note Added: 0048552|