View Issue Details
|ID||Project||Category||Date Submitted||Last Update|
|0023176||AI War 2||[All Projects] Crash/Exception||May 6, 2020 1:19 pm||May 13, 2020 11:51 am|
|Product Version||2.032 Savegame Hotfix|
|Fixed in Version|
|Summary||0023176: Game intermittently crashes entire WindowServer|
|Description||OS: macOS Catalina 10.15.4|
Hardware: iMac Retina 5K, 27" Late 2015
Game: latest version from steam (Content BuildID 4955232)
After playing for a while, a simple click crashes the entire WindowServer, basically forcing a log-out. Happened three times so far, each time (seemingly) in response to a Ctrl-click.
Note that I'm using the Caps-Lock key as Ctrl (on the OS level).
Attached the latest macOS crash log and AI War2 log.
NOTE: This was originally reported here: https://dev.arcengames.com/view.php?id=20798
That is a non-production copy of our bugtracker that publysher accidentally found his way into.
|Tags||No tags attached.|
No problem on the duplicates. I definitely would be interested in the log files. If you want to email them to chrispark7 at gmail, that would be fine.
Also, if you have a log from here directly after running the game and it crashing, that would be extra ideal: ~/Library/Logs/Unity/Player.log
If you have run ANY unity game since having the crash, then that particular log would have been completely overwritten. A copy of it from right after you have such a crash would be really welcome and probably have the most detailed info.
It's possible that the ArcenDebugLog.txt file will have enough info to go on, or it's possible it might be missing the last entries because that part of the logging was already shut down.
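Since that log gets overwritten on the next run of any Unity game, it can help to copy it somewhere safe right after a crash. The following is only a minimal sketch of that idea: the function name `backup_unity_logs` and the destination folder are made up for illustration, and it assumes the logs live in the directory mentioned above. It also picks up `Player-prev.log` if one exists next to `Player.log`.

```python
import shutil
import time
from pathlib import Path


def backup_unity_logs(log_dir, dest_dir, names=("Player.log", "Player-prev.log")):
    """Copy any Unity player logs found in log_dir into a timestamped folder.

    Hypothetical helper: the file names and folder layout are assumptions,
    not something the game or Unity provides.
    """
    log_dir = Path(log_dir).expanduser()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_dir).expanduser() / f"unity-logs-{stamp}"
    copied = []
    for name in names:
        src = log_dir / name
        if src.is_file():
            # Only create the backup folder once there is something to copy.
            target.mkdir(parents=True, exist_ok=True)
            # copy2 keeps the modification time, which shows when the log was last written.
            shutil.copy2(src, target / name)
            copied.append(target / name)
    return copied
```

Calling `backup_unity_logs("~/Library/Logs/Unity", "~/Desktop")` right after logging back in, and before starting any Unity game again, would return the list of copies it made (an empty list if no logs were found).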
On the macOS crash log itself, I don't mind taking it, but usually I have no way to interpret those. Unity as a game engine runs in a C++ environment inside the OS, and then runs a mono environment inside the C++ environment. We live in the mono environment and have the ability to interpret what those crash data mean. The offsets and other data that are in an OS crash log usually only reference pieces that are in the C++ part of the unity platform, which is code outside of our control and that we don't have the source code or symbols for. Sometimes it has a nugget in there that is useful, but not usually.
With your hardware being an "iMac Retina 5K, 27" Late 2015," from a quick google it looks like the absolute minimum RAM you might have is 8GB, so I'm not sure how you could possibly be running out of RAM with the way the game runs. Are you playing with the expansion installed, or just the base game? Either way, you should be topping out around 5GB of usage unless you've got a runaway loop somewhere.
Catalina is not an OS I have much experience with, as they keep changing things and locking them down in a lot of consumer-and-developer unfriendly ways. AI War 2 should work on Catalina, for now, but there's a switch they can flip at any point that would make it stop working in future OS updates. We list our official support on OSX as only being through 10.14 Mojave since things are so uncertain with Catalina. So far things seem okay, but Apple has been making some odd changes.
Hopefully none of that is what is going on here, though.
Thanks for the email! The OSX crash log was specific to WindowServer, and actually didn't mention AI War 2 or unity at all, which is very strange. The log in ArcenDebugLog.txt also was not showing any errors whatsoever.
My overall conclusion is that the WindowServer itself is crashing, and the game is probably not. Please don't get me wrong -- I'm sure that this is being caused by the game itself (most likely the engine), but that does mean that it's going to be extra hard to figure out what's going on.
From the crash log, you have plenty of RAM free, so it's not an out-of-memory situation or any of the other likely culprits.
The first thing I would try, which probably will not help, would be to verify your local files for the game via steam and make sure nothing is corrupted. Sometimes funky things happen if you have some corrupted files, but I would expect the game itself to be erroring if that was the case.
From googling some crashes of WindowServer with other unity projects that people have been working on, I come up with results like these:
The above would indicate that running in Metal is likely the problem... except I can see from your logs you are running OpenGL 4.1, which is what I would have suggested.
Other more recent users are saying that NOT running Metal is the problem, so switching to that instead of OpenGL 4.1 might be the solution for you.
The above would indicate that perhaps the Steam Cloud Sync is what is causing the problem, and there are some methods in there to make it not do that. Running the game while steam is offline or off would also bypass that as a test.
The above suggests that some people were in general having the problem with their OS, and not related to any games, and that apple was working on a fix about half a year ago. Making sure you have all the latest OS and driver updates from Apple would fix this if that's what it is. But I'm doubtful that this exact thing is your issue, or else you'd be having random crashes outside of AI War 2 or any other game, which I'm taking is not the case.
Right now my money would be on trying to run this via Metal, but updating OSX and your drivers would also be a good idea in case it is a problem there, verifying the AI War 2 files via Steam probably won't help but won't hurt, and potentially disabling cloud saves might also help if it's that issue.
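For testing one graphics API against the other, Unity standalone players accept the command-line arguments `-force-metal` and `-force-glcore`. The sketch below only builds such a command line. The helper name `unity_launch_command` and the executable path in the usage note are hypothetical, and it is an assumption that AI War 2 honors these flags rather than relying solely on its own settings.

```python
def unity_launch_command(executable, graphics_api):
    """Build an argument list that asks a Unity player to use one graphics API.

    Hypothetical helper for A/B testing; the flags are standard Unity player
    arguments, but whether a particular game respects them is an assumption.
    """
    flags = {
        "metal": "-force-metal",    # macOS only
        "opengl": "-force-glcore",  # OpenGL core profile
    }
    if graphics_api not in flags:
        raise ValueError(f"unknown graphics API: {graphics_api!r}")
    return [str(executable), flags[graphics_api]]
```

The resulting list could be passed to `subprocess.run`, for example with a placeholder path such as `unity_launch_command("/path/to/Game.app/Contents/MacOS/Game", "metal")`.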
I'll have a followup post with some technical thoughts on what might be happening.
So, I've already put in the main things that seem to be problematic, but I wanted to address some of what WindowServer is in general.
This writeup is really fantastic: https://www.quora.com/OS-X-What-is-the-WindowServer-process-and-what-does-it-mean-when-it-takes-up-high-CPU
I'll quote a few bits here that are relevant:
"WindowServer is primarily a compositing engine, and also manages canvasses.
A canvas is essentially a rectangular region which acts as a linear frame buffer and provides your ability to see an application window.
In addition to doing these jobs, it also manages the OpenGL pipelines to the video hardware, which is then used to render things like game windows, video, advertisements in browser windows, and so on.
It’s possible for it to take a lot of CPU in this role because, unlike DirectX in Windows, which emulated the GL after engaging in loop unrolling to ensure that the shaders won’t run forever taking up CPU time, macOS just stuffs the GL data down one of the pipeline channels, and has the hardware render it.
Because of this, badly written shaders can end up taking a lot of CPU, when on Windows DirectX, the loop unroll would simply have “failed” and been dropped on the floor without being rendered.
Some of the original “Unreal” engine shaders for fog effects had this problem, and some of the World Of Warcraft from Blizzard likewise had some problems.
WOW, in particular, was a problem — I was there while Mike Smith, one of the Apple engineers tasked with making the game not hang after a while — discovered that the problem was that some shaders would just run forever, and when you filled up all the GL channels to the nVidia card, then that was it — the desktop locked up.
This is less of a problem these days."
Saved me writing up some of those things.
This is an older article from 2018 which talks about why Metal was adopted as a requirement in Mojave on a hardware level, and why they deprecated OpenGL: https://appleinsider.com/articles/18/06/28/why-macos-mojave-requires-metal----and-deprecates-opengl
Essentially, with my own thoughts:
1. OSX has always had a hard time running OpenGL applications, because it has to essentially emulate the environment, and render it to a "canvas" that it then composites in to its main hardware loop.
2. OpenGL has the advantage of being rock-solid and has been around for a long time, but on OSX they froze everything at OpenGL 4.1. This is fine with me, personally, for AI War 2 at least, because that's a very feature-rich version (minus the newer compute shaders and tessellation, which we don't use in AI War 2 partly because those sorts of features are NOT rock solid and tend to have bugs that are very hardware-and-OS-and-driver dependent).
3. It's also worth noting that the deprecation of OpenGL happened after AI War 2 was already two years into development. They still support it, and we've not had any problems with OpenGL prior to your issue here (that anyone has brought to my attention). But we also added Metal support for future-proofing, and potential performance gains. But we use the equivalent feature set of the same shader model that OpenGL 4.1 supports, so in theory they both should perform about equally.
4. On all platforms, Vulkan is taking over where OpenGL left off in general, but it's under heavy development and has frequent issues. It's not really the sort of place I want to live right now, but in half a decade or a decade it will probably be another rock-solid standard.
On Windows, there is DirectX11 which has a ton of extra support for the funky hardware stuff that used to be problematic, but at this point it has been around so long that I'd call it rock solid. On Windows, there is also now DirectX12, which has a ton of new stuff and is definitely more performant, but it's bleeding edge new and has tons of problems frequently. This is where you get into the realtime ray tracing and so on. DX12 in general is more stable than Vulkan from what little I've seen of it, but it's still a brand-new pipeline and nobody really knows what to expect of it in the short term.
Metal is kind of a cross between a lot of these things. On the one hand, it has been around for a very long time, and parts of it are absolutely rock solid. On the other hand, sometimes there are bits where it just behaves... strangely. I would compare this to DirectX11 and how I felt about that maybe half a decade ago. And then to go along with THAT, they are also building in bleeding-edge features to Metal, à la DX12, but without making it a whole new platform or mode to run in. That has some pros for compatibility but also seems to have cons for stability. Sometimes upgrading to the latest version of OSX will cause things to change subtly, and you're then at the mercy of when they fix it, or the engine developer (unity in this case) has to provide some kind of workaround.
5. Overall I've wanted to keep compatibility with Macs from 2012 and onward, and not all of those can support Metal. The newer ones that can run Mojave and onward still support OpenGL, even though it is deprecated, but I'm cool with that because I don't need the bleeding edge features that are most likely to fail anyhow. Having a frozen codebase is actually a big win in terms of stability, in my opinion. The catch is that WindowServer needs to interface between OpenGL and Metal, and if Apple messes something up in there, then it's going to be a bad time (as you've experienced). They aren't developing OpenGL any more, but they are developing Metal, and as long as they keep WindowServer up to date as an appropriate translation layer, then all is well. If they drop the ball at any point, like happened in late 2019 for some random Apple users, then they have to wait until there's a fix at the OS level.
6. Using Metal is an option in AI War 2, and it's overall a good one, but it makes me mildly uncomfortable. Because Metal is basically the equivalent of DirectX 9-12 all rolled into one package, if they make some fundamental changes to the bleeding edge version of what would be DX12, then it would potentially negatively impact what is happening for us at the equivalent of DX10.1. So my experience thus far with Metal has been along the lines of what used to happen with DX11 half a decade ago, where things that were working would suddenly break for no real reason, and there was no fix we could make because it was at the OS, driver, or possibly engine level. Being at the mercy of several other companies to fix their stuff while people are mad at me doesn't feel great, and I don't have the clout to go talk to nVidia or Apple or Unity and ask for something directly like a AAA developer does. So I've overall avoided Metal, while providing support for it. In the case of the current situation, it looks like that is biting you a bit.
There are newer versions of Unity than what we are using, but there have been continual ongoing problems with them fixing and breaking things, and there are no real features that are new that would actually benefit the game itself. Many of their changes do help with compatibility with newer versions of OSX and Windows to some extent, particularly with Metal, but they also come with new bugs in other areas, some of which work just great right now. In some cases they make API changes that will require us to majorly rewrite some things that are perfectly functional at the moment, and which may take a few days to get working again on the visual front.
Up until now, OpenGL has been rock solid enough in terms of compatibility, and it hasn't been at the mercy of these other companies and their constant updates and changes, so that's been basically where I've left my focus.
By the way, here was the relevant part of the crash log for the WindowServer:
Process: WindowServer 
Version: 600.00 (450.9)
Code Type: X86-64 (Native)
Parent Process: launchd 
Responsible: WindowServer 
User ID: 88
Date/Time: 2020-05-03 08:26:14.097 +0200
OS Version: Mac OS X 10.15.4 (19E287)
Report Version: 12
Time Awake Since Boot: 1400000 seconds
Time Since Wake: 43000 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
MetalDevice for accelerator(0x3343): 0x7fd13a60ed98 (MTLDevice: 0x107d95000)
IOService:/AppleACPIPlatformExpert/[email protected]/AppleACPIPCI/[email protected]/IOPP/[email protected]/ATY,[email protected]/AMDFramebufferVI
Assertion failed: (isReady - CoreDisplay: CD surface count issue (CoreDisplay)
Surface Use Counts: 23(0) 3(2)
FB RegID: 4294968546, On Glass SurfaceIDs: 2 24 4, Transactions: [ Complete: SurfaceID: 24 ] [ Active: SurfaceID: 4 ], Surface Use Counts: 24(1) 2(1) 4(2) , Notified IsActive SurfaceIDs: 4, 2, 24
), function CoreDisplay_NotReady, file /AppleInternal/BuildRoot/Library/Caches/com.apple.xbs/Sources/CoreDisplay/CoreDisplay-220.127.116.11/CoreDisplay/Display/Display.cpp, line 2754.
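A report like the one above can be skimmed programmatically to pull out the header fields and to check whether the game or the engine is named anywhere in it. This is a rough sketch only: `parse_crash_header` and `mentions_any` are illustrative names, and the parsing assumes plain `Key: value` lines like those shown here rather than any official crash report schema.

```python
import re


def parse_crash_header(text):
    """Return the 'Key: value' lines of a crash report as a dict.

    Assumes keys contain only letters, spaces and slashes, as in the
    excerpt above; the first occurrence of a key wins.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z][A-Za-z /]*?):\s+(\S.*)$", line)
        if match:
            key, value = match.groups()
            fields.setdefault(key, value.strip())
    return fields


def mentions_any(text, needles):
    """Case-insensitive check for whether any of the given strings appear in the text."""
    lowered = text.lower()
    return any(needle.lower() in lowered for needle in needles)
```

On the excerpt above, `parse_crash_header(report)["Process"]` would be `WindowServer`, and `mentions_any(report, ["Unity", "AI War"])` would be `False`, which matches the observation that this part of the log never names the game or the engine.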
Thanks for the extensive write-up. I've mailed the Unity log files.
* WindowServer crashing is specific to running AI War 2; other programs and games do not have this problem
* Running Metal: will try that out and let you know
* Running in Steam Offline mode: will try that out and let you know
As an aside: I really appreciate you taking the effort to ensure your games run on OSX as well. Love the game so far.
(At this point I realized we were on the wrong bugtracker, and deleted the other bugtracker as well as moved things over to here.)
He also sent over the latest log files, noting "After the WindowServer crash, logging in automatically restarts AI War 2, so it’s probably the `Player-prev.log`"
My response to that bit is that, as he suspected, the `Player-prev.log` is what had the info in it. But in there it showed no shutdown or crash, so I'm not sure what exactly is happening. The issue is definitely so severe that even unity's logging is completely halted, which is probably a symptom of the WindowServer dying so thoroughly.
1. OpenGL + Steam Offline mode --> still crashes
2. OpenGL + Disable Steam Overlay --> still crashes
Now it's hard to prove the absence of a crash, but it _seems_ that running Metal solves the problem. I've managed to finish my first campaign (yay) without any crash.
Excellent! Thanks for the confirmation on your findings, I hope they will be useful to someone else.
Have there been any visual artifacts or oddities with running in Metal for you?
|May 6, 2020 1:19 pm||x4000Bughunter||New Issue|
|May 6, 2020 1:19 pm||x4000Bughunter||Status||new => assigned|
|May 6, 2020 1:19 pm||x4000Bughunter||Assigned To||=> x4000Bughunter|
|May 6, 2020 1:22 pm||x4000Bughunter||Reporter||x4000Bughunter => publysher|
|May 6, 2020 1:22 pm||x4000Bughunter||Note Added: 0056963|
|May 6, 2020 1:23 pm||x4000Bughunter||Note Added: 0056964|
|May 6, 2020 1:23 pm||x4000Bughunter||Note Added: 0056965|
|May 6, 2020 1:23 pm||x4000Bughunter||Note Added: 0056966|
|May 6, 2020 1:23 pm||x4000Bughunter||Note Added: 0056967|
|May 6, 2020 1:28 pm||x4000Bughunter||Note Added: 0056968|
|May 13, 2020 3:14 am||publysher||Note Added: 0056981|
|May 13, 2020 11:51 am||x4000Bughunter||Note Added: 0056982|
|May 13, 2020 11:51 am||x4000Bughunter||Status||assigned => resolved|
|May 13, 2020 11:51 am||x4000Bughunter||Resolution||open => fixed|