View Issue Details
ID | Project | Category | Date Submitted | Last Update | |
---|---|---|---|---|---|
0003077 | AI War 1 / Classic | Crash/Exception | Mar 21, 2011 4:12 pm | Mar 23, 2011 7:58 pm | |
Reporter | Fleet | Assigned To | Chris_McElligottPark | ||
Status | resolved | Resolution | fixed | ||
Product Version | 5.005 | ||||
Fixed in Version | 5.006 | ||||
Summary | 0003077: Desync | ||||
Description | Here is a save...I think the AI likes to do this when we are making progress in a game. Desync occurs on save loading, right after ally connects. | ||||
Tags | No tags attached. | ||||
Internal Weight | |||||
related to | 0002480 | closed | keith.lamothe | Multiplayer Connection Issues |
related to | 0002460 | resolved | Chris_McElligottPark | Unable to play multiplayer game. |
related to | 0002546 | resolved | Chris_McElligottPark | Scout Exporing -> Desynch |
related to | 0002685 | resolved | Chris_McElligottPark | Desync |
|
|
|
Hmm, once again I cannot duplicate... very odd. Can you each post your desync logs, and a savegame from each computer right after the desync occurs? That would allow us to see what is different, at least, which ought to be super handy since this one desyncs so instantaneously for you. You both had started AI War fresh before getting into this game, yes? |
|
|
|
DesyncErrors.txt (514 bytes)
3/21/2011 1:10:06 PM (5.005) ----------------------------------- Tss Desync Error----------------------------------- Tss Sent: P0 Sc431363 Sh1235 En21800 Cr4095995904 Ma2882528368 Kn9992 : P1 Sc384588 Sh2526 En-50334 Cr3968410246 Ma2110946425 Kn6000 : P8 Sc677203 Sh3487 : P9 Sc699577 Sh3628 : Rd837150552 Tss Should Have Been: P0 Sc431363 Sh1235 En21800 Cr4095995904 Ma2882528368 Kn9992 : P1 Sc384588 Sh2527 En-50334 Cr3968410246 Ma2110946425 Kn6000 : P8 Sc677203 Sh3487 : P9 Sc699577 Sh3627 : Rd837150552 |
|
My ally reports no desync file in his folder. I am pretty sure we both started the program fresh. |
|
Okay, actually, I forgot the desync log is on the server over -- but that was useful, thanks. The difference is that the second AI player had one fewer ships on the Tss machine compared to yours, so something either wasn't created or was destroyed. Is the original Desync.sav file from right after the desync, or from before it? Do you still desync when you load that one? If not, do you have the savegame that you are instantly desyncing with? There should be two saves of relevance from the host, the before and after save, and just an after from the client. |
|
My ally's internet is blacking out...we will get extra saves uploaded when it returns, and test the current save again. My save is from after the desync...I'll check with my ally and get it sorted out, and get some more descriptive file names to clear things up. |
|
Okay -- thanks. So that might be why I can't duplicate it, then. If you also have the save from right before the save, that would be ideal. Having a save that would reliably desync on load is like a dream come true for me -- the next best thing to having no desync at all. ;) |
|
|
|
I've uploaded my save file they i loaded, he connected to me, then the desync. Right now he actually cannot connect at all, to my loaded save game, or when i'm hosting lobby. Stops at 0%, think his internet is still down. |
|
VICTOLY! I got the desync, just after loading up that first save. Thanks very much, that's all we need to finally hunt this one down. And it proves that at least this one isn't related to your specific machines, or any sort of setting files, or anything like that. Very, very, excellent. Will post back when I have updated news on this, it might be late tonight or else tomorrow. I have what I need now, but it's still really tricky to hunt down. Thanks for your ongoing patience and help with this -- I'm super excited to finally have a break in this case, so that hopefully this can be the end of the desyncs for you guys! |
|
Thanks so so much. No rush on the fix, I know your super busy with your shiny new project, in addition to maintaining AI and Tidalis. VICTOLY AT LAST! |
|
Indeed! And my pleasure. |
|
Wow, this is a crazy desync. I haven't tracked it all the way down, but what I am finding is that it doesn't look single-point. There are no less than six planets with significant desyncs on them, and in the case of the client it winds up having this one engineer get shot that doesn't get shot on the server, as the AI is pulling out. It's really quite odd. No idea what it is yet, but I am eliminating what it's _not_, which is something. I've got a nice thread to tug on here, though, so I'm just yanking away and seeing what unravels out of this evil sweater. ;) |
|
Jeez, I hope it isn't anything as severe as Unity or Mono not obeying their system independent guarantees in their APIs. Of course that is near worst case scenario, and I highly doubt it. But sometimes bugs are so obscure and hard to track down that you begin to wonder... |
|
There is a 0% chance of that. I'm desyncing with two copies running on my one workstation. |
|
But, that is something I had been worried about in the past with how nebulous this was with Fleet and Tss, though. It's only until I was able to reproduce this yesterday that I became positive that wasn't the issue. |
|
Just throwing this in here, I saw some odd behavior with a data center, I think from this game. I used a penetrator to deep strike it, and shift-clicked back to my world. I then looked at other battle fronts. After a while, I tabbed to galaxy view, to look at where my penetrator was. Seeing several blinking planets from the data center planet, I knew it was making it back OK, and had hit the target some time ago, since it had time to retreat through several planets. I then clicked on the data-center-(less) planet, and for a moment the icon was still there. only now did I get a message that the AIP had dropped, and the icon disappeared. Odd...maybe it was put-off because the planet was in cold storage, or some-such, and was not processed until I brought it back into focus? |
|
Hummm... that is really odd, yeah. I'll check into that, but I'm not sure what would cause that, as anything of the sort you describe is simulation-based, not visual-based. |
|
Doesn't cold storage stop once the player switches to that planet? If this is true, and there is indeed a bug with cold storage and death processing or something (a BIG if), could that explain how his actions (merely looking at a planet) that shouldn't change the simulation did, and maybe even certain types of desyncs? |
|
That could explain that, actually, but it wouldn't cause a desync. The view stats it is checking are synchronous. |
|
Well, if you later do check into cold storage behavior, and find an issue, it could explain my long standing bug that I was ranting about a bunch a month or so back, 0002608. |
|
That could all be one thing, yeah. |
|
Well, I think I'm getting closer, anyway. It's either in the targeting or the positioning -- I suspect in the positioning. I've ruled out that it has nothing to do with: AI eyes, the AI thread, or minor factions. I've also traced the problem back to some targeting choices that the simulation is handling differently one one machine versus the other. Everything happens validly (but differently) after those initial different choices. What causes _those_ choices, in turn, I've yet to fully trace down to. But I'm getting closer! |
|
"Everything happens validly (but differently) after those initial different choices" Don't you hate it when there is more than one valid answer for a problem, but you need consistency as well as correctness? A very interesting issue you got there, if for no other reason than from the formal computational problem solving angle. |
|
x4000, have you considered writing a blog that details your endeavors in tracking down these bugs? I think there is some untapped ad revenue here. |
|
He does have a blog, but I'm not really sure how often he updates it with his "adventures" dealing with these tricky academic and practical computer science problems he runs across. |
|
I do talk about them sometimes, but in terms of desyncs, I haven't written on that very extensively. It's a much more niche topic than what I usually do. I also intentionally don't keep ads on my blog, I just use it to build potential interest in Arcen and what we're doing (which has worked really well, actually, historically). |
|
Well, a couple of collisions are happening out of order, and then things cascade into wrongness from there. Not sure yet why this is happening, but getting closer. |
|
Now we're really getting somewhere: the performance throttling for collisions is having a different weighting between the two games for a viral shredder. Then the cascade that causes the different ordering is a result of this. Now to find out why the performance weighting is different... |
|
Well, after extensive logging, the thread continues. The collision stuff is off because, initially, a single ship's position was off. The ship in question varies each time the game starts, I think. And it may be more than one ship, I can't tell cause from cause with the logging I've done so far. On the downside, this means that there is something, somewhere else, that's causing this, and I have yet more hunting to do. On the bright side, it does mean that the collision checking logic is just fine, which is nice for Keith and I (being that we've examined that repeatedly and found nothing wrong with it). |
|
Ouch, it sounds like the problem is pretty deeply rooted. I mean, position of objects is kind of fundamental in, well, any game or simulation. |
|
That doesn't have any bearing on how deeply rooted it is. Clearly this is a very obscure, NOT deeply rooted issue, given how rare it has been for most folks. There's some obscure code branch that is misbehaving and causing the location to shift, is all I mean. It's not like the location-setting code itself was broken, or everyone would be having problems. I'd thought there was something in the collision checking that was causing the location offsetting, but I now find that it's somewhere (unknown as yet) else. |
|
So that mean another fun round of sweeping the code base. yay :( |
|
No, not at all. With a save in hand, there is no sweeping of any sort. It means logging, and then analyzing the logs, and then moving along the thread a bit further, and then repeating, until I find the root cause. It's not hard if you understand the specific codebase you're doing it for (and luckily, I do), it's just tedious and extraordinarily time consuming if there's a lot of links. |
|
|
|
If you're curious, I uploaded the various logs so far -- the diffs between them are the relevant info. The 0 series is from instance 1, the 1 series from instance 2, all on the same PC, leading from save load up through the desync, but a separate desync per log pairing, so they don't line up in terms of ship indexes, etc. The desync is slightly different every time, because of the natural different random seeds every time you load a save. |
|
Well, I bet you're glad about that. ;) Sweeping the code base called in a specific instance and reviewing the data the code was using via the logs seems much less work than an entire code sweep. Still not a small task, but certainly much smaller. EDIT: I'll shut up for now so you can actually get to doing it, instead of talking about doing it. :D |
|
Yes, it's an impossible task without a save to reproduce, which is why this has hung around for months for Fleet and Tss. With a save in hand I've still been at this for something like six hours today, and haven't finished yet. This is shaping up to be the most tricky desync I ever faced. I didn't think anything would ever top the one that was based on which player was LOOKING at a force field when an enemy was spawned under it, but that one was found more quickly. A successful leap of intuition shortcutted that one, to be fair, though. So far I've had MANY leaps of intuition on this one, as has Keith, over the last months as well as just today, but none of them ultimately led anywhere. |
|
Good night!! How much slower does the game run when it has to log pretty much everything it does, including stuff in the inner loops? |
|
I only talk while I'm waiting for the logs; the logging is so slow, sometimes, that it takes a minute or two to play two seconds of gametime, which is a lot of sitting there. |
|
Incidentally, that's not everything it does -- it's just barely a fraction of it, just where I'm focusing my attention at the moment. |
|
2 minutes for 2 seconds of game time, 1 min per 1 second, 60 sec : 1 sec. 6000% time scaling (with 100% being the target). X_X |
|
Disk access is slow. |
|
Another break in the case! After doing the latest round of logs (each file was 32MB there, but incidentally it ran faster because it was writing much more per write call but having fewer write calls -- very interesting tidbit), I found this to be the culprit: A:21608,ZenithViralShredderIV,-15337,1642 at System.Environment.get_StackTrace() at ForegroundObject.SetPosition(Int32 X, Int32 Y) at ForegroundObject.MovePlayerUnit(Boolean RecalculateFull, Boolean RecalcualteMiddling) at Game.RunOneCycleOfSimulation() at Game.RunNextCycle(Boolean DoRendering) at Game.RunFrame() at MainCameraLogic.FixedUpdate() So, something about the movement logic about the viral shredders (or ships in general, possibly) is off. It's not collisions or targeting or anything like that. It's definitely originating in MovePlayerUnit, which is a _significant_ reduction in the search area. Also not somewhere I would have guessed to look, that stuff has been pretty bulletproof, traditionally. *rolls up sleeves* |
|
Aha, clearly the culprit is System.Environment.get_StackTrace()! Clearly, to ensure consistency, the call stack must never be accessed. ;D |
|
That's part of the logging I'm doing, and not normally there, but I recognize the humor. ;) |
|
Yea, that is one downside of text based communication. It's hard to tell if you are making a stupid suggestion that leads to ludicrous results to be silly, or making a stupid suggestion that leads to ludicrous results because you are ignorant. |
|
I could tell you were joking, but for anyone else following along I wanted to be clear. |
|
|
|
Nailed. It. To. The. Wall. After only... what has it been? 7 hours? 8? I lost track. I uploaded 0Recalcs.zip for the curious, which includes the file logs that finally helped me figure out what the problem was for certain. My next step after the location-setting stuff was into the movement branch, which I said had seemed bulletproof in the past, but which was definitely originating the wrong-location at this time. Well, as it turns out, the movement branch's honor is still clean -- it was the direct cause of the wrong-location, but that's because IT was getting a different variable set incorrectly. That variable was the number of "times the full movement recalculation has been skipped," which gets set to a really high number whenever a ship gets a new destination set. The problem is, when ships were being deserialized on the server from the loaded savegame, that would happen one time; and then when that was reserialized and sent to the client and deserialized again, it was happening a second time. There was an attempt in there at preventing that, but it was overlooking this particular variable setting. And actually, even the attempt was overlooking some branches like setting the object-to-kill on a ship, which then sets the destination -- and which happens during deserialization. That's one of those things that has been in place since at _least_ version 3.0 of the game, if not before. Although the full-recalculation-skip thing is newer, maybe since 4.0, I'm not sure exactly without going through the SVN history, which I'm not inclined to right at the moment after all that. So this was actually a two-headed desync, one that has been around since 3.0 or before and which might not actually have ever caused a desync (it's hard to be sure), and one which is newer since probably around 4.0, and which also for most people never caused a desync. This particular save, for whatever reason, was able to trigger it with about six ships that were in battle -- I think the melee ships are more prone to desyncing with this if they are in the middle of battle when the save happens, which is probably why most people don't see it. Those six or so ships then caused a state-cascade which caused hundreds of ships on six or more planets to have a desynced state within the first cycle of the game running, which is one of several things that made this so bloody hard for me to find. All in all, this is an entirely new class of desync for our game. Normally I group them into the following sets: Improper Use Of Local State, Improper Use Of IsHost, Improper Use Of LocalPlayerNumber, Improper Use Of Random, and Improper Cleansing Of Reusable Objects Between Game Loads. The one really rare one I'd had before with the force field displays was actually unique as well, and what I'd call Improper Use Of Reset Counters. This one is similar, but different: Improper Triggering Of Counters During Savestate Load. Blah. After all that time and frustration, I figured that Fleet and Tss deserved an explanation, at least! Not only did they discover a whole new class of desync, but this one was "cleverly" obfuscated so well that it was almost virus-like in how well it was concealed. I did it to myself, of course, but a rogue hacker couldn't have hidden it any better, is all I mean. All in all: man am I glad to have this one done! I'll do a release in a bit. |
|
Good one tracking that down. :) |
|
Thanks! :) |
|
Fleet, related to your comment about the data centers: * Previously, if there were fewer than 30 enemy-to-the-AI ships and none of them were both uncloaked and had attack strength of 0, then the planet was incorrectly being put into cold storage. Now it's only in cold storage if there are fewer than 30 enemy-to-the-AI ships and all of them have full cloaking, which was the intended logic from the start. ** This should fix some issues with the devourer golem not really doing its thing, and in general some minor faction issues in general. |
Date Modified | Username | Field | Change |
---|---|---|---|
Mar 21, 2011 4:12 pm | Fleet | New Issue | |
Mar 21, 2011 4:12 pm | Fleet | File Added: Desync .sav | |
Mar 21, 2011 4:20 pm | Chris_McElligottPark | Note Added: 0011188 | |
Mar 21, 2011 4:20 pm | Chris_McElligottPark | Assigned To | => Chris_McElligottPark |
Mar 21, 2011 4:20 pm | Chris_McElligottPark | Status | new => feedback |
Mar 21, 2011 4:20 pm | Tssbackus | File Added: Tom Ed coop.sav | |
Mar 21, 2011 4:21 pm | Fleet | File Added: DesyncErrors.txt | |
Mar 21, 2011 4:26 pm | Fleet | Note Added: 0011189 | |
Mar 21, 2011 4:26 pm | Fleet | Status | feedback => assigned |
Mar 21, 2011 4:30 pm | Chris_McElligottPark | Note Added: 0011190 | |
Mar 21, 2011 4:34 pm | Fleet | Note Added: 0011191 | |
Mar 21, 2011 4:36 pm | Chris_McElligottPark | Note Added: 0011192 | |
Mar 21, 2011 4:38 pm | Fleet | File Added: Fleet before desync.sav | |
Mar 21, 2011 4:40 pm | Fleet | Note Added: 0011193 | |
Mar 21, 2011 5:15 pm | Chris_McElligottPark | Note Added: 0011196 | |
Mar 21, 2011 5:28 pm | Fleet | Note Added: 0011198 | |
Mar 21, 2011 5:31 pm | Chris_McElligottPark | Note Added: 0011199 | |
Mar 22, 2011 5:15 pm | Chris_McElligottPark | Note Added: 0011231 | |
Mar 22, 2011 5:19 pm | TechSY730 | Note Added: 0011232 | |
Mar 22, 2011 5:19 pm | TechSY730 | Note Edited: 0011232 | |
Mar 22, 2011 5:20 pm | Chris_McElligottPark | Note Added: 0011233 | |
Mar 22, 2011 5:20 pm | Chris_McElligottPark | Note Added: 0011234 | |
Mar 22, 2011 5:24 pm | Fleet | Note Added: 0011235 | |
Mar 22, 2011 5:30 pm | Chris_McElligottPark | Note Added: 0011237 | |
Mar 22, 2011 5:34 pm | TechSY730 | Note Added: 0011238 | |
Mar 22, 2011 5:34 pm | TechSY730 | Note Edited: 0011238 | |
Mar 22, 2011 5:34 pm | TechSY730 | Note Edited: 0011238 | |
Mar 22, 2011 5:35 pm | Chris_McElligottPark | Note Added: 0011239 | |
Mar 22, 2011 5:49 pm | TechSY730 | Note Added: 0011240 | |
Mar 22, 2011 5:51 pm | Chris_McElligottPark | Note Added: 0011241 | |
Mar 22, 2011 6:22 pm | Chris_McElligottPark | Note Added: 0011247 | |
Mar 22, 2011 6:26 pm | TechSY730 | Note Added: 0011249 | |
Mar 22, 2011 6:29 pm | Fleet | Note Added: 0011250 | |
Mar 22, 2011 6:30 pm | TechSY730 | Note Added: 0011252 | |
Mar 22, 2011 7:42 pm | Chris_McElligottPark | Note Added: 0011255 | |
Mar 22, 2011 8:07 pm | Chris_McElligottPark | Note Added: 0011261 | |
Mar 22, 2011 8:19 pm | Chris_McElligottPark | Note Added: 0011262 | |
Mar 22, 2011 9:35 pm | Chris_McElligottPark | Note Added: 0011270 | |
Mar 22, 2011 9:38 pm | TechSY730 | Note Added: 0011272 | |
Mar 22, 2011 9:39 pm | Chris_McElligottPark | Note Added: 0011274 | |
Mar 22, 2011 9:42 pm | TechSY730 | Note Added: 0011277 | |
Mar 22, 2011 9:44 pm | Chris_McElligottPark | Note Added: 0011278 | |
Mar 22, 2011 9:46 pm | Chris_McElligottPark | File Added: AnalysisLogs.zip | |
Mar 22, 2011 9:48 pm | Chris_McElligottPark | Note Added: 0011279 | |
Mar 22, 2011 9:49 pm | TechSY730 | Note Added: 0011280 | |
Mar 22, 2011 9:49 pm | TechSY730 | Note Edited: 0011280 | |
Mar 22, 2011 9:51 pm | Chris_McElligottPark | Note Added: 0011281 | |
Mar 22, 2011 9:52 pm | TechSY730 | Note Added: 0011282 | |
Mar 22, 2011 9:52 pm | Chris_McElligottPark | Note Added: 0011283 | |
Mar 22, 2011 9:53 pm | Chris_McElligottPark | Note Added: 0011285 | |
Mar 22, 2011 9:57 pm | TechSY730 | Note Added: 0011286 | |
Mar 22, 2011 9:58 pm | Chris_McElligottPark | Note Added: 0011287 | |
Mar 22, 2011 10:02 pm | Chris_McElligottPark | Note Added: 0011288 | |
Mar 22, 2011 10:13 pm | TechSY730 | Note Added: 0011289 | |
Mar 22, 2011 10:13 pm | TechSY730 | Note Edited: 0011289 | |
Mar 22, 2011 10:14 pm | Chris_McElligottPark | Note Added: 0011290 | |
Mar 22, 2011 10:24 pm | TechSY730 | Note Added: 0011291 | |
Mar 22, 2011 10:27 pm | TechSY730 | Note Edited: 0011291 | |
Mar 22, 2011 10:27 pm | Chris_McElligottPark | Note Added: 0011292 | |
Mar 22, 2011 10:55 pm | Chris_McElligottPark | File Added: 0Recalcs.zip | |
Mar 22, 2011 11:09 pm | Chris_McElligottPark | Note Added: 0011294 | |
Mar 22, 2011 11:09 pm | Chris_McElligottPark | Status | assigned => resolved |
Mar 22, 2011 11:09 pm | Chris_McElligottPark | Fixed in Version | => 5.006 |
Mar 22, 2011 11:09 pm | Chris_McElligottPark | Resolution | open => fixed |
Mar 22, 2011 11:11 pm | Chris_McElligottPark | Relationship added | related to 0002480 |
Mar 22, 2011 11:13 pm | Chris_McElligottPark | Relationship added | related to 0002460 |
Mar 22, 2011 11:14 pm | Chris_McElligottPark | Relationship added | related to 0002546 |
Mar 22, 2011 11:17 pm | Chris_McElligottPark | Relationship added | related to 0002685 |
Mar 23, 2011 12:06 am | Draco18s | Note Added: 0011310 | |
Mar 23, 2011 12:07 am | Chris_McElligottPark | Note Added: 0011312 | |
Mar 23, 2011 7:58 pm | Chris_McElligottPark | Note Added: 0011382 | |
Apr 14, 2014 9:27 am | Chris_McElligottPark | Category | Bug - Crash or Exception => Crash/Exception |