Over the past few weeks we’ve been working on making the game better, based on what was reported through crash logs, forum posts and emails coming in from helpful customers. We’ve fixed a few (rare) crash issues, and there are a few more fixes coming soon.
One of the more ‘interesting’ problems we heard about through emails was a set of network connection issues on the PlayStation 4.
Our game uses a peer-to-peer ‘mesh’ setup, which means every client needs to be able to talk to every other client. The mesh starts by one person starting in ‘host’ mode. Then up to two other players connect to this host, and to any other players already in the network.
Occasionally it turns out that while players can connect to the host, they cannot get any messages through to the other player in the game, even though the NAT negotiation library seems to indicate they can. I’m at a bit of a loss as to why exactly, since we have never seen this exact behavior in our test setups. It probably has something to do with firewalls/NAT traversal, but it’s hard to say exactly when we can’t reproduce the problem in-house. Interestingly enough, we’ve never had complaints from our Steam customers, but that is likely because Steam has its own routing service that automatically sends messages through a ‘known’ Steam server if it can’t send directly.
After a bit of investigation on what to do about this problem, I decided to send each client-client network packet directly to the remote client (if a connection is available) as well as through the host. The one sent to the host is marked as a ‘routing’ packet, meaning the host is expected to forward it to the other client. On the receiving client, because of the double message send, we now often see two of the same packets coming in at different times. This is not a problem because we automatically reject duplicate packets based on the packet ID.
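As a rough illustration of the double send and the duplicate rejection described above, here is a minimal C++ sketch. The `Packet` layout, `BuildCopies` and `DuplicateFilter` are hypothetical names made up for this example, not the game's actual packet format or classes, and a real filter would track IDs per sender and forget old ones.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical packet header: illustrative only, not the game's real format.
struct Packet {
    std::uint32_t id;      // unique per message, shared by both copies
    std::uint8_t  target;  // index of the client that should finally receive it
    bool          routed;  // true: the host is expected to forward this copy
};

// One outgoing copy and the peer it is handed to.
struct Outgoing {
    std::uint8_t sendTo;
    Packet       packet;
};

// Build the copies for one client-to-client message: always one via the host,
// plus a direct one when a direct connection is believed to be available.
std::vector<Outgoing> BuildCopies(std::uint32_t id, std::uint8_t target,
                                  std::uint8_t host, bool directLinkUp) {
    std::vector<Outgoing> out;
    if (directLinkUp) {
        out.push_back({target, Packet{id, target, false}});
    }
    out.push_back({host, Packet{id, target, true}});
    return out;
}

// Receiver side: remembers which packet IDs were already handled.
class DuplicateFilter {
public:
    // Returns true the first time an ID is seen, false for any repeat.
    bool Accept(std::uint32_t id) { return m_seen.insert(id).second; }

private:
    std::unordered_set<std::uint32_t> m_seen; // never pruned in this sketch
};
```

Whichever copy arrives first is processed, and the later one is dropped by `Accept`, so the receiver gets the lower of the two latencies without handling the message twice.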
There are two downsides to this approach: double the latency for packets that only make it through via the host, and double the network bandwidth. Luckily our game was designed to be very latency resistant. We almost always test with 250-350ms of lag in our test setups. Even with 350ms of lag, the lag is only occasionally noticeable, and usually not really a problem.
The bandwidth of our game is quite low compared to most modern games: the total host bandwidth with two connected clients is about 30kbps on average. The bitpacked network packets, and the ‘data-blobs’ which use binary delta compression, keep the bandwidth to a minimum. So doubling the bandwidth to around 60kbps is still well within the acceptable range.
For a minute I was thinking I should just not send client messages directly to other clients at all, and always route them through the host. This would increase latency, but the bandwidth would remain the same as before (30kbps). But because the bandwidth is already so low, I decided to always send two packets so we get lower latency for clients that are able to communicate.
I could probably improve it further by only sending messages through the host when it is detected that the client-client connection isn’t working, but that’d introduce a time frame where the game would feel ‘locked up’ until the messages were finally routed through the host, which would create a pretty bad user experience.
I might change it around for other games, depending on the requirements, but for now I think the current routing system will fix the issues with the PS4 networking. We’re submitting a patch soon, but we have lots of testing to be done to make sure it’s robust first!
Until next week, keep those packets routing!
As usual, today at 4pm there will be another Dev Stream with Jouste the Drawbarian. Don’t miss it!
Welcome back, followers of the fearsome! Nick here, with a bit of info on the tech stuff going on behind the scenes here at Slick Entertainment.
This week I’ll be talking a bit about supporting 4k TVs in Viking Squad. As you probably know, the regular PS4 supports up to 1080p, which is the resolution we initially designed our game for. 1080p should be enough for everybody, right? Recently 4k monitors and video cards have become more affordable, so naturally there is more of a demand to support 4k resolutions. So it was about time I started looking into what would be required to get our game to run at 4k, in a way that makes it ‘worth it’.
The nice part about our user interface (UI) system is that it is able to work with different screen aspect ratios and sizes, because the UI elements are not linked to exact pixels, but rather to more abstract coordinates. For some games pixel-precise UI is super important (for example RTS’s), but for our type of games with not-so-complicated UI it’s less important. The nice thing about linking UI to abstract window coordinates is that it’s quite easy to scale to different resolutions. So switching to 4k shouldn’t be too hard, right?
So, the first step was to actually get a 4k monitor. I bought an ASUS 4k monitor at NCIX, and connected it, but it turned out that monitor didn’t support HDMI 2.0. This, I found out the hard way, means it will only do 4k resolutions at 30fps. Unacceptable! So I checked Staples, and they had a 4k Samsung monitor for sale for $550 CAD, and it had HDMI 2.0 support. That’s more like it!
I set it all up, connected the monitor, ran the game, and boom, 4k! It just worked. Well, sort of. There were a few little gotchas that I had to solve to make it run smoothly at 60fps in 4k.
4k screens, or 2160p as it really should be called, means you need to fill 3840×2160 pixels = 8,294,400 pixels. That’s a LOT of pixels. We’re using a 32 bit color buffer (4 bytes per pixel), which takes up just under 32MB. That’s a LOT of GPU bandwidth. My old programmer brain thinks “this can’t possibly be fast” every step along the way. Modern GPUs have surprisingly few issues with it though! Of course, our game’s rendering complexity is not anywhere near a AAA type game, but it’s still pretty easy to miss frames at 4k because of the massive amounts of data flying around.
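To make the arithmetic above concrete, here is a small C++ sketch that computes the size of one color buffer. The function and constant names are made up for this example.

```cpp
#include <cstdint>

// Bytes needed for one color buffer of the given size and bit depth.
constexpr std::uint64_t ColorBufferBytes(std::uint32_t width, std::uint32_t height,
                                         std::uint32_t bitsPerPixel) {
    return static_cast<std::uint64_t>(width) * height * (bitsPerPixel / 8);
}

constexpr std::uint64_t kPixels4k   = 3840ull * 2160ull;                 // 8,294,400 pixels
constexpr std::uint64_t kBytes4k    = ColorBufferBytes(3840, 2160, 32);  // 33,177,600 bytes
constexpr std::uint64_t kBytes1080p = ColorBufferBytes(1920, 1080, 32);  // 8,294,400 bytes

// A 4k buffer is four times the size of a 1080p one and stays just under 32 MiB.
static_assert(kBytes4k == 4 * kBytes1080p, "4k has four times the pixels of 1080p");
static_assert(kBytes4k < 32ull * 1024 * 1024, "just under 32 MiB");
```

Every full-size render target costs this much again, and each one is written and read at least once per frame, which is where the bandwidth goes.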
One frame in Viking Squad is rendered as follows:
Step 1: Background. All background elements are rendered to a render target 2/3 the size of the front buffer, which is then blurred vertically and horizontally, in two steps, requiring one more render target of 2/3 the size of the front buffer.
Step 2: Foreground. All foreground elements are rendered to a render target 2/3 the size of the front buffer, also blurred vertically and horizontally.
Step 3: Game. The main view is rendered using a full size render target (same size as the front buffer). First the background texture from Step 1 is drawn, then all the gameplay elements are drawn over top of that, and last the texture from Step 2 is rendered over top of everything using alpha blending.
Step 4: Bloom. The texture from Step 3 is down-scaled to a render target of 1/2 the size of the front buffer. This texture is blurred horizontally and vertically using one more render target of 1/2 the size of the front buffer.
The texture coming out of Step 4 is now used in the UI system to render the game, and it then renders any UI over top of this.
So as you can see, there are quite a few render targets, which take up a lot of GPU bandwidth. The textures in Step 1, 2 and 4 are all blurred, so to speed up the 4k performance, I made those render targets max out at 1280×720 (which happens to be 2/3 of the front buffer at 1080p). Because of the blur, this reduction in size is hardly noticeable, except for the increased frames per second.
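The sizing rule for the blurred targets can be sketched in a few lines of C++. This is only an illustration: `Size` and `BlurTargetSize` are hypothetical names, and clamping each axis separately assumes the front buffer has the same 16:9 aspect ratio as the cap.

```cpp
#include <algorithm>

struct Size {
    int width;
    int height;
};

// Size of a blurred render target: the front buffer scaled by num/den,
// but never larger than the cap (1280x720, as in the text).
// Clamping per axis assumes front buffer and cap share an aspect ratio.
Size BlurTargetSize(Size frontBuffer, int num, int den, Size cap = {1280, 720}) {
    return Size{std::min(frontBuffer.width * num / den, cap.width),
                std::min(frontBuffer.height * num / den, cap.height)};
}
```

At 1080p this gives 1280×720 for the 2/3 targets and 960×540 for the 1/2 bloom targets, exactly as before. At 2160p the uncapped sizes would be 2560×1440 and 1920×1080, and both are capped to 1280×720.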
Polygons and Textures
I’ve written a little bit about how we polygonize our textures, and this came in really handy when rendering to 4k. The edges of our characters stay super crisp, and while we need MSAA at 1080p, at 4k resolutions the MSAA can be lowered a bit to save speed and bandwidth.
Inside the polygonized textures we still display the regular texture. I’ve also previously blogged about how I rescale all textures to 1080p resolution. Jesse usually draws all textures at a bit higher resolution than they appear in 1080p, so to get nice crisp textures in 4k, I just turned the re-scaling off.
And here is the result of all this:
Alright, that’s it for this week, keep those games rendering at 60fps!
This week I fixed a problem that has been popping up in our game every once in a while. Our game normally runs at a smooth 60 frames per second, but every once in a while the game would start to stutter and just generally become very un-smooth, man. The stuttering would last a few seconds and then the game picked up again and was buttery smooth once more. I’ve got some profiling routines in our game to measure what is going on, but every time I started profiling when this effect showed up, it would magically disappear, and the profile data would show that we’re perfectly within the desired frame times. So what is going on?
Our main game loop waits for the video buffers to swap, and then it checks how much time has passed to determine how many times to step the game. A time-accumulator float value is used to keep track of the time elapsed since the game was stepped, which is a pretty basic practice in most games. It guarantees that the game is stepped at 60hz on average, over multiple frames. The general code looks something like this:
    static const float GameTimeStep = 1.0f / 60.0f;

    void Game::Update(float dt) {
        m_timeAccumulator += Util::Clamp(dt, 0.0f, 0.1f); // Clamp the value to prevent spiral of doom
        while (m_timeAccumulator >= GameTimeStep) {
            m_timeAccumulator -= GameTimeStep;
            Step(GameTimeStep); // Step the simulation once (call name assumed)
        }
    }
So in this code, m_timeAccumulator is the float value in the Game class that is increased by the delta time every frame, and then the game is stepped however many times it takes to make sure the game gets simulated 60 times per second. This piece of code handles variable frame rates quite well: if the delta time dt is 1/30th of a second, the game will just step twice in a row, to maintain the 60x per second simulation rate.
(Oh, a small note about the spiral of doom. This happens when the game takes longer to step than ‘real time’. By capping the incoming delta time, we potentially slow down the in-game time, but at least the game doesn’t grind to a halt.)
Ok, we’re all good then, right? The code above should fix all our problems, no? Well, not exactly. The code above has one problem that is a bit hidden, and it only really shows up if the render rate is the same as the game step rate AND your time accumulator is very close to a multiple of your game step time. For example, in our game the problem only occasionally shows up when running in a mode that is 60Hz.
Hopefully the graph below will make it more clear:
Lots of info in the graph above, so here’s some explanation. The black line at the top shows the perfect 60x per second game update time we want to have. The green dots are measured frame times. We measure our elapsed frame time directly after the swap-buffers call, using the highest resolution timers available. I have found that despite using the most precise measuring routines, there will still be a bit of variation. In the graph you can see that the green dots are somewhat randomly offset from the perfect rate we’d like to have; this is the variation. And this variation is what is causing the problem I’m talking about. Sometimes the measured frame time is a bit before, sometimes it’s a bit after the actual time.
The perfect amount of game steps to take every frame in this case (60Hz render, 60fps game step) is of course 1. In fact on the console games I’ve shipped, I sometimes just hard-coded one game step per render frame, but on PC’s with wildly varying video modes and video cards, this isn’t viable. The preferred game steps are represented by the blue outlined boxes for each rendered frame. However, using the code above, the actual game steps done per frame is shown by the magenta boxes. This shows that in the first render frame, the game isn’t stepped at all, and the next render frame steps the game twice to make up for it. Then there’s another frame of zero game steps, etc. In other words, very un-smooth behaviour.
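The zero-two-zero-two pattern is easy to reproduce in isolation. The sketch below is a stand-alone C++ illustration, not the game's code: it runs the plain accumulator over a list of measured frame times and records how many steps each frame takes. It uses integer microseconds instead of floats purely to keep the demonstration free of rounding noise, and `StepsPerFrame` is a made-up name.

```cpp
#include <cstdint>
#include <vector>

// Plain time accumulator, in integer microseconds.
// Returns the number of game steps taken on each rendered frame.
std::vector<int> StepsPerFrame(const std::vector<std::int64_t>& frameTimesUs,
                               std::int64_t stepUs) {
    std::vector<int> steps;
    std::int64_t accumulator = 0;
    for (std::int64_t dt : frameTimesUs) {
        accumulator += dt;
        int count = 0;
        while (accumulator >= stepUs) {
            accumulator -= stepUs;
            ++count;
        }
        steps.push_back(count);
    }
    return steps;
}
```

With a step of 16667 microseconds and frame times that alternate between 16467 and 16867 microseconds (0.2ms of measurement jitter either way), the first frame falls just short of a full step and the second one has to make up for it, so the result is 0, 2, 0, 2 even though the average frame time is exactly one step.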
Now, how did I fix this? Well, in our game it doesn’t really matter too much if the game step is running at exactly 60 fps, or if it’s running at a slightly higher rate. So, I introduced a fuzz factor:
The fuzz area is basically a time difference that is ‘acceptable’ when determining how many times to step the simulation. It’s only really used ‘downwards’, so for example if your game step is 10ms, and you get a frame time of 9ms, you might still deem that acceptable, and step the game. If your frame time was 12ms, then the fuzz factor isn’t technically needed, and the game is stepped.
Note that the frame steps are exactly where we want them! Buttery smooth. The game update code now looks like this:
The fuzz factor is set to 0.96, which basically allows the game to speed up from 60Hz to about 62.5Hz.
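A minimal C++ sketch of how a fuzz factor like this could be applied is shown below. It is an assumption-laden illustration rather than the game's actual update code: the class name `FuzzyStepper` is made up, and clamping the accumulator at zero after an early step is a guess that matches the stated speed-up to about 62.5Hz (60 / 0.96).

```cpp
#include <algorithm>

class FuzzyStepper {
public:
    // Returns how many simulation steps to run for a frame that took dt seconds.
    int Advance(float dt) {
        // Same clamp as before to prevent the spiral of doom.
        m_timeAccumulator += std::min(std::max(dt, 0.0f), 0.1f);
        int steps = 0;
        // Step as soon as we are within the fuzz area below a full game step.
        while (m_timeAccumulator >= GameTimeStep * FuzzFactor) {
            // Assumption: an early step never leaves a negative remainder.
            m_timeAccumulator = std::max(0.0f, m_timeAccumulator - GameTimeStep);
            ++steps;
        }
        return steps;
    }

private:
    static constexpr float GameTimeStep = 1.0f / 60.0f;
    static constexpr float FuzzFactor   = 0.96f; // value quoted in the text
    float m_timeAccumulator = 0.0f;
};
```

With frame times that jitter between 16.4ms and 16.9ms, `Advance` returns 1 on every frame instead of alternating between 0 and 2, and a 1/30th of a second frame still produces two steps.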
Since I put in the code above, the occasional stuttering when running in a video mode of 60Hz seems to have disappeared from our game. When running at different frame rates such as 50Hz the game will still ‘catch up’ every once in a while causing a one-frame jitter. This is distracting and not preferable, but I don’t currently know of a way to fix this other than running the simulation at a much higher rate to get more granularity (not really feasible in our case), or by interpolating everything in the renderer (which might introduce frame lag, which is also undesirable).
Anyway, I am happy this problem is no longer an issue, and thought I’d share the idea if you’ve seen problems similar to this. Yay, one step closer to shipping! :)
Alright, that’s it for this week, keep those frames buttery smooth!
This week is a follow up to Dev Blog 197, which was the first dev-blog on networking. I’ve been working on getting the networking system up and running for Viking Squad, and there are a ton of little peculiarities I could talk about, but I’ll start with a bit more of the basics, in particular about how to squeeze all the unused bits out of your precious network bandwidth.
After setting up the low level networking system described in Dev Blog 197, we are now able to send and receive messages from remote clients. These messages get bundled up into a network packet and sent, only to be broken into individual messages again when they reach the other side. Now, bandwidth is always a concern, so we want to try and minimize the packet sizes, which means minimizing the size of the messages that get sent.
If you’re familiar with C#, you’re probably aware of the BinaryReader and BinaryWriter classes to save and load binary data. These are constructed with a stream, and then you can read and write different data types to these streams.
For example, say we have this hypothetical entity that we need to save to the network. (Note: this isn’t what our internal entity class looks like, this is just an example to illustrate what I am talking about). This is what the code would look like:
    private UInt16 m_entityID;   // Unique ID to identify the entity by
    private Vector3 m_position;  // 3D Position of the entity in the world
    private Vector3 m_velocity;  // Velocity of the entity in the world
    private bool m_onGround;     // Boolean saving if this entity is on the ground or not.
    // .. More data here

    public void SaveToNetwork(BinaryWriter writer)
    {
        writer.Write(m_entityID);    // 16
        writer.Write(m_position.X);  // 32
        writer.Write(m_position.Y);  // 32
        writer.Write(m_position.Z);  // 32
        writer.Write(m_velocity.X);  // 32
        writer.Write(m_velocity.Y);  // 32
        writer.Write(m_velocity.Z);  // 32
        writer.Write(m_onGround);    // 8
        // 216 bits total = 27 bytes
    }

    public void LoadFromNetwork(BinaryReader reader) // Method name assumed
    {
        m_entityID = reader.ReadUInt16();
        m_position.X = reader.ReadSingle();
        m_position.Y = reader.ReadSingle();
        m_position.Z = reader.ReadSingle();
        m_velocity.X = reader.ReadSingle();
        m_velocity.Y = reader.ReadSingle();
        m_velocity.Z = reader.ReadSingle();
        m_onGround = reader.ReadBoolean();
    }
As you can see it needs 27 bytes to save the state to the network. The smallest unit of data that can be written using BinaryReader and BinaryWriter is a byte, which is actually quite large if you think about it. For example, when the SaveToNetwork function writes the m_onGround boolean value, it will write an entire byte (8 bits) to save the state of the boolean. That’s 7 wasted bits to write 1 bit of actual data!
Now, how about we take this one step further? How about saving all values using exactly the amount of bits we think it needs? And in our loading code, we use the same number of bits to read the value and store it in our variable. To make this easy to do, I implemented my own BitStreamReader and BitStreamWriter classes. These have the same functions to read and write data types, except they also require a number of bits, and in some cases a minimum and maximum value. Internally they pack data in bit by bit, making it possible to waste no bits when saving the entity state. If you’re interested in the internals of these streams, just click here to see what the C# code looks like.
To be able to save the maximum amount of space, we need to know what the limits are for each of our variables. The m_entityID is a 16 bit value, but we don’t see it ever reaching 1024. In this example, we’ll assume that the X Coordinate is always between -150.0f and 150.0f, and the Y and Z coordinates are always between -6.0f and 6.0f. The velocity is always between -10.0 and 10.0.
Now how do we save this efficiently? Well, let’s go through them one by one:
m_entityID: This value will always stay below 1024 (0 to 1023), so we can save this unsigned integer by using only 10 bits.
For m_position, we’ll need to be able to save the floating point values using a specific amount of bits, while retaining a minimum precision. Say we determine we want a precision of about 0.02 units. To be able to save the X coordinate we would need to divide up (150.0 – (-150.0)) = 300.0 into 300.0 / 0.02 = 15000 parts. The nearest larger power of two would be 2^14 = 16384. So we’d need 14 bits to save the float and get a precision of 300.0 / 16384 = 0.01831. For the Y and Z coordinates we do a similar calculation: 12.0 / 0.02 = 600 parts. The nearest larger power of two would be 2^10 = 1024, but 9 bits (512 parts) already gives us a precision of 12.0 / 512 = 0.02344, which is close enough to our target, so we save these using 9 bits each.
For the m_velocity we decide we don’t need as much precision, and we can handle 0.1 unit of precision. That’s 20.0 / 0.1 = 200 parts, which fits in 2^8 = 256, so we can save the X, Y and Z component using just 8 bits each.
The m_onGround just needs one bit to save, since it’s just a true or false value.
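The bit counts in the list above can be checked with a couple of small helpers. This is a C++ sketch with made-up function names (`BitsForPrecision`, `StepSize`), using the same ranges and precisions as the example entity.

```cpp
#include <cmath>

// Smallest bit count whose 2^bits parts cover the range at the wanted precision.
int BitsForPrecision(double min, double max, double precision) {
    const double parts = (max - min) / precision;
    int bits = 0;
    while (std::ldexp(1.0, bits) < parts) {  // ldexp(1.0, n) == 2^n
        ++bits;
    }
    return bits;
}

// Step size actually obtained when the range is divided into 2^bits parts.
double StepSize(double min, double max, int bits) {
    return (max - min) / std::ldexp(1.0, bits);
}
```

For the X coordinate, `BitsForPrecision(-150.0, 150.0, 0.02)` gives 14 and `StepSize(-150.0, 150.0, 14)` is about 0.01831. For Y and Z a strict 0.02 target would ask for 10 bits, while the 9 bits used in the text give a step of about 0.0234, so that choice trades a little precision for one bit per coordinate. The velocity range at 0.1 precision needs 8 bits.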
The code would look something like this:
    private UInt16 m_entityID;   // Unique ID to identify the entity by
    private Vector3 m_position;  // 3D Position of the entity in the world
    private Vector3 m_velocity;  // Velocity of the entity in the world
    private bool m_onGround;     // Boolean saving if this entity is on the ground or not.
    // .. More data here

    public void SaveToNetwork(BitStreamWriter writer)
    {
        writer.Write(m_entityID, 10);                     // 10
        writer.Write(m_position.X, -150.0f, 150.0f, 14);  // 14
        writer.Write(m_position.Y, -6.0f, 6.0f, 9);       // 9
        writer.Write(m_position.Z, -6.0f, 6.0f, 9);       // 9
        writer.Write(m_velocity.X, -10.0f, 10.0f, 8);     // 8
        writer.Write(m_velocity.Y, -10.0f, 10.0f, 8);     // 8
        writer.Write(m_velocity.Z, -10.0f, 10.0f, 8);     // 8
        writer.Write(m_onGround);                         // 1
        // 67 bits total = 8.375 bytes, rounded up to 9 bytes
    }

    public void LoadFromNetwork(BitStreamReader reader) // Method name assumed
    {
        m_entityID = reader.ReadUInt16(10);
        m_position.X = reader.ReadSingle(-150.0f, 150.0f, 14);
        m_position.Y = reader.ReadSingle(-6.0f, 6.0f, 9);
        m_position.Z = reader.ReadSingle(-6.0f, 6.0f, 9);
        m_velocity.X = reader.ReadSingle(-10.0f, 10.0f, 8);
        m_velocity.Y = reader.ReadSingle(-10.0f, 10.0f, 8);
        m_velocity.Z = reader.ReadSingle(-10.0f, 10.0f, 8);
        m_onGround = reader.ReadBoolean();
    }
Now, what we ended up with is a routine that can save the entity data with the precision we needed, in just 9 bytes. That’s 1/3rd of the original implementation! Of course this is a hypothetical entity, but you can see here that there are huge space savings to be had if we know the constraints of our variables.
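For readers curious what a bit-level stream might look like inside, here is a compact C++ sketch of the general technique. It is not the BitStreamWriter and BitStreamReader from the game: the class names, the least-significant-bit-first layout and the rounding scheme are all assumptions made for this example. One difference worth noting is that it maps the range onto 2^bits - 1 steps so that both ends of the range are representable, which gives a slightly different step size than dividing by 2^bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class BitWriter {
public:
    // Append the lowest `bits` bits of `value`, least significant bit first.
    void WriteBits(std::uint32_t value, int bits) {
        for (int i = 0; i < bits; ++i) {
            if (m_bitCount % 8 == 0) m_bytes.push_back(0);  // start a new byte
            if ((value >> i) & 1u) {
                m_bytes.back() |= static_cast<std::uint8_t>(1u << (m_bitCount % 8));
            }
            ++m_bitCount;
        }
    }

    // Quantize `value` from [min, max] into an unsigned code of `bits` bits.
    void WriteFloat(float value, float min, float max, int bits) {
        const std::uint32_t maxCode = (1u << bits) - 1u;
        float t = (value - min) / (max - min);
        t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);  // clamp out-of-range input
        WriteBits(static_cast<std::uint32_t>(t * maxCode + 0.5f), bits);
    }

    std::size_t BitCount() const { return m_bitCount; }
    const std::vector<std::uint8_t>& Bytes() const { return m_bytes; }

private:
    std::vector<std::uint8_t> m_bytes;
    std::size_t m_bitCount = 0;
};

class BitReader {
public:
    explicit BitReader(std::vector<std::uint8_t> bytes) : m_bytes(std::move(bytes)) {}

    // Read `bits` bits in the same order the writer stored them.
    std::uint32_t ReadBits(int bits) {
        std::uint32_t value = 0;
        for (int i = 0; i < bits; ++i) {
            const std::uint32_t bit = (m_bytes[m_pos / 8] >> (m_pos % 8)) & 1u;
            value |= bit << i;
            ++m_pos;
        }
        return value;
    }

    // Inverse of WriteFloat: map the code back into [min, max].
    float ReadFloat(float min, float max, int bits) {
        const std::uint32_t maxCode = (1u << bits) - 1u;
        return min + (max - min) * (static_cast<float>(ReadBits(bits)) / maxCode);
    }

private:
    std::vector<std::uint8_t> m_bytes;
    std::size_t m_pos = 0;
};
```

Writing the example entity with 10, 14, 9, 9, 8, 8, 8 and 1 bits produces 67 bits in a 9 byte buffer, and the reader has to ask for exactly the same bit counts and ranges in the same order to get the values back.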
Alright, that’s it for this week, I need to get back to more network implementation. Writing this blog post made me realize how many little interesting things I did to make the networked game work better. It’s probably worth another future blog post or two!
This week I’ll talk about a cool little visual effect I put in the game: Light shafts. It’s a pretty standard visual effect, so it’s definitely nothing ground breaking, but it does make the game look that much better for very little effort. It was very quick to implement (read: a nice distraction from the monstrous networking task :) ), and I really like the effect.
A while back on our Twitch stream I worked on implementing a compute shader to calculate proper volumetric fog, much like the amazing Assassin’s Creed volumetric fog. I abandoned this approach because it was actually pretty difficult to make it look good in a side scrolling brawler. It was hard to control the fog, created a ton of weird blend mode issues (especially with additive blending) and it also raised the minimum spec because of the compute shader.
Instead of proper volumetric lighting, I’ve opted for manually placed animated light shafts. This gives us more control over the look, and it also works well in our existing chunk building system.
Last weekend I was playing through Brothers: A Tale of Two Sons (which is a gorgeous looking game by the way), and they use light shafts in multiple places to create a really cool atmosphere. I wondered how to implement these, as they fluctuate the light over time, yet are always smooth. This blog is about what I came up with.
The light shaft is rendered using alpha blending, using a user-defined color, and using a calculated alpha.
The geometry is 4 triangles, with coded UVs. The U coordinate is used as the actual U coordinate in the noise texture, and the V coordinate is just used as an alpha multiplier.
The alpha is the product of the user-defined color’s alpha, the sample from the texture, the V coordinate (used as alpha multiplier), and the edge smoothing alpha. The edge smoothing is simply using the (0-1) U coordinate as a parameter for a sine function in the shader to generate a multiplier between 0 and 1 that nicely blends the edges to zero.
The texture coordinates used to render the light shaft cover a thin horizontal sliver of the texture (stretched over the mesh), and this sliver is animated over multiple frames to generate the modulation effect in the light shaft.
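The alpha calculation described above boils down to one product. Here it is written as plain C++ standing in for the shader math. The function name is made up, and the exact form of the edge term, sin(pi * u), is an assumption; the text only says a sine of the 0-1 U coordinate is used.

```cpp
#include <cmath>

// Alpha for one light-shaft pixel.
// colorAlpha: alpha of the user-defined color
// noise:      sample from the noise texture (thin animated sliver), in [0, 1]
// u, v:       interpolated mesh UVs; v doubles as an alpha multiplier
float LightShaftAlpha(float colorAlpha, float noise, float u, float v) {
    const float kPi = 3.14159265f;
    // Assumed edge smoothing: 0 at u = 0 and u = 1, rising to 1 at u = 0.5.
    const float edge = std::sin(kPi * u);
    return colorAlpha * noise * v * edge;
}
```

Because every factor is between 0 and 1, the shaft fades out smoothly toward both side edges and toward whichever end of the mesh has V = 0, while the noise sample makes the brightness vary across the shaft and over time.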
Here’s a video of the effect in the game:
That’s pretty much it! Simple, but effective. I love putting things like that in the game.
Oh, and as a reminder, we moved our weekly dev-stream to Friday, because Jesse is still in Japan.
Welcome back followers of the fearsome! This week we’ll be checking out some of the super cool new additions for the world map! The world map is going to be getting some neat events that happen on levels the player has already completed. These events will range from new treasure opportunities to dangerous new bosses. […]