07-08-2011, 10:18 PM | #1 |
Nade Whore
Server Owner
Beta Tester Join Date: Sep 2007
Location: Oklahoma
Class/Position: Scout/Soldier Gametype: CTF/TDM Affiliations: blunt. Moto Posts Rated Helpful 128 Times
|
Lag Troubleshooting Guide
This was taken from Nuclear Fallout forums because it's the best information I've found so far regarding rates, lag, etc. Credit goes to Edge100x, NFo's founder.
Introduction

At Nuclearfallout, we effectively prevent most server-side lag by using the best possible bandwidth and running our servers on high-end hardware with very low load, but some types of lag still occur despite our best efforts. The goal of this guide is to comprehensively describe all types of lag on HL1, Source, and Orangebox-based servers, including what causes each kind and how each can be addressed. This guide also touches on topics that apply to other games, but most other games do not have a tool similar to net_graph, so they can be harder to diagnose; in addition, other games use different variables to specify rates and other settings.

This guide has three main sections: Symptoms, where we describe what clients might see and refer to possible causes; Causes, where we talk more about each cause and how it can be addressed; and Tools, where we provide details on various lag-related tools and commands.

Symptoms

High latency

High latency is probably the simplest type of lag. It manifests itself as predictably higher pings on the scoreboard and in net_graph output.

If it is only happening for some clients, see:
Cause: Routing path: It is likely that the routing to a particular client or group of clients, such as all of those on a particular ISP or in a particular geographical region, has temporarily deteriorated.

If it is happening to all clients, without exception, see:
Cause: Other: You'll likely have to troubleshoot this one with us.

You might also want to look at Symptom: Ping spikes, if that better describes the situation.

Ping spikes

Ping spikes, or latency spikes, are when the ping to a client or group of clients suddenly jumps briefly, then settles back down.

If it is only happening for some clients, see:
Cause: Routing path: If this occurs randomly, it is likely related to the routing.
Cause: Rate settings: If the spikes are happening during particularly intense firefights, at the start of a round, or during another similar event, then it is possible that the client connection is having problems, and adjusting rates may help.

If it is happening to all clients, without exception, see:
Cause: Plugins: Plugins can cause server delays that lead to increased processing time. This is especially likely if the spikes are associated with a certain event in the server.
Cause: Rate settings: If the server needs to send out more data to everyone than it can, this may cause choke and other problems.
Cause: Server FPS dips: This is unlikely but sometimes occurs.
Cause: Other: There may be a machine or bandwidth problem at the server's location.

Choke

Choke can be seen directly in the net_graph 3 output for a user. Most of the time, the game client can compensate for choke (much more easily than it could compensate for completely lost data), but the feel of the gameplay may suffer if the choke gets too high.

Choke happens when the server intentionally withholds a snapshot it would normally send to the client because it estimates that sending it would exceed the negotiated "rate" between the server and the client. The server's estimate is not always perfect, so it is common to see a choke of between 0 and 3, even on the most well-tweaked configurations. The negotiated "rate" is also limited by the game client, so sometimes choke will happen on large servers running at high tickrates regardless of the setting used.

See:
Cause: Rate settings: A too-low client "rate" or server "sv_minrate"/"sv_maxrate" is the most common cause of choke and this should be investigated first. When the "rate" is too low, typically a client will also see the "in" snapshots per second drop below the updaterate (in net_graph 3).
Cause: Plugins: Plugins can cause spikes of bandwidth usage that lead to the negotiated rate being exceeded regardless of how high it is, and this should be investigated if rate adjustments don't help.
Cause: Server FPS dips: If the server FPS is severely dipping, the game will have difficulty estimating achieved rates.

Skipping

Skipping is when a client feels he is warping from one point to another, after running and playing smoothly for some time. The possible underlying causes closely mimic those of ping spikes.

If it is only happening for some clients, see:
Cause: Packet loss: This is the most likely cause.
Cause: Rate settings: A very low cmdrate or too-high rate or cl_updaterate can also cause skipping.
Cause: Low client FPS: Sudden client FPS dips can give the game this feel.

If it is happening to all clients, without exception, see:
Cause: Plugins: This is the most common cause.
Cause: Rate settings: If the server needs to send out more data to everyone than it can, this may cause choke and other problems.
Cause: Server FPS dips: This is uncommon, but possible.
Cause: Other: Sometimes machine problems or a DDoS attack can play a role.

Low numbers in the net_graph

This isn't a real symptom per se, since it does not necessarily mean there is a performance problem, but we do hear concerns from renters that some part of their net_graph output shows different values than they expect. In most cases, the concern relates to the "in" or "out" last packet size number, and a lower number for these is actually a good thing. Our net_graph guide at the bottom of this document shows what the numbers mean in detail.

See:
Tools: net_graph

Poor registration

Poor registration is a client's sense that fired shots are not hitting their targets in the game. The term has also recently expanded into a catch-all for how a server feels to a player.

This phenomenon is often associated with at-rest hitboxes.
However, all game servers of the same type use exactly the same hitbox locations in relation to stationary player models, and there is no way to change them. If you are standing still and shooting at another player who is standing still, you will get the same result on every server (if you do it in exactly the same way and you factor out the simple shot spread randomness that the game adds in).

When a client is moving, however, the physical location of hitboxes and therefore the feel of the game change, depending on the server FPS setting, tickrate setting, his latency to the server, his client FPS, the updaterate, cmdrate, interp, and a whole host of other factors. All of these relate to how the game adjusts for dynamic client and server conditions. Valve talks more about this on its wiki.

Many times, players are accustomed to different server FPS rates or latencies, and as they play, their brain expects the game to compensate for those in a certain way -- it expects events to happen with certain delays, causing the player to lead his shots and so on. By adjusting factors like the server's FPS, you can try to make it behave more like the servers those players are familiar with, and therefore make the "feel" closer to what they expect.

When troubleshooting, first look for other measurable symptoms and investigate those concerns first. Sometimes, for instance, poor registration simply involves "Choke", and treating the choke can be a very effective cure.

Next, check out the server's observed FPS, as dips in this could make the server's performance unpredictable and also cause a sensation of poor registration. You shouldn't see many FPS dips with our servers, but if you do, we want to know about them ASAP.

Then, look at the overall acceleration/peak FPS level. Many players notice a difference between 250 FPS, 500 FPS, and 1000 or 2000 FPS servers.
If this is the case, you should consider raising the server's acceleration level through our order page, or lowering it through the "Server control" page (if the server is ultra-accelerated).

Then, look at client latencies. If some players seem to have high latencies for your location, contact us to have us look at their routing. If the routing is as good as it can be and pings still seem high, consider changing the server's location.

Consider straight "Unfamiliarity" as the cause, as well.

Finally, if none of the other factors seem to apply, you may just be having an unlucky day. Every shot fired in the game has a certain randomness to it, and if you miss several in a row, it can make it seem like the server is performing below par -- even when it's not.

See:
Symptom: Choke: If you are seeing choke.
The order page: If you think a higher acceleration level might help.
Cause: Server FPS dips: The server FPS may be dipping.
Cause: Low client FPS: A low client FPS rate makes the game play poorly in general.
Cause: Routing path: Have us look at your routing to see if InterNAP can tweak it.
Cause: Unfamiliarity: Otherwise.

Causes

Routing path

Problems with the routing (or forwarding) path to or from the server can lead to problems in-game, most notably high pings. If a router is making a poor forwarding decision (for instance, one that sends traffic across the US and back) on the outbound, we can often either directly reroute around it or have InterNAP make an adjustment that avoids the troublesome route. For inbound problems to certain locations, we can also sometimes work around the issue in creative ways.

To troubleshoot the routing path, we primarily use the "tracert" command. If you believe that your problem might be due to routing, please follow the instructions under "Tools: Traceroute and MTR" and send us a copy of your trace to the server, along with your IP address. We will investigate the situation thoroughly and let you know what we can do to help.
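Before sending a trace in, it can help to see which hop the latency first jumps at. The following is only a rough sketch in Python, not an NFo tool: the function names, the regular expression, and the 50 ms threshold are all assumptions made for illustration, and the sample hops are copied from the example trace in the Tools section of this guide. A script cannot tell whether two hops are in the same city, so the result is a hint for a human to check, nothing more.

```python
import re

# Matches one hop line of Windows tracert output, for example:
#   "  6   126 ms   126 ms   126 ms  border25s.example.net [63.251.160.106]"
# (assumed layout: hop number, three samples that are "N ms", "<N ms" or "*")
HOP_LINE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.*)$")


def parse_hops(trace_text):
    """Return a list of (hop_number, best_latency_ms_or_None, host) tuples."""
    hops = []
    for line in trace_text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, footer, or blank line
        # "<1 ms" is counted as 1 ms; "*" samples are ignored.
        samples = re.findall(r"<?(\d+) ms", match.group(2))
        best = min(int(s) for s in samples) if samples else None
        hops.append((int(match.group(1)), best, match.group(3).strip()))
    return hops


def first_latency_jump(hops, threshold_ms=50):
    """Return (hop_number, host, increase_ms) for the first hop whose best
    latency exceeds the previous measured hop by threshold_ms, else None.
    The threshold is a hypothetical heuristic, not a value from the guide."""
    previous = None
    for number, latency, host in hops:
        if latency is None:
            continue  # timed-out hop: nothing to compare
        if previous is not None and latency - previous >= threshold_ms:
            return number, host, latency - previous
        previous = latency
    return None


SAMPLE_TRACE = """
Tracing route to www.nuclearfallout.net [206.253.195.193]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  router.la.nuclearfallout.net [64.94.101.254]
  2    <1 ms    <1 ms    <1 ms  border1.ge3-3.nuclearfallout-33.ext1.lax.pnap.net [63.251.209.181]
  3    <1 ms    <1 ms    <1 ms  core3.t2-2-bbnet2.lax.pnap.net [216.52.255.67]
  4    <1 ms     1 ms    <1 ms  te1-4.ar4.LAX1.gblx.net [64.215.30.77]
  5    26 ms    26 ms    26 ms  InterNAP-Ken-Schmid-Seattle.ge-3-2-0.ar4.SEA1.gblx.net [64.213.33.230]
  6   126 ms   126 ms   126 ms  border25s.gi6-2-bbnet2.sea.pnap.net [63.251.160.106]
  7   126 ms   126 ms   126 ms  nuclearfallout-5.border25s.sea.pnap.net [206.191.144.42]
  8   126 ms   126 ms   126 ms  www.nuclearfallout.net [206.253.195.193]

Trace complete.
"""

print(first_latency_jump(parse_hops(SAMPLE_TRACE)))
```

With the default threshold, the 25 ms step from Los Angeles to Seattle is not flagged, while the 100 ms step between two Seattle hops is. Using the best of the three samples per hop is a deliberate choice here so that a single slow reply does not trigger a false alarm.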
Often routing problems become worse in peak evening internet-surfing hours, when ISPs are more likely to overload their peering connections.

Packet loss

Packet loss is a serious problem that can have a variety of underlying causes. Many are client related, and in these the issue will occur everywhere, not just on our servers. The possibilities include:

* The client connection may be overloaded. Make sure that there are no programs running in the background and downloading things, such as Windows Update or peer-to-peer software. If the connection is being shared, unplug the other devices temporarily to see if they are contributing to the overall usage.
* The client might be on a wireless or otherwise unstable connection. Try switching to a physical or more stable connection.
* The client's router or modem might be having problems. Reboot the router or modem. If the problem continues, consider calling the ISP.
* A router or peering point may be dropping packets. This is usually accompanied by high latencies; see "Causes: Routing path" to troubleshoot.

If the problem is happening for everyone on the server, contact us right away.

Rate settings

Having proper rate settings on both the client and server is very important. These are the primary categories of commands you should look at.

"rate"-related commands: These commands determine the negotiated data rate, in bytes per second, allowed between the client and the server. If the negotiated rate is too low, then a client may experience choke or reduced updates from the server, leading to poor performance. If the negotiated rate is too high, the client connection may become saturated, leading to packet loss and skips in the game. (A too-high rate should only be a problem for slow links, such as a modem or ISDN line.)

rate - This is a client-side command that is used to request a particular rate be used. (Really, it is setting a maximum cap on the rate, as the server may not need to send that much data.)
You should generally start by setting this very high, such as at 50000, and reduce it only if you see packet loss or skipping.

sv_maxrate, sv_minrate - These are the server-side versions of the "rate" command and determine what range of rates the server will allow. If a client's "rate" setting is outside these bounds, it is negotiated to match one of them -- for instance, if the "rate" were set to 10000 and the server's sv_minrate were 15000, the negotiated rate would be 15000. In general, because your server bandwidth is essentially unlimited, you should set sv_maxrate to 0 so that it is unrestricted. sv_minrate is something of a kludge to compensate for clients who have mis-set their "rate", but it comes up more and more because of customers using the default, which is far too low; if your clients are complaining of choke, try raising sv_minrate to 50000 or 100000.

"updaterate"-related commands: These commands determine the negotiated number of packets per second sent to a particular client. A high negotiated updaterate will mean that the client will receive more, smaller packets and potentially experience smoother gameplay, but at the expense of using more bandwidth overall and thus requiring a higher negotiated rate; a low negotiated updaterate will send fewer, larger packets and use less bandwidth overall.

cl_updaterate - This is the client-side command that is used to request a particular updaterate be used. It is recommended that you set this to the highest value that your connection can handle (some home routers have difficulty with larger numbers of small packets per second and this may limit you), up to 100. Higher values than 100 will be the same as 100, because the number of updates per second is limited by the number of world recalculations (ticks) per second on the server, for Source servers -- and it will be, at most, 100 -- and by a hardcoded limit of 100 for Half-Life servers.
If you have a very high "rate" setting and are getting choke, try lowering your cl_updaterate by a few at a time until the choke disappears.

sv_maxupdaterate, sv_minupdaterate - These are the server-side versions of the "cl_updaterate" command and determine the allowed range. Generally, you will want to set sv_maxupdaterate to at least your tickrate, and let your clients choose what to use.

"cmdrate"-related commands: cmdrate is the analog to updaterate, but sets the number of packets per second sent in the opposite direction -- from the client to the server. The cmdrate is also a negotiated value but is further limited because it can never be higher than a client's FPS (it changes dynamically as the client's FPS decreases). If the cmdrate is very low, the client might be seen as moving jerkily by other players.

cl_cmdrate - This is the client-side command to request a particular cmdrate. Generally this should be set as high as the connection can allow, up to 100. If the server doesn't seem to be registering all movements and shots, but there is no packet loss displayed, there could be a problem with your connection and you might try lowering it.

sv_maxcmdrate, sv_mincmdrate - These are the server-side analogues. The sv_mincmdrate should be set high enough to prevent players from being seen as skipping around -- around 20 should be enough -- and the sv_maxcmdrate should generally be set to 100.

Low client FPS

A low client FPS can limit the realized cmdrate, can cause the game to skip or jerk, and generally makes playing much less fun. The client FPS is affected by the properties of the map being played, how much action is going on in the game, the type of graphics card and graphics settings, and the speed of the client's CPU. It is exclusive to an individual client, and low client FPS never reflects poor server performance; in fact, higher server performance can actually lead to lower client FPS rates because the client will have more data to process.
To increase client FPS, a client will likely have to change his or her graphics settings or upgrade his or her hardware.

Server FPS dips

Server FPS dips are caused by high server load, either because the server itself becomes bogged down (with work from plugins or in the normal course of its operation), or because the machine in general is overloaded. These dips can lead to choke, or, in more extreme cases, to skips and lag spikes.

Server FPS can be measured through a game client or any remote rcon tool with the "rcon stats" command. Small dips of up to around 10% are expected and will not have a noticeable effect on gameplay, and the occasional larger dip at the beginning of a round or during a map load is normal, but anything outside these should be fully investigated.

Plugins are often the culprit when it comes to FPS dips, and that possibility should be investigated first. If the server still experiences dips without plugins enabled, please contact us ASAP so that we can look into whether there is a problem with your machine.

On a related note, fps_max (or sys_ticrate, on HL1 servers) should always be set to at least a third higher than the desired server FPS, if not to 0 (meaning unlimited). Otherwise, a server will intentionally cap its FPS at a lower value than it could reach.

Plugins

Plugins change the behavior of a game server in unpredictable ways, and often they are the culprits when it comes to some of the more obscure lag problems. Some of the ways that plugins can cause problems are:

* By causing large amounts of disk access; anything retrieved directly from disk will take at least a few milliseconds to load. This is common with stats plugins that develop large databases, and it can lead to spikes and skips.
* By intense calculations; plugins such as bots can often cause a server to spend more than the usual amount of time handling a tick, causing the server FPS to plummet and every client to experience delays.
* By waiting on network events; plugins that contact outside hosts and don't return control to the game server until a response has been received are limited by how quickly the other host can respond.

The best way to troubleshoot potential plugin problems is by temporarily disabling all plugins and seeing if the problem disappears. If it does, plugins should be reenabled one-by-one until the problem appears again and the culprit becomes known.

Unfamiliarity

Unfamiliarity with a server's responsiveness or rate-related settings can cause it to "feel" very different in-game. This is because a different latency affects where the game estimates other moving players are located. With a low latency, the game has more up-to-date information and arrives at a better guess, making shots fired directly at another player more likely to hit; with a high latency, the client has older information and is more likely to be wrong in its lag prediction, leading to direct shots that do not appear to land.

If a high-level player is accustomed to playing with a particular amount of latency, he or she may subconsciously learn how to slightly adjust the information given to him or her and aim in certain spots (such as dragging behind a target that is expected to stop) that are more likely to hit. This means that even a higher-performing server with lower-than-normal ping can sometimes feel like it is underperforming, simply because the player is not accustomed to it.

If there are no other symptoms than a poor game "feel" -- that is, if it has a low latency, no server FPS dips, no packet loss, and no choke -- unfamiliarity is probably the culprit.
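To put a rough number on the "older information" idea in the Unfamiliarity section: a target moving at a constant speed covers speed x delay units while the information about it is in flight. The snippet below is only back-of-the-envelope arithmetic in Python. The function name, the 250 units/s speed, and the delay values are made up for illustration, and the calculation deliberately ignores the interpolation and lag compensation that the game engine applies.

```python
def stale_distance(speed_units_per_s, delay_ms):
    """Distance a constant-speed target covers while data is delay_ms old.

    Simplified model: no interpolation, no lag compensation, no acceleration.
    """
    return speed_units_per_s * delay_ms / 1000.0


# Hypothetical target speed of 250 game units per second.
for delay_ms in (20, 60, 120):
    moved = stale_distance(250, delay_ms)
    print(f"{delay_ms:>3} ms old -> target has moved {moved:.1f} units")
```

The point is only that the gap grows linearly with delay, which is why a player used to one latency may need time to adjust to another.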
Other (software bugs, random machine-related incidents, DDoS attacks)

Sometimes the game itself has a bug in it that can cause mysterious lag, or we experience a problem with a machine that causes issues for its game servers without showing up on our monitoring systems (our systems continually monitor reachability, CPU usage, memory usage, and bandwidth usage), or a game server is flooded by a Denial of Service attack.

These are issues that only we can effectively diagnose with you, so please contact us if you exhaust other possibilities and arrive at this possible cause. We may have you try other servers on the same machine or at your location in an attempt to narrow down the factors. We often find that a simple machine move will help with particularly mysterious issues.

Tools

Traceroute and MTR

A traceroute effectively shows the forwarding path taken to a destination, along with some data about how long it might take to reach each router along the path. It does this by sending a series of packets out with increasing TTL (Time To Live) values. TTL represents how many "hops" a packet is allowed to travel before it is discarded; each time a packet is forwarded on by a router, that router decreases the TTL value, and if it reaches 0 the router drops the packet and sends back an ICMP "TTL-expired" response. The traceroute program measures how long it took to receive each of these ICMP packets, along with each IP that responded, and shows it to the user.

In Windows, you can run a traceroute with the "tracert" command. We describe exactly how to do this here: viewtopic.php?f=8&t=1539. To make the process simpler, we also have a batch file set up on the "trace" tab of our control panel that will run through the steps for you.

This is the example output of the tracert command:

Code:
C:\>tracert www.nuclearfallout.net

Tracing route to www.nuclearfallout.net [206.253.195.193]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  router.la.nuclearfallout.net [64.94.101.254]
  2    <1 ms    <1 ms    <1 ms  border1.ge3-3.nuclearfallout-33.ext1.lax.pnap.net [63.251.209.181]
  3    <1 ms    <1 ms    <1 ms  core3.t2-2-bbnet2.lax.pnap.net [216.52.255.67]
  4    <1 ms     1 ms    <1 ms  te1-4.ar4.LAX1.gblx.net [64.215.30.77]
  5    26 ms    26 ms    26 ms  InterNAP-Ken-Schmid-Seattle.ge-3-2-0.ar4.SEA1.gblx.net [64.213.33.230]
  6    26 ms    26 ms    26 ms  border25s.gi6-2-bbnet2.sea.pnap.net [63.251.160.106]
  7    26 ms    26 ms    26 ms  nuclearfallout-5.border25s.sea.pnap.net [206.191.144.42]
  8    26 ms    26 ms    26 ms  www.nuclearfallout.net [206.253.195.193]

Trace complete.

C:\>

If you run a trace to your game server and it has high pings starting at a particular hop, when that hop appears to be in the same city as the one before it, there may be a problem. For instance, if the trace above looked like this instead, it would indicate that something could be amiss in Seattle:

Code:
C:\>tracert www.nuclearfallout.net

Tracing route to www.nuclearfallout.net [206.253.195.193]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  router.la.nuclearfallout.net [64.94.101.254]
  2    <1 ms    <1 ms    <1 ms  border1.ge3-3.nuclearfallout-33.ext1.lax.pnap.net [63.251.209.181]
  3    <1 ms    <1 ms    <1 ms  core3.t2-2-bbnet2.lax.pnap.net [216.52.255.67]
  4    <1 ms     1 ms    <1 ms  te1-4.ar4.LAX1.gblx.net [64.215.30.77]
  5    26 ms    26 ms    26 ms  InterNAP-Ken-Schmid-Seattle.ge-3-2-0.ar4.SEA1.gblx.net [64.213.33.230]
  6   126 ms   126 ms   126 ms  border25s.gi6-2-bbnet2.sea.pnap.net [63.251.160.106]
  7   126 ms   126 ms   126 ms  nuclearfallout-5.border25s.sea.pnap.net [206.191.144.42]
  8   126 ms   126 ms   126 ms  www.nuclearfallout.net [206.253.195.193]

Trace complete.

C:\>

Traceroutes do have their limitations. For instance, sometimes a router will decrease the TTL but won't respond back, and this shows as a complete timeout:

Code:
 11     *        *        *     Request timed out.

Code:
  7   126 ms   158 ms   116 ms  nuclearfallout-5.border25s.sea.pnap.net [206.191.144.42]

Repeated rcon stats

rcon stats shows an estimate of the FPS (Frames Per Second) that the game server is currently running at, among other data. Through the console, after logging into rcon with the "rcon_password yourpassword" command, simply type "rcon stats" to see the output. For instance:

Code:
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   491     5      253.11  4

"In" and "Out" represent how much data is flowing into and out of the server, in bytes per second.
"Uptime" tells how long the server has been online, in minutes.
"Users" shows the total number of players who have connected to the server since it was last restarted.
"FPS" shows the current FPS rate and should approximately match the lesser of what you chose during the ordering process and what you told the server to run at through the fps_max (Source) or sys_ticrate (HL1) command.
"Players" represents the current number of players in the server.

With repeated rcon stats queries, you can get a good idea of whether the server is experiencing FPS dips. We recommend running 5-10 in a row, like so:

Code:
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      256.40  4
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      261.38  4
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      249.18  4
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      245.84  4
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      243.94  4
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      255.66  4

Code:
> rcon stats
CPU    In     Out    Uptime  Users  FPS     Players
0.00   0.00   0.00   496     5      17.98   4

net_graph

net_graph is a powerful tool in troubleshooting a server because it combines information helpful in choosing rate settings (such as choke and the in/s and out/s numbers) with information on the health of the client-server connection (packet loss). net_graph behaves similarly in HL1-based games, Source-based games, and Orangebox-based games; we describe the Source version here.

The two main net_graph modes are 2 and 3. To set one of these, simply pull down your client console, type "net_graph 2" or "net_graph 3", and hit the enter key.

In a CS:S game, net_graph 2 looks like this:

[Attachment: net_graph 2 screenshot]

Along the top of this output, there is a colorful graph showing the size of each packet; the colors are explained at http://developer.valvesoftware.com/wiki ... Networking. This information is mainly irrelevant to the performance of the server, but it certainly looks pretty. If you experience a connection interruption, you'll see gaps in this graph.

On the first line of the output, the game lists your client's current FPS, your ping to the server, and the negotiated updaterate.

On the second line, it shows the size of the last inbound packet (aka update or snapshot), the average incoming bandwidth used, and the realized number of updates per second (aka the realized updaterate or realized number of snapshots per second). The realized number of updates per second can never be higher than the server's tickrate, since the server sends at most one update per tick.

On the third line, it shows the size of the last outbound packet (aka command), the average outbound bandwidth, and the realized number of commands per second (aka the realized cmdrate). The realized number of commands per second can never be higher than your client's FPS for the same period of time, because the local game only processes input once per frame.
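The limits just described, together with the negotiation rules from the Rate settings section, can be summarized in a few lines. This is a minimal sketch in Python under stated assumptions, not the engine's actual logic: the function names are invented, the clamping order is a guess based on the guide's sv_minrate example, and the extra cap the game client places on "rate" is not modeled.

```python
def negotiated_rate(client_rate, sv_minrate, sv_maxrate):
    """Clamp the client's requested rate into the server's allowed range.

    sv_maxrate == 0 is treated as unrestricted, as the guide describes.
    """
    rate = max(client_rate, sv_minrate)
    if sv_maxrate > 0:
        rate = min(rate, sv_maxrate)
    return rate


def realized_updaterate(cl_updaterate, sv_minupdaterate, sv_maxupdaterate, tickrate):
    """Updates per second the client can actually receive.

    The negotiated value is clamped by the server range, then capped by the
    tickrate because the server sends at most one update per tick.
    """
    negotiated = min(max(cl_updaterate, sv_minupdaterate), sv_maxupdaterate)
    return min(negotiated, tickrate)


def realized_cmdrate(cl_cmdrate, sv_mincmdrate, sv_maxcmdrate, client_fps):
    """Commands per second the client can actually send.

    Capped by client FPS because input is processed once per frame.
    """
    negotiated = min(max(cl_cmdrate, sv_mincmdrate), sv_maxcmdrate)
    return min(negotiated, client_fps)


# The guide's own example: rate 10000 against sv_minrate 15000.
print(negotiated_rate(10000, 15000, 0))
# Hypothetical 66-tick server and a client running at 45 FPS.
print(realized_updaterate(100, 20, 100, 66))
print(realized_cmdrate(100, 20, 100, 45))
```

Comparing these values with the "in" and "out" per-second numbers in net_graph is one way to tell whether a low reading comes from settings or from something else, such as choke or low client FPS.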
net_graph 3 looks like this:

[Attachment: net_graph 3 screenshot]

This is very similar to net_graph 2, but it removes the graph and the information on the negotiated updaterate and negotiated cmdrate, and it adds information on the degree of packet loss and choke (we point out only these two additions in the screenshot).

Of the two, we recommend using net_graph 3 most of the time. Choke and packet loss are referenced several times in this guide, and net_graph 3 is the only place to find both of those numbers.

Full article here: http://www.nfoservers.com/forums/vie...hp?f=25&t=3930

Last edited by KubeDawg; 07-08-2011 at 10:37 PM. |
|
07-09-2011, 07:05 PM | #2 |
Join Date: Nov 2010
Gametype: Capture the Flag Posts Rated Helpful 38 Times
|
Very nice guide kube, definitely helped me fix my lag
|
|
07-09-2011, 07:41 PM | #3 |
Nade Whore
Server Owner
Beta Tester Join Date: Sep 2007
Location: Oklahoma
Class/Position: Scout/Soldier Gametype: CTF/TDM Affiliations: blunt. Moto Posts Rated Helpful 128 Times
|
Thanks man. It's really helped me figure out what actually does what instead of just blindly setting my rates. I was under the impression 30k was the highest rate that was allowed for source games. Apparently their servers are good enough to have a maxrate set to 100k.
|
|
07-10-2011, 01:59 PM | #4 |
Colorless FTW
D&A Member
Beta Tester Join Date: Mar 2007
Location: A Small Box
Affiliations: SRCDS.com Posts Rated Helpful 1 Times
|
From my experience, server-side FPS really doesn't do much. I've played on servers with 20FPS and 1000FPS and there's no difference IMHO. Server hosting companies use high FPS as a way of making money. Many companies actually don't make money on normal server rentals, but make a ton of money when they sell one 1000FPS server for $99+ a month...
|
|
07-13-2011, 03:04 PM | #5 |
Nade Whore
Server Owner
Beta Tester Join Date: Sep 2007
Location: Oklahoma
Class/Position: Scout/Soldier Gametype: CTF/TDM Affiliations: blunt. Moto Posts Rated Helpful 128 Times
|
Yeah that's why I hate Gameservers. They're expensive and use higher FPS as a gimmick to give you the illusion you are getting a good deal. At least with NFo, I pay about $25 a month for an 18 man server already at 500FPS. Not a bad deal if you ask me.
|
|
|
|