On limitations of human interface devices (HIDs)
What would be the limiting factor when playing an arcade style space shooter via sensors and a hooked up keyboard?
This post is a copy of the gist I posted in response to the WeekendDevPuzzle of 2022-02-12, shown below.
As the thread above has submissions from people who chose to share their perspectives on the puzzle, I'll be linking this post to this Twitter thread, to avoid polluting the original one.
Motivation for today's topic
Every piece of technology gets engineered for a certain set of characteristics. For example, the vehicle you use was designed with a certain typical & max capacity in mind. The same is true for the devices we use to interact with computing systems, be it your workstation, your laptop, or your smartphone. But how often do we reflect on those design characteristics?
Today's puzzle is about throwing some light on these devices, with the hope that it leaves us more informed.
Dissecting the puzzle
Flow of information
So, we have an arcade-style game being played on a computer. Clearly, the flow of information would be something like this (a rough sketch of the loop follows the list):
- The computer's CPU calculates the game model (which alien ships or bullets are at what positions, etc.) & tells the GPU to draw.
- The GPU uses this model information received from the CPU to paint a frame, and sends it over the display cable to the screen.
- The screen uses this rapid stream of frames to tell each pixel to change to its new value.
- Our sensors pointed at the screen detect these pixel (or pixel-block) changes, and send them over the wire to our program (sitting on a different computer).
- Our program does some calculations & determines what steps to take, e.g. pressing the left key 4 times. This is fed as electrical signals to the wires we've attached to the keyboard keys, to electrically press a key.
- The input is received by the OS and fed to the game program, looping back to step 1.
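To make this concrete, here's a minimal sketch (in Python) of what the bot side of steps 4-6 could look like. All three callables are hypothetical placeholders for the sensor driver, our decision logic, and the wires on the keyboard - not a real API:

```python
def control_loop(read_sensor_frame, decide_action, send_key_press):
    """One iteration per observed frame: sense -> decide -> actuate."""
    while True:
        frame = read_sensor_frame()      # blocks until the sensor captures a new frame
        actions = decide_action(frame)   # e.g. ["LEFT", "LEFT", "FIRE"]
        for key in actions:
            send_key_press(key)          # drives the electrical key press on the keyboard
        # Every stage in this loop adds latency; the puzzle is which one dominates.
```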
Potential areas of bottleneck
Looking at the above, the potential areas of bottleneck pretty much map 1-1 with the primary actor in each of those steps. But we can make some simplifying assumptions to narrow them down. E.g. given that the game mechanics are quite simple (old arcade style), we can make an informed guess that the CPU+GPU of the computer are not a bottleneck. This assumption breaks down if the computer is quite old, but let's stick with it for now.
Likewise, we can assume that the external sensors we've put in front of the screen are not a bottleneck, because we can use the fastest available sensors (plus other optimisations) to keep their contribution negligible.
That leaves us with the following areas:
- Latency of the monitor screen
- Latency of the keyboard
- OS latency when receiving inputs. This is the time from when the OS detects a keyboard input to when it delivers it to the application. Again, assuming a fast enough computer, we'll ignore this for now.
- Bot/Human latency, introduced by the time difference b/w the sensor observing something and our logic controller (on which our code is running) making a decision and sending a keyboard event. We can assume this to be reducible to the order of microseconds, given that it's a fairly simple set of logic, thanks to the simple game mechanics (see the timing sketch below).
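As a sanity check on that "order of microseconds" claim, here's a toy decision function being timed. The logic is a made-up stand-in, but it's in the same complexity ballpark as what a bot for a simple arcade shooter would need:

```python
import time

def decide(frame):
    # Toy stand-in: scan a "frame" (list of pixel rows) for the nearest threat
    # and return how many times to press LEFT. Real logic would differ, but
    # would stay in a similar complexity ballpark for simple game mechanics.
    threat_column = min((row.index(1) for row in frame if 1 in row), default=None)
    return 0 if threat_column is None else max(0, 4 - threat_column)

frame = [[0] * 64 for _ in range(48)]   # a blank 64x48 "frame" from the sensor
frame[10][3] = 1                        # pretend a bullet showed up in column 3

start = time.perf_counter()
presses = decide(frame)
elapsed_us = (time.perf_counter() - start) * 1e6
print(f"decided on {presses} key press(es) in {elapsed_us:.1f} microseconds")
```

On any modern machine this prints a few tens of microseconds at most, which is negligible next to the millisecond-scale numbers below.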
Let's analyse the first two in more detail.
Screen latency
When we're using any screen, there are three parameters that become important to us in this scenario:
- Response time. This is a measure of how fast pixels can flip (typically measured in milliseconds). The actual number depends on the panel technology used for the screen. You can read here for a comparison between different technologies. For our calculations, we'll consider two scenarios: 1ms (possible in today's gaming monitors) and 10ms (typical of IPS monitors).
- Input lag. This is a measure of the time delta b/w a signal being received and it being converted into signals for pixel flipping. This can vary from microseconds to several hundred ms if too much image processing is going on (this is why TVs often have a "game mode" - it switches off this processing). We'll assume this to be zero, on the basis that we can turn it off in our hypothetical scenario too.
- Refresh rate. Closely related to response time, this essentially captures how many images/sec can be displayed. So, a 60Hz monitor can paint the whole screen 60 times per sec. This matters to us because it effectively adds latency to our sensor, as the sensor can only read what has already been displayed. So no matter how fast our sensor operates, we'll be limited to around 1000/240 ≈ 4ms of latency in our sensor for a 240Hz screen. For our analysis, we'll assume two scenarios: 60Hz (typical LCD) and 360Hz (extreme gaming screens). Before you say wtf to the number 360, please read this.
So, our scenarios are:
- Latency added due to response time: 1ms, 10ms
- (Effective) latency added to the sensor due to screen refresh rate: ~16.7ms (for 60Hz), ~2.8ms (for 360Hz). The quick calculation below shows where these numbers come from.
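These figures fall straight out of the refresh interval, i.e. 1000 / refresh rate (in Hz) milliseconds per frame. A short back-of-the-envelope script for our scenarios:

```python
def refresh_interval_ms(refresh_rate_hz: float) -> float:
    """Worst-case wait (in ms) before the next frame appears on screen."""
    return 1000.0 / refresh_rate_hz

# Combine the two screen-side contributions for each of our scenarios.
for refresh_hz, response_ms in [(60, 10), (60, 1), (360, 10), (360, 1)]:
    total = refresh_interval_ms(refresh_hz) + response_ms
    print(f"{refresh_hz:>3}Hz panel, {response_ms:>2}ms response time -> "
          f"up to {total:.1f}ms before the sensor can see a change")
```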
Keyboard latency
You might be forgiven for thinking that every time you press a key on your keyboard, you raise an interrupt. Indeed, at one point in time, that's how PS/2 ports for keyboards & mice used to work. But modern keyboards work over USB, which doesn't rely on interrupts but instead on polling by the OS. This poll rate depends upon the device, the USB negotiation done, and driver settings. Higher polling rates require a better keyboard USB controller and have a higher power draw, but give better latency.
For our scenario analysis, we'll use values of 10ms (typical) and 0.125ms (extreme).
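For reference, the polling component alone maps directly from the poll rate (1000 / rate in Hz gives the worst-case wait). Real keyboards add key-matrix scanning and debounce on top, which is roughly how you get to the 10ms "typical" figure; the sketch below models only the polling part:

```python
def usb_poll_latency_ms(poll_rate_hz: float) -> tuple[float, float]:
    """Worst-case and average added delay (ms) from USB host polling alone.

    This ignores key-matrix scanning and debounce inside the keyboard,
    which is why real-world figures land higher than the raw polling math.
    """
    worst = 1000.0 / poll_rate_hz
    return worst, worst / 2

for rate in (125, 1000, 8000):
    worst, avg = usb_poll_latency_ms(rate)
    print(f"{rate:>4}Hz polling -> worst {worst:.3f}ms, average {avg:.3f}ms")
```

The 0.125ms "extreme" number corresponds to the 8000Hz polling offered by some gaming keyboards.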
Remember that wireless keyboards are also a factor. We can consider two kinds of wireless:
- Bluetooth - where the keyboard talks directly to the computer's BT. These have horrible latencies (20-80ms), so we'll skip these entirely.
- USB dongle in the computer, with the dongle talking wirelessly to the keyboard. Latencies here are, at best, on par with a wired USB keyboard (and usually a bit worse, depending upon the quality of the wireless h/w), so we'll skip this scenario as well, and instead just use a wired USB keyboard for our analysis.
Analysis
We can break down all the factors as follows (a small script reproducing the "Limited By" column follows the table):
| Screen Response Time | Screen Refresh Rate | Keyboard Latency | Limited By |
|---|---|---|---|
| 1ms | 60Hz (~16.7ms) | 10ms or 0.125ms | Screen Refresh Rate |
| 1ms | 360Hz (~2.8ms) | 10ms | Keyboard Latency |
| 1ms | 360Hz (~2.8ms) | 0.125ms | Too close to call |
| 10ms | 60Hz (~16.7ms) | 10ms or 0.125ms | Screen Refresh Rate |
| 10ms | 360Hz (~2.8ms) | 10ms | Screen Response Time / Keyboard Latency |
| 10ms | 360Hz (~2.8ms) | 0.125ms | Screen Response Time |
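The "Limited By" column is just a comparison of the three per-stage latencies. Here's a small script that reproduces it; the ~2ms "too close to call" threshold is an arbitrary choice of mine, not anything fundamental:

```python
SCENARIOS = [
    # (screen response time ms, refresh rate Hz, keyboard latency ms)
    (1, 60, 10), (1, 60, 0.125),
    (1, 360, 10), (1, 360, 0.125),
    (10, 60, 10), (10, 60, 0.125),
    (10, 360, 10), (10, 360, 0.125),
]

for response_ms, refresh_hz, keyboard_ms in SCENARIOS:
    refresh_ms = 1000.0 / refresh_hz
    stages = {
        "screen response time": response_ms,
        "screen refresh rate": refresh_ms,
        "keyboard latency": keyboard_ms,
    }
    ranked = sorted(stages.items(), key=lambda kv: kv[1], reverse=True)
    (worst_name, worst_ms), (second_name, second_ms) = ranked[0], ranked[1]
    if worst_ms - second_ms > 2.0:
        verdict = worst_name
    else:
        verdict = f"{worst_name} / {second_name} (too close to call)"
    print(f"{response_ms:>2}ms | {refresh_hz:>3}Hz ({refresh_ms:.1f}ms) | "
          f"{keyboard_ms:>6}ms -> limited by {verdict}")
```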
Simplistic much?
The astute amongst you would've noted that the assumptions we've called out till now are not sufficient. There's also the reality that our sensor can observe multiple frames before making a decision (assuming that the alien ship's attacks take a few frames to reach us). So in that sense, we should really be dividing the screen-side latency (response time + refresh interval) by some factor (say 10, assuming we can afford to lose 10 frames before making a decision), and the answer would look quite a bit different. But in the interest of keeping this short, we'll ignore this.
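Purely to give a sense of the magnitude, here's that division applied to the screen-side numbers, with the factor of 10 being the same arbitrary frame budget as above:

```python
FRAME_BUDGET = 10  # frames we can afford to observe before reacting (arbitrary)

for response_ms, refresh_hz in [(1, 60), (1, 360), (10, 60), (10, 360)]:
    screen_ms = response_ms + 1000.0 / refresh_hz
    amortised = screen_ms / FRAME_BUDGET
    print(f"{response_ms:>2}ms response, {refresh_hz:>3}Hz: "
          f"{screen_ms:.1f}ms raw -> {amortised:.2f}ms amortised per decision")
```

With the screen side shrunk like this, the keyboard starts to dominate in more of the scenarios above.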
Conclusion
As always, the answer to the puzzle depends on the exact configuration chosen. A bunch of folks who responded had the right intuition behind this, largely from their gaming background I suppose, but I hope some of this was still new information. Personally, while setting up this puzzle, the 360Hz monitor was a bit of a revelation :)
If you're interested in what others had to say, or you've got a comment to share, head over to the gist of this post.