Introduction
When we build web applications, we usually rely on HTTP and the traditional request-response paradigm. This is the well-known scenario where a client sends some data and the server responds. The client is always the one who initiates the communication. This communication is half duplex, meaning that information flows in only one direction at a time. For the majority of web applications, this way of communicating is the right choice, but...
What if we need the server to initiate the communication?
What if we need a real-time experience for our application (gaming, a stock market, chat, …)?
What if we have a large number of clients frequently exchanging small messages with the server?
What if we have limited bandwidth at our disposal (IoT devices)?
These are some of the scenarios where the request-response method might not be a good solution. Here you can find some of the techniques frequently used to improve the performance of an application when we face these issues.
Polling
Polling is a regularly timed series of client requests to the server, made in order to learn about a state change on the server side as soon as possible. The requests are made at regular intervals and a response is received regardless of whether there is any new information available. Polling might be a good solution if we can anticipate when new data will be available. However, real-time data is often not that predictable, so we usually end up generating a lot of unnecessary traffic in which little useful data is exchanged.
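To make the cost of polling concrete, here is a minimal Python sketch of a polling loop. The names poll and fetch_version are hypothetical, and fetch_version stands in for one full HTTP round trip that returns a version number of the server state. The sketch simply counts how many of the requests actually brought back something new.

```python
import time
from typing import Callable, Optional, Tuple


def poll(
    fetch_version: Callable[[], int],
    interval_s: float,
    max_requests: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Poll at a fixed interval; return (requests_made, useful_responses)."""
    last_seen: Optional[int] = None
    useful = 0
    for attempt in range(max_requests):
        # Hypothetical stand-in for one request/response round trip.
        version = fetch_version()
        if version != last_seen:
            # Only a changed version counts as useful data.
            useful += 1
            last_seen = version
        if attempt < max_requests - 1:
            # Wait for the next tick, whether or not anything changed.
            sleep(interval_s)
    return max_requests, useful
```

If the server state changes only once during ten polls, most of the ten requests return data the client already has, which is exactly the unnecessary traffic described above.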
Long Polling
Long polling (or a long-held HTTP request) is a technique where we keep a connection to the server open without having data immediately sent back to the client. The server holds back the response until it has something useful to say or until a designated timeout expires. The client receives the response as soon as useful data is available on the server, after which it reopens the connection and repeats the same process. Long polling is one of the techniques commonly grouped under the names Comet or Reverse AJAX. This method offers better performance in terms of real-time data exchange but falls short in high message frequency scenarios, where every message costs a new request and it behaves similarly to regular polling.
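The server side of this idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: long_poll is a hypothetical handler, and a per-client queue.Queue stands in for whatever mechanism the real server uses to learn about new data.

```python
import queue
from typing import Optional


def long_poll(updates: "queue.Queue[str]", timeout_s: float) -> Optional[str]:
    """Hold the request open until an update arrives or the timeout expires."""
    try:
        # Blocks the handler instead of answering immediately.
        return updates.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing happened in time: answer with "no news" (None here),
        # and the client is expected to reopen the connection.
        return None
```

The client would call the endpoint again right after every response, whether it carried data or not, so that a request is always pending on the server.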
Server-Sent Events
Server-Sent Events (SSE) is another real-time data exchange method, where data flows from a web server to web browsers. By using SSE, we are able to push notifications continuously, in the form of DOM events, from the server to the client's browser. It is an event-based communication protocol where we open a persistent connection (stream) between the server and the client. The client can choose which events to subscribe to and get the desired data updates. After the initial connection is made, the server keeps sending data to the client whenever new information is available, eliminating the need for continuous polling. The downside is that this communication only goes one way, from the server to the client. Bear in mind that this could be a good tradeoff for many web applications, and given how simple it is to use, SSE is at least worth considering.
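On the wire, an SSE stream is plain text served with the text/event-stream content type, where each event is a group of "field: value" lines terminated by a blank line. The following Python sketch formats a single event. The function name format_sse and its parameters are chosen for illustration only.

```python
from typing import Optional


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        # Named events let the browser subscribe to specific event types.
        lines.append(f"event: {event}")
    # A multi-line payload is sent as several consecutive data fields.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"
```

A server would write the returned string to the open response stream each time something new happens, and the browser's EventSource object turns each group into an event.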
WebSocket
WebSocket is a protocol that enables two-way persistent communication channels over TCP connections, and it is used to bring a real-time experience to web applications. While WebSocket uses HTTP as the initial transport mechanism, the communication doesn't end after the client receives a response. This also means that we are free from the constraints of the typical HTTP request-response cycle. Because the connection stays open, the client and the server can freely send messages asynchronously without polling for anything new. Since no unnecessary requests are made, the load imposed on the server is significantly lower. We drastically reduce the amount of data exchanged as well as the lag in data transfer.
Ok, but how does this work?
It all starts with a handshake: an HTTP GET request carrying an Upgrade header. After a successful handshake, the server returns HTTP status code 101 (Switching Protocols) and the connection is promoted to the WebSocket protocol.
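As a rough illustration of the server's half of the handshake, the Python sketch below builds the 101 response. Per RFC 6455, the client sends a random Sec-WebSocket-Key header and the server proves it understood the upgrade by returning a Sec-WebSocket-Accept value derived from that key and a fixed GUID. The function names are illustrative, and a real server would also validate the other request headers.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing the accept value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the HTTP response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

With the sample key from the RFC, dGhlIHNhbXBsZSBub25jZQ==, accept_key returns s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.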
WebSocket uses two URI schemes, ws and wss, for unencrypted and encrypted traffic between the client and the server, respectively. This is analogous to http and https, and using the encrypted variant is recommended.
Beware of mixed content errors: if your website is served over https, make sure you are using encrypted WebSockets (wss) as well, since browsers typically block unencrypted connections opened from a secure page.
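One simple way to avoid the mixed content problem is to derive the WebSocket scheme from the scheme of the page itself. Here is a small Python sketch of that rule. The function name websocket_url and the default /ws path are assumptions made for the example.

```python
from urllib.parse import urlsplit


def websocket_url(page_url: str, path: str = "/ws") -> str:
    """Pick ws or wss so that it matches the security of the page."""
    parts = urlsplit(page_url)
    # An https page must use wss, otherwise the browser reports mixed content.
    scheme = "wss" if parts.scheme == "https" else "ws"
    return f"{scheme}://{parts.netloc}{path}"
```

For example, a page at https://example.com/app maps to wss://example.com/ws, while http://localhost:8080/ maps to ws://localhost:8080/ws.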
WebSocket is designed to be a lightweight protocol, so not much functionality comes out of the box. Things like keeping the connection alive, managing connected clients and application-level error handling are left to the developer to deal with. Using the native WebSocket implementation is similar to using TCP directly to communicate. While it is certainly possible to do so, and perhaps the right choice for simpler applications, there are a lot of libraries and frameworks built on top of WebSockets that make our lives as developers easier. For .NET developers, I would certainly suggest taking a look at the ASP.NET SignalR library, which covers this and much more. It simplifies the process of adding real-time functionality to web applications.
Browser compatibility
Given that WebSockets have been present for a number of years now, all popular browsers support them natively. There is no need for additional plug-ins to make things work.
WebSocket to replace HTTP?
Ok, so if WebSocket is so great and has so many advantages, could we use it to replace HTTP (REST)?
WebSocket is great and many applications could benefit from using this communication protocol, but it is not a silver bullet. Depending on the application requirements, other techniques could prove to be a better solution for the given task. For example, HTTP is a good fit for most web applications: it is great for exposing static resources on the web, for applications that make extensive use of CRUD operations, for applications that could benefit from caching, and so on. In some cases, we can also use one of the polling techniques because of their simplicity. It is best to consider the WebSocket protocol only as an enhancement to our web applications and to use it to provide a real-time experience on the web.
WebSocket pitfalls
The first thing to consider here is the WebSocket scaling issue. When communicating over WebSockets, chances are we are going to store at least some data about each connected client. We also leave connections open for as long as they are required. This, in turn, makes the communication stateful and difficult to scale horizontally, so some additional software is needed to help us with this.
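One common shape for that additional software is a shared publish/subscribe channel that every server node listens to, so that a message sent through one node reaches clients connected to any other node. The Python sketch below illustrates the idea in memory only. The Backplane and ServerNode classes are hypothetical stand-ins, not the API of any particular product, and the lists of delivered messages stand in for real open connections.

```python
from typing import Callable, Dict, List


class Backplane:
    """Hypothetical stand-in for an external pub/sub broker shared by all nodes."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        for callback in list(self._subscribers):
            callback(message)


class ServerNode:
    """One server instance holding its own set of client connections."""

    def __init__(self, name: str, backplane: Backplane) -> None:
        self.name = name
        # Client id -> messages delivered to that client by this node.
        self.delivered: Dict[str, List[str]] = {}
        self._backplane = backplane
        backplane.subscribe(self._fan_out)

    def connect(self, client_id: str) -> None:
        self.delivered[client_id] = []

    def broadcast(self, message: str) -> None:
        # Publish to the shared channel instead of only to local clients.
        self._backplane.publish(message)

    def _fan_out(self, message: str) -> None:
        # Every node forwards the message to the clients it holds.
        for inbox in self.delivered.values():
            inbox.append(message)
```

With two nodes sharing one backplane, a broadcast issued on the first node also reaches a client that is connected only to the second one.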
Another thing to consider is that not all proxy servers handle the WebSocket protocol well. The issue here is that Upgrade is a hop-by-hop header, so it is not passed along automatically: when a proxy server intercepts the Upgrade header, it needs to send its own Upgrade request to the back-end server with the proper headers. Also, the proxy server could choose to close a long-lived WebSocket connection because it appears to be idle. Many proxy servers support these WebSocket specifics nowadays, but it is a good idea to check whether this is the case with the proxy we intend to use.
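The header handling a WebSocket-aware proxy has to perform can be sketched in Python as follows. This is an illustration rather than the behavior of any specific proxy: upstream_headers is a hypothetical helper, and HOP_BY_HOP lists the headers that HTTP/1.1 treats as meaningful for a single connection only.

```python
from typing import Dict

# Headers that apply to one connection only and are not forwarded as-is.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def upstream_headers(client_headers: Dict[str, str]) -> Dict[str, str]:
    """Build the headers a proxy sends to the back-end server."""
    lowered = {name.lower(): value for name, value in client_headers.items()}
    connection_tokens = [
        token.strip().lower() for token in lowered.get("connection", "").split(",")
    ]
    wants_websocket = (
        lowered.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
    )
    # Drop hop-by-hop headers; end-to-end headers pass through untouched.
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in HOP_BY_HOP
    }
    if wants_websocket:
        # The proxy issues its own upgrade request on the upstream hop.
        forwarded["Upgrade"] = "websocket"
        forwarded["Connection"] = "Upgrade"
    return forwarded
```

A proxy that only strips the hop-by-hop headers, without adding its own Upgrade and Connection headers, would turn the handshake into an ordinary GET request and the upgrade would never happen.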
Conclusion
As software developers, we often find ourselves in situations where we need to choose one technology over another, and this is not always an easy task. It is not just a matter of knowing how to use a certain technology. It is also important to know why and when to use it, which depends on the context and the software requirements. As we learn more about all the pros and cons, it becomes easier to find the right tool for the job.