Case Study:

Writing scalable and robust server applications using Winsock and I/O Completion Ports on Windows 2000

Written by Oz Ben Eliezer

September, 2000


I've been working with Winsock for a while, and have even had some experience with IOCP. I have read several books and articles that examined various related issues, but none of them gave me a complete basis for a real-life application. I decided to combine most of my findings into one article that walks you through the creation of a scalable and robust, IOCP-enabled, real-life echo server!


This article is targeted at people who have experience with both Winsock 2 and multi-threaded programming. All the code in this article was successfully compiled and run using Visual C++ 6 SP4 and Windows 2000 Server.


The model presented in this article will only work on the Windows 2000 Server, Advanced Server, and Datacenter Server operating systems. It takes advantage of an API function that is not available prior to Windows 2000: BindIoCompletionCallback().

The code presented here can be ported to earlier versions of Windows (Windows NT 3.51 and later) with slight changes.


Use the advice and code given in this article at your own risk. I have tested this code and, as far as I know, it is bug-free. Please let me know of any bugs or problems you find in it. Please remember that I take no responsibility for any kind of damage caused to you as a direct or indirect result of the use or misuse of any information or piece of code found in this article.


You may freely use the code presented here in your own applications, but I'd like to know about it when you do! It would be nice if you dropped me an email when you use my code in your product.


Please do not republish this article or any part of it without my permission.


Now let’s get to business.



What In The World Are I/O Completion Ports (IOCP)?


For a more complete description of what IOCP is, and what it can do for you and your threads, I recommend that you consult Programming Server-Side Applications for Microsoft Windows 2000 by Jeffrey Richter and Jason D. Clark, chapter 2: "Device I/O and Interthread Communication". I am going to discuss it briefly in the context of Winsock and server application development.


IOCP is perhaps the hardest way of working with sockets to understand. If you have used Winsock before (and I assume that you have), then you have probably used WSAAsyncSelect() or WSAEventSelect() to ask Winsock to inform you of relevant socket events. You use WSAAsyncSelect() to ask Winsock to post messages to your window procedure when such events occur. You use WSAEventSelect() to ask Winsock to signal an event object when they occur. If you wanted to take advantage of the threading model that Windows offers you (and you certainly should, in order to scale on machines with several CPUs), you had to spawn and take care of the threads on your own.


When you use IOCP, you spawn a pool of threads once - and they are used to handle the network I/O in your application. Technically, in Windows 2000, you don't even have to spawn the pool yourself - you can let Windows take care of the spawning and management of the threads in the pool, and this is exactly what I'll do in this article.


The Main Program:

1. Create and initialize 500 sockets.

2. Create a listener socket.

3. Call AcceptEx() for each socket.

4. Bind the listener socket with the appropriate callback function (ThreadFunction()).

5. Create an event object, call it My_Die_Event.

6. Wait on My_Die_Event.

7. Cleanup (details regarding this will come later).



The Callback Function (ThreadFunction()):

(This function is called by Windows when an I/O operation has completed. Windows uses one of the threads from the thread pool it created earlier to execute the function.)


1. Check on which client the I/O operation completed.

2. Check which I/O operation completed.

3. Do some processing.

4. Issue another I/O operation (another read, or another write on the socket).

5. Return from the function. It will be called again, when another I/O operation completes.


So what do we have here?

We have threads that are all waiting for an I/O operation to complete. Those threads, as mentioned before, are automatically created by Windows when we first bind a socket to a callback function. Note that this automatic creation of threads is only available in Windows 2000 (and probably in future versions of Windows). If you intend to develop for Windows NT 4 or 3.51, you will have to spawn the threads yourself, and associate the sockets with a completion port on which those threads wait. I will not show you how to do this in this article, but the changes are rather minor. Once an I/O operation completes, the operating system posts what's called "an I/O completion packet" to our completion port. Once the packet is posted, Windows wakes a thread from the pool and has it run ThreadFunction() with the appropriate parameters. All this happens behind the scenes. As far as we are concerned, our callback function is automatically executed, by some thread that Windows assigns to it, whenever an I/O operation completes.


Let's take a look at the numbers we have set.


Q: How many threads are there in the pool that Windows creates?

A: This is up to Windows. In versions of Windows prior to 2000, you had to create the pool of threads yourself. The advantage of that was that you had control over the actual number of threads in the pool. The disadvantage was that the code was more complex and less readable.


Q: Why do we create 500 sockets?

A: We are creating a set of sockets to be used throughout the program. Creating only 500 sockets limits us to 500 simultaneous connections. This could be OK, depending on the application. It would be a good idea to make this a configurable value. Creation and destruction of sockets is expensive. By creating a pool of sockets at the beginning of the program, we enable socket reuse and improve performance.




Getting Our Hands Dirty


Before we jump ahead and start writing the IOCP-manipulation code, I'd like to pause the IOCP discussion and bring up another issue. If you are writing a server application, you have clients. When you work with clients, you have to mess with buffers. I am going to show you my general approach in these cases. It involves the creation of a generic buffer class, and also of what I call a "packet class" (we'll get into it a bit later).


In order to receive data from a client, you call ReadFile(). In that call, you supply a buffer into which you want the data to be retrieved. This buffer must remain valid until the I/O operation completes. So far so good. You can set the buffer size to 1024 bytes, or whatever - and read data in portions of 1024 bytes. Let's take a look at various scenarios - what the buffer may look like when we receive a notification that a read operation was completed:


Scenario #1:

Only 321 bytes were received.

You interpret the data, figure out what the client wants, do some processing, send some data back to the client and call ReadFile() again, to receive more data.


Scenario #2:

Only 321 bytes were received.

You try to interpret the data, but you can't figure out what the client wants - the 321 bytes are not a complete command.


You need to call ReadFile() again to retrieve more data, but this will overwrite the first 321 bytes in the buffer!


Scenario #3:

1024 bytes were received.

You try to interpret the data, but you can't figure out what the client wants - the 1024 bytes are not a complete command. You need to call ReadFile() again to retrieve more data, but this will overwrite the contents of the buffer!


As you can see, we have no problem with scenario #1, but scenarios #2 and #3 are more problematic. Sometimes you will be able to act upon a client's request as it gets in, but you can't rely on that.


The case with the output buffer is a bit different. The buffer that you provide to WriteFile() must remain valid for the duration of the I/O operation. We would, however, like to be able to freely add data to be written out, regardless of a current state of an output I/O operation.


For the output buffer, I created an expandable buffer class. The sending operation logic is pretty simple, as you’ll see in the code later. Basically, whenever you try to write data to a client, the program attempts to send the data immediately. If it can’t – it stores the data in the expandable buffer, and the data is sent whenever the current I/O operation completes.


The case is a little bit different for the input buffer. It would be too much overhead to use such a buffer class for the input buffer. Instead, the code expands the input buffer automatically when required. The input buffer management is pretty interesting, and you can examine it in the code that I'll show later.


About thread-safety:

The expandable buffer class is thread-safe. I made it thread-safe using a critical section. In some applications, you don't need the buffer class to be thread-safe, because the nature of the application guarantees that calls to it are always serialized. (If two threads can never attempt to write data to the same client simultaneously, then it's safe to remove the thread-safety mechanisms.) In order to remove the thread-safety mechanisms, you can simply inherit from the class and override the relevant member functions (InitializeInUse(), EnterInUse(), LeaveInUse() and DeleteInUse()).


Look at buffer.h to see the buffer class code.


When sending data to clients, we provide WriteFile() with a buffer to be sent. This buffer must remain valid until the I/O operation is completed. I implement this by holding 2 buffers. The first – is the one that is passed to WriteFile(). The second – accumulates data, and is copied to the first whenever a send is completed.
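The two-buffer idea can be sketched in portable C++ as follows. This is only an illustration, not the code from buffer.h: the class name OutputQueue and its members are hypothetical, the point where WriteFile() would be called is marked by a comment, and the critical section is left out. It also swaps the two buffers instead of copying one into the other, which is a substitution made here to save a copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the "in flight" + "pending" output buffers.
// Not thread-safe; a real server would guard it with a critical section.
class OutputQueue {
public:
    OutputQueue() : send_in_progress_(false) {}

    // Queue data for the client. Returns true if a send was started
    // immediately, false if the data had to wait behind a send in progress.
    bool Write(const std::string& data) {
        if (data.empty()) return false;
        if (send_in_progress_) {
            pending_.append(data);      // accumulate until the send completes
            return false;
        }
        in_flight_ = data;              // this buffer must stay untouched...
        send_in_progress_ = true;       // ...until completion is reported
        // A real server would call WriteFile() on in_flight_ here.
        return true;
    }

    // Call when the completion of the current send is reported.
    // Returns true if accumulated data was moved over and a new send started.
    bool OnSendComplete() {
        assert(send_in_progress_);
        in_flight_.clear();
        if (pending_.empty()) {
            send_in_progress_ = false;
            return false;
        }
        in_flight_.swap(pending_);      // pending_ is now empty
        // A real server would call WriteFile() on in_flight_ here.
        return true;
    }

    const std::string& in_flight() const { return in_flight_; }
    const std::string& pending() const { return pending_; }

private:
    std::string in_flight_;   // handed to the I/O operation
    std::string pending_;     // grows while a send is in progress
    bool send_in_progress_;
};
```

With this shape, callers can call Write() at any time, and the buffer that was handed to the I/O operation is never modified while the operation is outstanding.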


Before showing you some more code, I would like to discuss another issue. The server application that you are writing, probably needs to receive commands from the clients, and respond with commands. If you are designing your own protocol (and not implementing some standard-protocol server, such as an FTP server or a web server), you have the freedom to decide the actual format of the data transferred between the server and the client.


I like to base everything on what I call a "packet infrastructure". The client posts requests by sending a complete packet, and the server responds by sending a complete packet. You can define a packet in whichever way you want. In this article, I have implemented what I consider the most generic packet type there can be.


A packet, in this article, is a structure that consists of an integer and binary data. When the client posts a request, it first sends 4 bytes describing the length of the request, and then the request itself. This makes it very easy for the server to know when a request has fully arrived from the client (all it needs to do is check the length of the request, and see that it has received enough data). The server responds in much the same way.
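To make the length-prefixed format concrete, here is a minimal portable sketch. The helper names FramePacket and TryExtractPacket are hypothetical, and the byte order of the 4-byte length is assumed to be little-endian, since the article does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: prepend a 4-byte length (little-endian, an assumption)
// to a payload, producing the bytes that would go on the wire.
std::vector<std::uint8_t> FramePacket(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> out;
    out.reserve(4 + payload.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((len >> (8 * i)) & 0xFF));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Hypothetical helper: if `stream` starts with a complete packet, move its
// payload into `payload`, remove the packet from `stream` and return true.
// Otherwise leave both untouched and return false (more data is needed).
bool TryExtractPacket(std::vector<std::uint8_t>& stream, std::string& payload) {
    if (stream.size() < 4) return false;            // length not here yet
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
        len |= static_cast<std::uint32_t>(stream[i]) << (8 * i);
    if (stream.size() - 4 < len) return false;      // body not complete yet
    payload.assign(stream.begin() + 4, stream.begin() + 4 + len);
    stream.erase(stream.begin(), stream.begin() + 4 + len);
    return true;
}
```

The receiver keeps appending whatever arrives to `stream` and calls TryExtractPacket() after each read. A false result simply means "wait for more data".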


Internally, I created a tagPacket class that holds two integers and a buffer. The first integer holds the length of the data in the buffer. The second integer holds the current size of the buffer. Whether the two are always identical depends on the way you implement the application. If you create a new packet instance for each packet received, you can easily do with only one integer, describing the length of the data, and keep the buffer exactly the size of the data. If, however, you decide not to allocate and de-allocate a packet for every client request, you can do so by separating the size of the buffer from the length of the data. Whenever a new packet is received, the size of the buffer is examined. If the buffer is large enough to contain the new data, the data is copied into the buffer and its length is stored in the first integer. Otherwise, the buffer has to be enlarged first.
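The separation between the size of the buffer and the length of the data might look like the sketch below. It is not the tagPacket code from general.h: the struct name ReusablePacket and its field names are hypothetical, chosen only to show that the buffer is reallocated when, and only when, a packet does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a reusable packet; field names are assumptions.
struct ReusablePacket {
    std::size_t length;    // bytes of valid data currently in the buffer
    std::size_t capacity;  // bytes allocated for the buffer
    char* data;

    ReusablePacket() : length(0), capacity(0), data(nullptr) {}
    ~ReusablePacket() { std::free(data); }
    ReusablePacket(const ReusablePacket&) = delete;
    ReusablePacket& operator=(const ReusablePacket&) = delete;

    // Copy n bytes into the packet, growing the buffer only when needed.
    // Returns false if the allocation failed (the packet is left unchanged).
    bool Assign(const void* src, std::size_t n) {
        assert(src != nullptr || n == 0);
        if (n > capacity) {
            void* grown = std::realloc(data, n);
            if (grown == nullptr) return false;
            data = static_cast<char*>(grown);
            capacity = n;
        }
        if (n > 0) std::memcpy(data, src, n);
        length = n;
        return true;
    }
};
```

After a few large packets, the buffer stops growing and later packets are stored without any allocation at all.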


Look at general.h to check out the packet manipulation code.


I believe that this code is pretty straightforward. As you can see, I have another function in this file: the function used to log errors. You will probably want this function to do something else, such as logging errors to the system's Event Log. In buffer.h you will also find a function that retrieves a packet from the buffer.


As you can see, it is very easy to tell whether a complete packet has arrived or not. Note that this approach may lead to some problems. For example, what if, as a result of a bug or a hacking attempt, the first 4 bytes indicate a length of 2 GB? In such a case, the abuser can keep sending data and consume a lot of server resources. We will handle such cases later, by limiting the size of a request and terminating abusive connections.
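A size limit of this kind can be checked as soon as the 4-byte length arrives, before any memory is set aside for the body. The sketch below is an assumption-laden illustration: the limit kMaxPacketSize, the enum HeaderCheck and the function CheckHeader are hypothetical names, and the length is again assumed to be little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class HeaderCheck { NeedMoreData, Ok, TooLarge };

// Hypothetical limit; a real server would make this configurable.
const std::uint32_t kMaxPacketSize = 64 * 1024;

// Inspect the first bytes received from a client and decide whether the
// announced packet length is acceptable. Byte order is an assumption.
HeaderCheck CheckHeader(const unsigned char* buf, std::size_t available) {
    assert(buf != nullptr);
    if (available < 4) return HeaderCheck::NeedMoreData;
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i)
        len |= static_cast<std::uint32_t>(buf[i]) << (8 * i);
    return len > kMaxPacketSize ? HeaderCheck::TooLarge : HeaderCheck::Ok;
}
```

A TooLarge result is the point at which the server would stop reading from that client and disconnect it.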


It is time to talk about the client class. The approach I took while designing the client class and its manipulation mechanisms was one of reuse. I prefer allocating memory once and then reusing it, instead of allocating memory for each new connection and de-allocating it when the connection terminates. Allocation and de-allocation of memory are expensive operations and, in my opinion, should be avoided in most cases, even at the expense of higher resource consumption.


A few explanations regarding client.h and client.cpp.


Whenever we perform an I/O operation using ReadFile() or WriteFile(), we have to pass an OVERLAPPED structure as one of the parameters. We actually pass an extended OVERLAPPED structure (a structure derived from OVERLAPPED) that contains some context information. The context information consists of the memory address of the client class instance that requested the I/O operation, and the type of the operation requested (read or write). This information is required in the callback function.
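The following portable sketch shows the general shape of such an extended structure and how a callback can recover the context from the plain pointer it is given. OverlappedStandIn is a placeholder so that the code compiles without windows.h (on Windows, the real OVERLAPPED would be the base), and every other name here is hypothetical rather than taken from client.h.

```cpp
#include <cassert>

// Placeholder for the Win32 OVERLAPPED structure (assumption: on Windows,
// include <windows.h> and derive from the real OVERLAPPED instead).
struct OverlappedStandIn {
    void* reserved[4];
};

enum IoOperation { OP_READ, OP_WRITE };

struct ClientStub;  // forward declaration

// The extended structure: the base comes first, the context follows it.
struct ExtendedOverlapped : OverlappedStandIn {
    ClientStub* client;      // which client issued the operation
    IoOperation operation;   // which kind of operation it was
};

// Hypothetical client with one structure per direction.
struct ClientStub {
    int reads_completed;
    int writes_completed;
    ExtendedOverlapped read_ov;
    ExtendedOverlapped write_ov;

    ClientStub() : reads_completed(0), writes_completed(0), read_ov(), write_ov() {
        read_ov.client = this;
        read_ov.operation = OP_READ;
        write_ov.client = this;
        write_ov.operation = OP_WRITE;
    }
};

// The callback only receives a pointer to the base structure; casting it
// back to the derived type recovers the client and the operation type.
void OnIoCompleted(OverlappedStandIn* ov) {
    assert(ov != nullptr);
    ExtendedOverlapped* ext = static_cast<ExtendedOverlapped*>(ov);
    if (ext->operation == OP_READ)
        ++ext->client->reads_completed;
    else
        ++ext->client->writes_completed;
}
```

The cast is only valid because every structure passed to the I/O calls really is an ExtendedOverlapped. Passing a plain base structure would break it.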


When we call ReadFile() to receive data, we pass it end_in_buf_pos, and not actual_in_buf, so new data is appended after the data that is already in the buffer. Likewise, we start interpreting the received data at start_in_buf_pos, and not at actual_in_buf. Basically, those manipulations are done to avoid unnecessary expansion of actual_in_buf and unnecessary calls to memmove(). Look at CClient::Read(..) to see how it is done.
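One way to organize a buffer around a start position and an end position is sketched below. This is not the logic of CClient::Read(..), which is not reproduced here. The class InputBuffer and its members are hypothetical, and the policy shown (rewind when empty, compact before growing) is just one reasonable choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical input buffer: unread data lives in [start_, end_).
class InputBuffer {
public:
    explicit InputBuffer(std::size_t initial) : buf_(initial), start_(0), end_(0) {}

    // Returns where the next read should deposit its bytes, making sure at
    // least `want` bytes are free there. Cheapest option first.
    char* PrepareRead(std::size_t want) {
        if (start_ == end_) start_ = end_ = 0;       // empty: rewind for free
        if (buf_.size() - end_ < want) {
            if (start_ > 0) {                        // compact before growing
                std::memmove(buf_.data(), buf_.data() + start_, end_ - start_);
                end_ -= start_;
                start_ = 0;
            }
            if (buf_.size() - end_ < want) buf_.resize(end_ + want);
        }
        return buf_.data() + end_;
    }

    // Record that a read deposited n bytes at the position PrepareRead gave.
    void CommitRead(std::size_t n) {
        assert(n <= buf_.size() - end_);
        end_ += n;
    }

    // Mark n bytes at the front as processed, without moving any data.
    void Consume(std::size_t n) {
        assert(n <= size());
        start_ += n;
    }

    const char* data() const { return buf_.data() + start_; }
    std::size_t size() const { return end_ - start_; }
    std::size_t capacity() const { return buf_.size(); }

private:
    std::vector<char> buf_;
    std::size_t start_;   // first unread byte
    std::size_t end_;     // one past the last received byte
};
```

Because a partial command stays in place and new data lands right after it, a second read no longer overwrites the first, and data is only moved when the tail of the buffer runs out.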


Whenever a ReadFile(..) operation completes, a timestamp is recorded. Some other section in the class uses this data to ensure that inactive clients (possibly abusers) are disconnected from the server. The function that is responsible for that is CClient::Maintenance() and it will be discussed in a short while.
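An inactivity check based on such a timestamp can be as small as the sketch below. It assumes the timestamp is a 32-bit millisecond tick count of the kind GetTickCount() returns on Windows. The article does not say which time source the class uses, and the function name IsIdleTooLong is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if more than limit_ms milliseconds passed between the last
// completed read and now. Both ticks are 32-bit millisecond counters that
// may wrap around (an assumption about the time source).
bool IsIdleTooLong(std::uint32_t last_read_tick,
                   std::uint32_t now_tick,
                   std::uint32_t limit_ms) {
    assert(limit_ms > 0);  // a zero limit would flag almost every client
    // Unsigned subtraction yields the elapsed time even across a wrap.
    const std::uint32_t elapsed =
        static_cast<std::uint32_t>(now_tick - last_read_tick);
    return elapsed > limit_ms;
}
```

Comparing the difference, rather than comparing the two tick values directly, keeps the check correct when the counter wraps around, which a 32-bit millisecond counter does after roughly 49.7 days.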


CClient is an abstract class, which means that you must derive your own class from it. In your own class, you have to override three functions – these functions will be explained now.


int CClient::ProcessPacket(tagPacket *p)

Whenever a complete packet is received, this function is called with the new packet's address in p. In the code that I will present, this packet is a member of CClient. I use only one packet per client throughout the life of the application, to avoid constant allocations and de-allocations of tagPacket. This function is responsible for processing data received from the client. You may do whatever you like in it, including sending data back to the client using Write(..). The value returned by the function tells the server application whether it needs to do something. I have defined three possible values. CMD_DO_NOTHING: no action is required. CMD_DISCONNECT: the client must be disconnected (possibly an abuser that sent a bogus packet). CMD_SHUTDOWN: the server should now shut down. Those command values are declared in commands.h, which will be shown a bit later.
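A derived class that echoes packets back could be shaped roughly like the sketch below. Only the three CMD_ names come from the article. Their numeric values, the simplified SimplePacket type, the class names and the rules for when to disconnect or shut down are all assumptions made for the sake of a self-contained example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Names as in the article; the numeric values are assumptions.
enum { CMD_DO_NOTHING = 0, CMD_DISCONNECT = 1, CMD_SHUTDOWN = 2 };

// Hypothetical, simplified packet: a length and a pointer to the data.
struct SimplePacket {
    std::size_t length;
    const char* data;
};

// Hypothetical stand-in for the abstract client class.
class ClientSketch {
public:
    virtual ~ClientSketch() {}
    virtual int ProcessPacket(const SimplePacket* p) = 0;

    // Stand-in for Write(..): records what would be sent to the client.
    void Write(const char* data, std::size_t n) { sent_.append(data, n); }
    const std::string& sent() const { return sent_; }

private:
    std::string sent_;
};

class EchoClientSketch : public ClientSketch {
public:
    int ProcessPacket(const SimplePacket* p) override {
        assert(p != nullptr);
        if (p->length == 0)
            return CMD_DISCONNECT;   // assumption: an empty packet is bogus
        if (p->length == 4 && std::memcmp(p->data, "QUIT", 4) == 0)
            return CMD_SHUTDOWN;     // hypothetical shutdown request
        Write(p->data, p->length);   // echo the packet back
        return CMD_DO_NOTHING;
    }
};
```

The caller looks only at the returned command, so the protocol-specific decisions stay inside the derived class.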


void CClient::CreateInvalidPacket(tagPacket *p)

It could happen that a client is found to be an abuser during CClient::Read(..). For example, when a client attempts to send too much data in one packet. In such a case, instead of providing a true packet, CClient::Read(..) will call CClient::CreateInvalidPacket(..) which is responsible for creating a packet that will be recognized by CClient::ProcessPacket(..) as a special-purpose invalid packet, so it can take appropriate action (probably disconnect the client).


void CClient::Maintenance()

This function should occasionally be called for each client. Its purpose is to ensure that no client is abusing the system. Currently, it performs two different checks, as you can see in its code. It is called from the main thread.


Look at client.h and client.cpp to see the client class code.

Look at client_0.h and client_0.cpp to see the code I wrote for those three functions.




Putting It All Together (or – Enter: IOCP)


Now let’s tie the pieces together, with IOCP. Folks, this is what you’ve been waiting for. First of all, we will create the function that is called when an I/O operation is completed. The function’s declaration is actually dictated to us by Windows.


Look at callback.h to see this function’s declaration.


Now comes the body of this function. It’s really not too complicated. Note that it contains the line extern HANDLE dieEvent. It signals the dieEvent, on which the main thread is waiting, when it’s time to shutdown.


Look at callback.cpp to see this function’s definition.


This function counts the packets that it receives, and prints some stuff to the screen. Eventually you will probably want to change that.

Now comes the code that starts things up. It does everything we described in the beginning: it initializes the clients and the sockets, and waits on dieEvent. One interesting point is the way maintenance is done. The main thread waits on dieEvent for up to CHECK_CYCLE seconds (which I set to 10). Each time the wait times out, it executes the CClient::Maintenance() function on every client in the system. This function makes sure that the client is not abusing the system. One way to abuse the system is to connect to it and not send any data, so that the pending AcceptEx() call never completes. You will probably want to tweak the values there to suit your own needs. You may also want to add other types of maintenance to the CClient::Maintenance() function.


That’s about it. It’s not a real echo server, in the sense that it’s not actually repeating exactly what it receives. The application expects packets and returns the very same packets that it receives – however, it will not respond correctly to clear text. Testing such servers with telnet is pretty much impossible. That’s why I’ve created a small Visual Basic utility, which connects to the server, and allows you to send any number of packets.


Note that there seems to be some kind of bug in the code that retrieves packets and shows their content. I haven't looked for it very hard. It only happens when you receive a massive amount of data at once.


I hope that this article and sample code will be useful for you. Feel free to email me for questions / remarks / whatever :-)

Download the server source code
Download the Visual Basic client source code


Oz Ben Eliezer,

Embedded Software Consulting LTD.

Haifa, Israel

September, 2000