Code reviews

It has been a long time since my last post, but I finally had a chance to sit down and write. I have been thinking about this topic for a while and I decided that the best way was to put my thoughts in a blog post, so here it goes.

I have been doing code reviews for as long as I have been writing code, and that is a long time, since I started when I was about 10.

The first times I did code review, it was not formal code review in the usual sense. At that time I used to get a magazine once a month that would contain the listing of a program, usually a game, that you had to type in by hand; at the end you would hopefully have a new game to play. Unfortunately, most of the time the code did not work, so I had to start looking at the code to try to find what was wrong with it.

Fast forward a few years and I started reviewing code from colleagues or from other people collaborating on open source projects.

At first my code reviews were about “finding flaws” in the code to try to shut the code down. I think this is a very classic starting point for code reviews: one starts by finding flaws and exposing them.

Over time I have improved my reviewing skills, at least I hope so, and developed my own way of performing code reviews. I have also realized that just focusing on the code at hand is a very poor way of doing a code review, so at some point I started adding dimensions to my code reviews. I now review code in four different dimensions, two of them intrinsic and two of them extrinsic to the code:

  • Delivery
  • Structure
  • Functionality
  • Critique

I will explain each one of them, and it is worth performing a review in that order. Do not jump directly into code critique; take the time to review the other dimensions first.

Before jumping into how to perform a code review, it is worth separating code reviews into internal and external. Internal code reviews are reviews of code over which you have a certain degree of control, for example code written by a colleague in your team or by a collaborator in an open source project. External code reviews are reviews of code over which you have little or no control, for example code from colleagues in a different part of your organization or a sudden contribution from somebody to an open source project. There is not much difference in the way the review is performed, only some small details that I will highlight when needed.

Delivery

This is probably the most overlooked aspect when reviewing code, but in order to review code it has to be delivered to you somehow.

This usually happens in one of three ways: a pull request (or the equivalent in your version control system), a patch file (or a collection of them), or a package (such as a zip file or a tarball).

Start by looking at how the code is delivered and take your time. When people skip this step, they tend to get upset really easily if things do not go the way they expected; instead, take the time to find out how things are actually supposed to go.

As a rule of thumb I always start by looking at the contents of the package, or at the files included in the PR or patches. Is there a file with instructions (a README, BUILD or similar)? If so, take the time to read it and find out what you should do. This file should also document the requirements to build the software, for example which version of the compiler and what other software you should have in your build environment.

If there are no clear instructions, that is usually a bad sign. Not only because you do not know whether the build fails due to something you did wrong or due to a problem with the code, but because not including instructions usually leads to basing your work on assumptions that might not be true.

Here there is a difference between an internal and an external code review. An internal code review should follow the team's standards and conventions, so instructions are less relevant because the build process is already documented. But if a new component or requirement is added, then the corresponding build process should be documented.

Structure

This dimension refers to how the code is structured. That is, does it follow the conventions of the language? For example, C projects usually have a folder called include where all header files are stored.

In Java the code is divided into packages, with the idea that code with similar functionality should live close together. There are many ways to divide code into packages, so I will not go there, but there should be consistency in the way it is done. For example, if there is a package called exceptions, are all the exceptions there or are they also spread all over the code?

Another aspect of the structure is the logical division of the code. Are things grouped by their functionality or are they just grouped randomly?

Functionality

Usually people think about this dimension, but only after criticizing the code. This has the consequence of overlooking aspects that might not be obvious.

The first thing to look for is a description of the functionality accomplished by this code. This ties directly with the delivery of the code: is there a link to a task in a task manager? Is there a design document? Without those, it is very difficult to judge the functionality of the code.

When looking at the functionality it is a good idea to treat the code as a black box. Look at the unit tests. No unit tests? That is a red flag: no unit tests means that there is no easy way to check the internals.

Critique

This is the fourth dimension and should only be performed after you are finished with the other three. Most people jump straight into code critique and miss a lot of information.

Criticizing code is not easy, and it could be argued that it is highly subjective. That is why it is necessary to have looked at the other aspects of the code before doing it.

When reviewing the code, look first for consistency. Does the new code look like code somewhere else in the project? If it does not, then ask why. Is there a valid reason for introducing a new style?

Does the code use the right methods, classes and functions? For example, if searching for text, does it use regular expressions or does it use a brute force approach?

My advice when criticizing code is to focus on the big picture. Does this code fit nicely into the project or does it feel like it is being forced in? This is more subtle than it looks: if the project makes use of advanced constructs of the language, for example lambda functions, but this code does not, then it is worth asking why not.

Another aspect worth keeping in mind when criticizing code is the logical flow of the new code. Again, this is trickier than it seems, since you have to be able to look at the code that will call this code and follow the logic through the whole call path. Some time ago, in a previous team, somebody suggested a change that would introduce state into a component that did not have state before. The problem was that the state would only be introduced on certain occasions, which would lead to a situation in which some threads would have state and some others would not. After a discussion, it was decided to drop the change.

Final words

Before I finish, I would like to say that code review is a great opportunity to learn new things. Approach code reviews with an open mind, not with the idea of shutting down code.

The objective of a code review is to make the code better than it was. Sometimes the code is already at the level it needs to be and there is not much to say, but usually there is some detail here or there.

Sometimes you might find yourself in a position where you do not understand the code that you are reviewing. Do not be afraid to ask what is happening in the code; if you do not understand the code now, chances are you will not understand it later when it is running in production.

And as a last point, a code review is the place to ask questions. Even if there is a task that says that something needs to be done, if you feel that you need to ask about the task or about the implications of the change, do not be afraid and ask politely. The person that wrote the code might not fully understand it either, so your question might lead to much better code, or to dropping a piece of code that would cause problems later.

There are no sacred technologies

In my previous post I talked about how there are no silver bullets, that is, there are no technologies that will solve all your problems without introducing some of their own. In this post I will talk about something that happens often: as development processes mature in an organization, technologies somehow become sacred.

Let’s start with a fictional story. Joe is a developer and entrepreneur. One day he has the most brilliant idea for a business and he starts working on it. He uses the technology he knows and sticks to it. As usual, the first version of the product is a patchwork of technologies, but Joe gets his idea to the market and finds a receptive audience. His product is becoming a success. This attracts investors and Joe starts scaling both the product and the organization.

Joe’s product is a simple application to be installed on a desktop computer. That is the state of the art: the internet is still in its infancy and the *aaS world is still in the process of becoming something.

The product keeps growing, and with the investment Joe is able to hire a development team, a marketing team and even sales and support people! They are approached by a vendor that suggests they use its technology, since it can provide everything they need. They standardize on the technology of that vendor and life is good. They have a reliable tech stack, they get invited to conferences, they get access to early releases and they are invited to discuss the roadmap of the vendor. Joe’s company is now a known name in the market and they have loyal customers and a vibrant community of users.

Their vendor grows too and starts building more and more products. This allows Joe to reach new customers and bring better solutions to them. They invest more and more in this technology. There are times when the vendor makes changes to products that Joe’s company relies on that affect them, but the vendor is always there to help. They get preferential deals, VIP support and of course more invitations to conferences. Slowly Joe’s company shifts from being an innovative company bringing solutions to the problems of their customers to being a company that is invested in using a given technology stack. Nobody realizes this because it has happened so slowly; their products are still a success and their customers are happier than ever.

One day there is a big paradigm shift. Desktop applications are no longer the thing, and everybody moves to mobile applications and cloud services. The technologies used to build Joe’s products never considered this case, so they are taken by surprise. They are approached by the vendor and told not to worry: this is just a simple seasonal thing that is going to go away very soon. Everybody is going to come back to their desktop apps as soon as they realize that mobile phones and cloud services are not as reliable as desktop applications and enterprise applications running in your own datacenter.

At this point in the story, it is good to take a break and consider the possibilities. Nobody knows what is going to happen: is it true that mobile applications are just a passing thing or are they here to stay? What about cloud services? Will people really adopt them or will they come back to the safety of a datacenter? The vendor reassures them that they should not worry, because it is also going to start developing a cloud offering just in case. And regarding mobile apps, it promises a new framework to help them there too.

As you might imagine, the story does not end very well for this company. Despite the vendor's best efforts, its frameworks never became serious contenders in the cloud arena or in the mobile space. People and companies moved to cloud services and never looked back at desktop applications.

This happens more often than you might think, and there are multiple reasons. First of all, if you have a vibrant business with a proven business model, it is hard to make changes. Should we really risk everything? We are doing fine and we will still be fine in the future.

Second, you marry your technology. Technology is just a tool, and vendors are just vendors, no matter what they offer. Your business is not using somebody’s technology; your business is anticipating what your customers will need and providing them with solutions, not finding a way to force a solution out of the technology you already have.

This does not mean that you should throw your technology stack out of the window. It means that it is wise to experiment and try other technologies, especially if some kind of de facto standard has emerged in the industry.

Don’t be afraid of trying new technologies to solve problems. If somebody in your organization says that you cannot do that because it is not part of the approved technology stack, then find a way to propose a change. Any organization should have a defined way to try and investigate new technologies and market trends. If your organization does not have one, make sure that you raise that concern and that a process is implemented for it. If your organization is unwilling to try new things, then it is time to move on.

There are no silver bullets!

It has been a long time since I posted something, but these are strange times. With the pandemic winding down and some light at the end of the tunnel, I finally have time to write some lines again.

I thought about writing a long post, but then I decided to divide it into two, so this is the first part. This part is about the promise of the one technology/product to rule them all. The next post will be about what happens when a technology is not let go.

In horror stories there was always the silver bullet: the one thing that could be used to kill the monster and save the day. No matter what, a silver bullet would do. And it is not strange that people believe that there are silver bullets; after all, isn’t it tempting to think that there is something we can use that will solve all of our problems and have no side effects?

When it comes to technology, this is mentioned again and again and ignored just as quickly. The great (and late) Fred Brooks was already warning us in 1987 about thinking that software projects could be saved by a silver bullet, but we keep putting his advice to the side.

To make a long story short, a silver bullet is a new or unknown technology that promises to solve all of our problems without introducing new ones.

If you have ever read or learned about thermodynamics, you quickly realize that a silver bullet is a physical impossibility. The second law of thermodynamics says that the entropy of a closed system can only grow. This means that, no matter what your system does, there will be more entropy at the end than there was at the beginning.

If you are not familiar with this, you can think of it like this: if you build something, you need materials. And even if you are the best in your trade, there will be residuals after you are done. There will always be a bit of wood, a few nails, some glue and other leftovers.

In programming terms the leftovers are called technical debt. Technical debt is something, either code or design, that needs to be improved to make the system right. This does not mean that the system is not working. The system can be fully functional; there are just some bits that are not shining as they should.

And this is when somebody says that everything would be better if we just used technology X. Technology X does not suffer from those problems, and therefore our system would be totally shining if we had used it.

And as always, there is some truth in what you hear. Technology X might not suffer from the problems we are suffering right now, so our system would be better in that regard. However, the same technology suffers from other problems, which might be unknown.

Think about this: the major headache for any C/C++ programmer is null and dangling pointers. Why do I have to manage pointers? If only there was a way to automatically manage memory so I would not have to think about it. And there you have it: we have automatic memory management (explained in one of my previous posts).

But the price of automatic memory management is, among other things, giving away control of how memory is managed (obviously). This means you have little control over when things are disposed of and recycled. Not only that, automatic memory management requires certain features from the programming language and runtime to work correctly, so you also have to trade in some features.

Ok, memory management is an important but usually complicated topic, so let’s choose another topic that is probably better suited for this. What about database access? After all, data structures are intrinsic to any programming language, and it should not be problematic for a programming framework to be able to serialize and deserialize things. I don’t even want to try to enumerate the many types of technologies that are found in this area, each one with its own pros and cons. And we are still getting new technologies every other week.

We have gone from SQL databases to NoSQL databases. From key-value stores to document databases. From row databases to column databases. From persistent databases to in-memory databases. From strict data hierarchies to graph hierarchies where everybody is a friend of everybody. And the list goes on and on.

And what about deployment? There was a time when it was accepted that a tar file plus a Makefile was the way to go. Then along came the GNU autotools. And then RPM and DEB. On the Windows side there was the zip file, the custom installer, the MSI and so forth. And then came the cloud, microservices and containers, and we are still trying to find the one way to do it right.

Does that mean that technology has not advanced in the last 30 years? Of course not, technology has taken giant leaps forward. But we still do not accept the second law of thermodynamics, which says that no matter what, we are still going to have some residuals at the end.

Typical use cases for Kafka

In my previous post I explained the main components of a Kafka system. In this post I will explain some of the typical use cases where Kafka is used.

By typical I do not mean that all systems built on top of Kafka follow these patterns, only that these use cases are widely used.

Before we start

If you are not familiar with Apache Kafka, please take a look at my previous post Data streaming with Apache Kafka, where I go over the main concepts you need to know.

Typical use cases for Kafka

Kafka is a message brokering system, that is, a system where messages are published and consumed afterwards.

The most common use case for Kafka is to isolate producers of data from the consumers of data. Producers of data will write records to a topic in Kafka and the consumers will read from that topic.

Another very common use case for Kafka is signaling. A system that needs to notify other systems of state changes will post messages to a Kafka topic. Systems that are interested in those state changes will subscribe to that topic and react accordingly.

The producer – consumer case

Decoupling producers of data from the consumers is a typical situation in many types of systems. By itself Kafka is very good at letting producers and consumers work at their own pace. If we incorporate a schema registry and a schema language such as Avro, we can also decouple the logic of the producer from that of the consumer: both use the schema that is available in the schema registry.

In this scenario it is very important to select a partitioning scheme that maximizes parallelism. The unit of parallelism in Kafka is the partition, so the number of partitions in a topic should be a function of the maximum number of consumers. In any case, it is much better to start with a low number of partitions and increase it afterwards than to start with a very large number of partitions. The reason is that it is not possible to decrease the number of partitions in a topic, you can only increase it; if you need to decrease it, you need to create a new topic.

In addition to the number of partitions, it is important to select a partition key that distributes records in a uniform way, that is, one that avoids all records ending up in one partition. The default partitioning algorithm computes a hash of the key and then applies modulo over the number of partitions. If the key does not vary, or there are very few keys, it might be necessary to replace the default partitioner with a custom one.
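
To make that concrete, here is a rough sketch of the idea in C++. This is only an illustration of the default behaviour: Kafka's real default partitioner uses the murmur2 hash, and std::hash stands in for it here.

#include <functional>
#include <string>

// Sketch of the default partitioning idea: hash the key and take the
// modulo over the partition count. A custom partitioner replaces
// exactly this decision.
int partition_for(const std::string &key, int num_partitions) {
  std::size_t hash = std::hash<std::string>{}(key);
  return static_cast<int>(hash % num_partitions);
}

A custom partitioner replaces exactly this function: given the record key, it returns the partition number, which lets you spread records more evenly when the keys do not.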

Signaling

Any system that needs to notify other systems of a state change has many alternatives. One of the preferred ways to do this is by using the publish-subscribe pattern. In this pattern, a system will publish events and any other system that is interested in the changes will subscribe to the notifications.

Kafka is ideal for this kind of situation. The producer translates very well to the publisher role in this pattern, and the consumers behave exactly as the subscribers.

In this scenario it is better to have a small partition count. Each system that needs to be notified can subscribe to the topic using its own id. This case is different from the producer-consumer case in the sense that the consumers are not trying to maximize throughput; they need to be able to react to system changes at their own pace.

What’s next?

In my next post I will change gears again and start a series of posts about different types of performance testing.

Data streaming with Apache Kafka

Data streaming is becoming more and more important, and hardly any new project is started without at least considering data streaming as a solution or part of the solution.

Apache Kafka is a messaging system that has become extremely popular and is the top choice for data streaming nowadays.

This post goes into what Apache Kafka is and why it is such a good match for data streaming.

The origins

Apache Kafka was created by Jay Kreps and Jun Rao while working at LinkedIn. Kafka was designed to be the nervous system of complex systems.

The idea was to develop a system where events could flow between components without the components knowing about each other.

The internals

Kafka is modeled around independent but collaborating units called brokers. Each broker is independent of the others, but they collaborate to ensure high performance and reliability.

In the current version of Kafka there are some additional units running a different type of software called ZooKeeper. ZooKeeper is a distributed key-value store used to keep the configuration of the system. In an upcoming version of Kafka, ZooKeeper will be replaced by normal Kafka brokers.

Data in Kafka is stored in what are called topics. A topic is nothing more than a log of immutable events. This means that a topic is a sequence of events, and that an event cannot be changed once it is added to the log.

A topic can be partitioned, which means that the load of the topic can be shared by several brokers.

A topic can also be replicated, which means that each partition on a topic will be maintained by more than one broker.

Producing data

To produce data, a producer needs to find out the topology of the cluster and then decide where to send the data.

The topology of the cluster is the distribution of topic-partitions across brokers. Each topic-partition has a leader, that is, a broker that accepts the writes from producers and then communicates to the followers that a new record was added.

The producer uses the key of the record to decide which partition of the topic the record should be written to. There is a default partitioning algorithm that computes a hash of the key and then applies modulo over the number of partitions. If a record has no key, then round robin is used across the partitions.

The producer can choose between three levels of consistency, controlled by the acks setting (sketched in code after the list):

  1. Write a record and do not wait for any acknowledgement (acks=0).
  2. Write a record and wait for the partition leader to acknowledge it (acks=1).
  3. Write a record and wait for the leader and all in-sync replicas to acknowledge it (acks=all).
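
As a minimal sketch of how this is configured, here is what it might look like with librdkafka, the C/C++ Kafka client. The broker address is an assumption, and error handling is trimmed for brevity:

#include <librdkafka/rdkafka.h>

int main() {
  char errstr[512];
  rd_kafka_conf_t *conf = rd_kafka_conf_new();

  rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                    errstr, sizeof(errstr));
  // "0": do not wait for any acknowledgement.
  // "1": wait for the partition leader.
  // "all": wait for the leader and all in-sync replicas.
  rd_kafka_conf_set(conf, "acks", "all", errstr, sizeof(errstr));

  rd_kafka_t *producer =
      rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
  if (producer == NULL)
    return 1;

  // ... produce records here ...

  rd_kafka_destroy(producer);
  return 0;
}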

Consuming data

What makes Kafka special is that data is not consumed as it would be in a normal queue. Data is read by a consumer, but the act of reading does not remove the data from the log, as opposed to dequeuing data from a queue.

Consumers form groups called consumer groups and divide the partitions of a topic among themselves. If there are fewer consumers than partitions, then some consumers will get more than one partition assigned. If there are more consumers than partitions, then some consumers will not be assigned a partition.

Events in Kafka have an expiration, and they can be read by consumers as long as they have not expired. Once a record expires, it is not possible to read it anymore.

A typical system

Typically Kafka is used to separate the producer applications from the consumer applications. Producers can focus on producing data as fast as they can, while consumer applications can process the information at their own pace.

For example, a system that receives log entries via a REST API can produce events directly to a topic or several topics in Kafka. The analysis of those log entries can be performed by consumer applications without affecting the producers.

Cool, so what’s the catch?

As with any technology, there are things that can be problematic or at least not optimal depending on the use case.

The first problem is finding a given record. Kafka is designed so records can be read sequentially, not for finding records given a key. The usual solution to this problem is to use a processing library called Kafka Streams, also part of the Kafka project, which offers higher abstractions such as tables that provide a key-value interface.

Another typical problem is how to consume and produce data between several Kafka clusters or even other systems. There are projects that allow you to replicate all, or only a few selected, topics from one cluster to another. The main issue with this kind of setup is the fact that Kafka is meant to be kept in a controlled environment and not out on the internet. It is possible to do this, it just requires more planning.

Long term storage is complicated but not impossible. Kafka is designed for relatively short-lived records, not for records that have to last weeks, months or even years. The usual solution is to offload the records to another storage solution, such as cloud storage, and consume them again if needed.

What’s next?

In my next post I will describe a typical Kafka system including producers, consumers and processors.

What you need to know about interrupts

If you are from my generation or older, you remember the wonderful days of manually configuring hardware by using jumpers and then connecting the hardware to the ISA bus on a computer.

Before I got my first PC, I owned a more user-friendly system (an Atari 65XE), so new hardware was magically configured. Well, at least the few things that you could add to that setup; the options were very limited at that time.

Nowadays it is seldom that one buys a piece of hardware that needs to be physically installed on the motherboard. Most hardware uses the USB bus or another similar technology.

What is an interrupt and why do I care?

Let’s think about a simple thing like typing on your keyboard. Every time you press a key the keyboard will produce three events:

  • a key pressed event
  • the key
  • a key release event

How does the CPU find out about these events? One simple answer is to have a poll loop, in which the CPU waits for some time and checks the data lines from the keyboard. If no data has arrived, then the CPU waits and checks again.
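
As a sketch of what such a poll loop looks like in C++ (the status and data registers are made-up stand-ins for illustration, not a real keyboard controller):

// Illustrative stand-ins for the keyboard's status and data registers.
extern volatile unsigned char *keyboard_status;
extern volatile unsigned char *keyboard_data;

unsigned char poll_keyboard() {
  // Busy-wait until the assumed "data ready" bit is set, then read the byte.
  while ((*keyboard_status & 0x01) == 0) {
    // nothing has arrived yet: wait and check again
  }
  return *keyboard_data;
}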

It is easy to see that this mode of interaction is not very efficient. It might work in very simple systems where the CPU has literally nothing better to do than wait for that input, but it does not scale to a fully multitasking OS.

In order to solve this issue, a CPU has special input lines that are either high or low. When a device wants to communicate with the CPU, the device raises its line, which causes the CPU to interrupt its normal flow and call a special routine to take care of the device. It is worth mentioning that a CPU has only a few external interrupt lines; most interrupts come from devices inside the CPU.

Cool, tell me more!

During the bootstrap of a CPU, we need to configure the different control registers, for example the ones that configure the address, data and chip selects for the different devices, and at the same time we need to create what is called an interrupt table.

An interrupt table is nothing fancier than an array of addresses of routines that need to be called when the corresponding interrupt is raised.
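
In code, an interrupt table can be as simple as this sketch. The size and the registration function are illustrative assumptions, not a real CPU's layout:

typedef void (*interrupt_handler)();

const int NUM_INTERRUPTS = 256;

// During bootstrap this array is placed where the CPU expects to find it.
// When interrupt n is raised, the CPU jumps to interrupt_table[n].
static interrupt_handler interrupt_table[NUM_INTERRUPTS];

void register_interrupt_handler(int number, interrupt_handler handler) {
  interrupt_table[number] = handler;
}

This register_interrupt_handler is essentially what the serial driver sketch in my previous post calls when it is loaded.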

In my previous post I talked about the hardware of a serial port and we will continue with that example in this post. As explained, a serial port is a device that transfers bytes serially, bit by bit. Typically a serial port will raise an interrupt when new data has arrived and is ready to be collected, or when data has been sent and the device is ready to send new data.

Configuring interrupts

Configuring interrupts is, in many CPUs, a privileged operation that can only be performed during bootstrap. Some CPUs allow remapping interrupts after bootstrap, but usually only some of them.

A typical technique used by an OS is to install a default interrupt handler for all interrupts and then have its own internal table to call the corresponding handler. This way, the handler can be configured at run time without any restrictions.

To complicate things, devices might share an interrupt. Think of devices that are attached to a data bus and transmit data to the CPU through the bus. The bus usually has one interrupt line that is raised when the bus needs to communicate with the CPU. In this case the bus will raise the interrupt and the CPU will read the source of the interrupt from some special register in the bus master.

Top and bottom halves

From the point of view of the CPU, an interrupt is very expensive. The CPU has to stop whatever it was processing and call the corresponding handler. The handler runs in a special mode in which other interrupts are not allowed (usually with some exceptions for what are called non-maskable interrupts).

Because of this, interrupt handlers are divided into two parts: the top half and the bottom half.

The top half is the code that gets called when the interrupt is raised. This code needs to be as quick as possible and do as little as possible. Its aim is to deal with whatever is urgent and then schedule the bottom half to take care of the rest.

In the case of our serial port, the top half will first check the status register of the UART and, based on that, for example copy the received byte to memory. This way the device is ready to keep receiving data. The memory should be allocated beforehand; the interrupt handler cannot allocate memory.

The bottom half of the interrupt handler might be scheduled long after the top half has run. Its aim is to take care of any work that is not urgent. For example, in the case of the serial port, the bottom half can go through the data that has been received and check which process is waiting for it. It copies the data to the process space and then wakes the process.
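
A hedged sketch of this split for the serial port follows. The ring buffer, the scheduling function and the delivery function are illustrative assumptions, not a real kernel API:

#include <cstdint>

// Illustrative stand-ins: in a real kernel these would come from the
// driver and the scheduler.
extern volatile std::uint8_t *uart_receive_register;
void schedule_bottom_half(void (*work)());
void deliver_to_waiting_process(std::uint8_t byte);
void serial_bottom_half();

// Minimal fixed-size ring buffer; no overrun handling, as noted above.
struct RingBuffer {
  std::uint8_t data[256];
  int head = 0, tail = 0;
  bool empty() const { return head == tail; }
  void push(std::uint8_t b) { data[tail] = b; tail = (tail + 1) % 256; }
  std::uint8_t pop() { std::uint8_t b = data[head]; head = (head + 1) % 256; return b; }
};
static RingBuffer rx_buffer;

// Top half: runs with further interrupts disabled, does only the urgent work.
void serial_top_half() {
  rx_buffer.push(*uart_receive_register);    // free the device quickly
  schedule_bottom_half(serial_bottom_half);  // defer everything else
}

// Bottom half: runs later, with interrupts enabled, at the kernel's leisure.
void serial_bottom_half() {
  while (!rx_buffer.empty())
    deliver_to_waiting_process(rx_buffer.pop());
}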

What’s next?

In my next post I will change gears again and talk about data stream processing using Kafka.

User to hardware (part 2)

Disclaimer: This post is not meant to be a precise description of how a device driver works, nor an accurate description of the hardware components and their interactions. It is an introductory post for people who would like to know more about device drivers and hardware programming, and therefore some simplifications have been made.

In my previous post I talked about how a user program connects to the serial port to transfer and receive data. In this post I will cover the kernel and driver side of things: how the hardware works and how the interaction flows from the hardware to a user program.

A brief introduction to a serial port

A serial port is a type of device used to send data serially from A to B. This means that data is sent one bit at a time, instead of sending larger units such as whole bytes in parallel.

A serial port is usually composed of two parts: the logical component called UART/USART and the physical component that transforms the logic levels into the transmission levels.

A UART is a simple device, built around a Transmit register and a Receive register, each connected to a shift register.

To transmit data the data needs to be placed in the Transmit register. The UART will copy this register to a shift register and begin sending the bits one by one until all bits are sent.

The reverse process is used to receive data: the data arrives at a shift register and, once all the bits have arrived, the UART copies the data to the Receive register.

A device driver

To avoid getting into details of an existing API such as the Linux device driver API, I will create my own device layer inspired by a Unix system. This layer will be in C++ to make sure we focus on the concepts and not on the differences between this API and other APIs.

I will follow the Unix tradition of dividing devices into character and block devices. The serial port is a character device, therefore I will create a simple API for a character device.

class Device {
public:
  enum ConfigurationCommands {...};
  virtual int load() = 0;
  virtual int unload() = 0;
};

class CharacterDevice : public Device {
protected:
  int d_references = 0;
public:
  virtual int hold(const char *path) { ++d_references; return 0; }
  virtual int release(int fd) { --d_references; return 0; }
  virtual int configure(int fd, ConfigurationCommands command, void *user_data) = 0;
  virtual int get(int fd, void *to_user, int length) = 0;
  virtual int put(int fd, void *from_user, int length) = 0;
};

Before going into the code for a serial device I will explain some of the methods and the architecture. The base class is called Device and defines only two methods, load and unload. These methods need to be implemented by subclasses and are meant to be called when the driver is loaded or unloaded.

The goal of the load method is to check that the driver is ready to be used, by checking that the hardware is present and allocating any data structures that might be required. If a driver cannot be loaded because the hardware was not found, it should return an error code. It is important to understand that checking for the presence of the hardware does not mean that the hardware is correctly initialized and ready to be used, only that it is ready to be configured and initialized.

The unload method, on the other hand, is called when the driver is removed. The driver can only be removed when nobody is using it, and the unload method will make sure that the hardware is stopped and in a safe state.

The next methods are defined in the subclass CharacterDevice. This class defines five methods: hold, release, configure, get and put. The hold method is the first method to be called and its goal is to connect the calling process with the hardware. If the call is successful, the kernel will allocate a file descriptor and the communication with the driver will go through that file descriptor. In this API the method increments a reference count so we know how many processes are using the device.

The release method is called after we are done using the device. In this implementation it decrements the reference count. It should be mentioned that not all devices can be shared by more than one process; many devices allow only one process and do not let others call the hold method until release has been called.

The configure method takes a configuration command and some data from user space and applies the corresponding configuration to the device.

The get method is used to extract data from the device and pass it to the calling process.

Finally the put method is used to send data from the calling process to the device. We will now implement a serial device using a simple UART and the API we have defined.

struct UART {
  unsigned int d_speed;
  unsigned int d_flags;
  volatile unsigned char *d_transmit_register;  // This register only accepts writes and blocks until the data is sent
  volatile unsigned char *d_receive_register;   // This register only accepts reads and blocks until read

  UART(unsigned char *transmit_register_address, unsigned char *receive_register_address) {
    d_transmit_register = transmit_register_address;
    d_receive_register = receive_register_address;
  }
};

class SerialPort : public CharacterDevice {
protected:
  UART *d_uart;
  CircularBuffer *d_buffer;
  int d_buffer_pointer;
public:
  SerialPort(int buffer_size) {
    d_buffer = new CircularBuffer(buffer_size);
    d_buffer_pointer = 0;
  }
  void manage_interrupt() {
    // Simple implementation: move the received byte to the buffer
    d_buffer->push_last(*d_uart->d_receive_register);
  }
  int load() {
    d_uart = new UART(...the addresses of the registers...);
    register_interrupt_handler(...interrupt number..., manage_interrupt);
    return 0;
  }
  int unload() {
    if (d_references != 0) return -1;
    delete d_uart;
    return 0;
  }
  int hold(const char *path) {
    if (d_references > 0) return -1;  // only one process at a time
    return CharacterDevice::hold(path);
  }
  int configure(int fd, ConfigurationCommands command, void *user_data) {
    // This method is too long to write it here
  }
  int get(int fd, void *to_user, int length) {
    // This is a naive implementation: we do not block waiting for more data.
    // If we do not have more data, our buffer will return 0s.
    unsigned char *data = d_buffer->get(d_buffer_pointer, length);
    copy_to_user(data, to_user, length);
    return length;
  }
  int put(int fd, void *from_user, int length) {
    // Another naive implementation
    unsigned char *data = (unsigned char *)copy_from_user(from_user, length);
    for (int i = 0; i < length; i++) *d_uart->d_transmit_register = data[i];
    return length;
  }
};

After the device is loaded, the addresses of the UART registers are configured and the interrupt handler is registered. The serial port is not able to do anything yet, since no speed or transmission mode has been configured at this point.

When the hold method is called it checks the number of references and returns an error if the device is already in use. The device will then be configured by calling the configure method (remember how in my previous post we used some wrapper functions to avoid calling ioctl directly).

When data arrives at the device, the interrupt handler is called, and the handler only copies the data from the receive register to the buffer. Notice how we do not account for buffer overruns, nor do any other type of error checking. Another thing to notice is that we need to call a special function to pass the data to user space: the driver works in kernel space, and from there the memory addresses of user space look completely different.

To send data we have implemented a very simple algorithm: we copy each byte to be sent to the transmit register, and we assume that this operation does not return until the data has been sent.

What’s next?

The objective of this post was to explain the different components, not to go into details about the internals. In my next post I will talk about interrupt handlers, how they are implemented, and of course how they relate to running processes. Stay tuned!

User to hardware (part 1)

This is the first post in a two post series covering the trip from a running process all the way down to the hardware and back.

In this post I will cover the software side: I will start from a running process and walk through the different pieces that are needed to talk to a hardware device, and then the way back to the running process.

The examples in this post are taken from https://tldp.org/HOWTO/Serial-Programming-HOWTO. This means we will cover a program that communicates using the serial port to exchange information with another system.

Before we start

In Linux, as in any other Unix system, devices are represented as files. And files are simply a sequence of bytes. Therefore a device in Linux is represented as a sequence of bytes.

We need to stop for a small break here, because what I just wrote in the previous paragraph is not entirely true. It is true that files in Unix are sequences of bytes and that devices are represented as files; however, there are two main categories of devices in Unix: character devices and block devices. A character device is a device that can be read and written byte by byte. A block device is, in simple terms, a device that can only be read and written in blocks of bytes. A serial port in Unix falls into the character device category and can thus be read and written byte by byte.

This abstraction greatly simplifies the interaction with devices because, for the most part, they behave the same as a regular file, with the exception that they usually cannot be rewound or fast-forwarded.

Accessing the serial port

The first step is unsurprisingly opening the device file and getting a file descriptor. Once we have obtained a file descriptor, we can proceed to configure the serial port according to our needs.

The configuration of a device is performed through a system call: ioctl. Fortunately the serial port is (or actually was) such a common device that the C Library includes a few functions that wrap the system call and its different parameters.

The following code is an excerpt that configures the serial port to be used at 115200 bps, with hardware flow control, canonical input (newlines signal end of line), 8 data bits without parity, 1 stop bit and reading enabled:

/* Includes needed by this snippet and the next one */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <termios.h>
#include <unistd.h>

int fd = -1;
struct termios newtio;

fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY);
if (fd < 0)
  exit(-1);

bzero(&newtio, sizeof(newtio));
newtio.c_cflag = B115200 | CRTSCTS | CS8 | CLOCAL | CREAD;
newtio.c_iflag = IGNPAR | ICRNL;
newtio.c_oflag = 0;
newtio.c_lflag = ICANON;
tcflush(fd, TCIFLUSH);
tcsetattr(fd, TCSANOW, &newtio);

Now that we have initialized the serial port reading and writing to it is as simple as reading and writing to a normal file:

char write_buffer[] = "some data";  /* illustrative content: the buffer must be filled before measuring it with strlen */
int written_bytes = write(fd, write_buffer, strlen(write_buffer));

char read_buffer[255];
int read_bytes = read(fd, read_buffer, 255);

Cool, but what happens behind the scenes?

The code seems very straightforward; except for the configuration of the serial port, this code could be part of almost any other program. However, there are things happening in the background that are important to understand. In this post we look at the program side of things, which means we will not look at how a device driver works internally.

The first part of the code opens the device file. This operation is pretty similar to what happens when opening a regular file: the C Library issues a system call and the kernel checks with the device driver whether the device is available; if so, it allocates a file descriptor so that we can communicate with the device.

The next step will be to issue a series of system calls to configure the device driver to our needs. Thanks to the C Library we only issue one function call and the system calls will be executed for us.

When writing to the device, the kernel will call the device driver with the data we provided and then put our program in the IO_WAIT scheduler queue. Our program will remain in this queue until the device driver tells the kernel that the write is completed, either successfully or with an error.

When reading from the device, the kernel will call the device driver and pass a buffer to copy the amount of data we requested and put our program into the IO_WAIT scheduler queue. Again, our program will remain in this queue until the device driver tells the kernel that the read is completed, either successfully or with an error.

Notice that in the case of a read, our program might remain suspended for a long while because the device has not received data and therefore the only possible solution is to wait.

It is of course possible to use asynchronous IO, in which case both the write and read calls return immediately. We will still be taken out of the RUNNING queue of the scheduler and put back in afterwards, because in the kernel's eyes the operation takes a long time and we would block other processes if we were kept in the RUNNING queue.

What’s next?

In the next post we will examine the same interaction but this time seen from the point of view of the device driver. In the meantime you can leave your comments, share this post or contact me by using the contact form.

The so called userspace

Most people have heard the term userspace and have a grasp of what it means, but why is it called userspace and what is it? To understand the name we need to go back to the core and look at the architecture of a CPU.

Most processors, including most microcontrollers, have two modes of operation: privileged mode and non-privileged mode. At first sight there is no big difference between them; a program written using the “normal” instructions will run the same in both modes. So why do we need two modes?

The answer comes by looking at more than just the processing capabilities of a CPU. In a previous post (Computer memory from the ground up (part 1)), I mentioned that a CPU is connected to the world using connection pins and that those pins have different functions. A CPU needs to interact with devices that are external to the CPU core, for example memory, storage devices and other types of devices and buses.

When we implement an algorithm, we usually depend on the “normal” instructions of the CPU and do not need to bother to configure external devices.

A CPU can also be “called” by a device through what is called an interrupt. A device signals the CPU that it requires attention, and the CPU enters a special mode and runs code that is capable of handling the device.

This special mode is called the privileged or supervisor mode, and in this mode in addition to the normal registers and instructions, there are special administrative registers and instructions that are only used to handle this kind of operations.

It is of course possible to run a normal program in supervisor mode and pay special attention to these instructions and registers. In some small microcontrollers there is no other option since they might have only one mode of operation.

Very cool, but what about userspace?

Now that we have talked about the two modes of operation, it is time to move on and explain some new names that come on top of this.

In order to optimize the use of resources in a computer system, there is an operating system, which is in charge of providing a basic platform over which resources can be shared, and of making sure that there is a uniform way to access devices and other resources.

The operating system runs in the privileged mode and the programs run in the non-privileged mode. Programs can interact with the operating system by issuing system calls.

Most operating systems in use today follow the monolithic model, in which there is a kernel that implements the supervisor functionality. This is opposed to the microkernel model, in which the kernel implements only a subset of the functionality, usually just the scheduler, and the rest is implemented by code running in non-privileged mode. It is not the aim of this post to discuss whether one model is better than the other, just to mention them.

The kernel will start processes that perform the tasks the system is supposed to do, for example the browser you are using to read this post. The processes run in the non-privileged mode, regardless of whether there is a user running the task or not. Think of an automatic system that controls another system, for example a display showing the bus schedule. In this case there is no user starting the task; the system boots and runs a process that shows the schedule.

There is another dimension to consider: Unix operating systems have the notion of users embedded in the architecture, with the user root as the administrative user and other users as normal users. Notice that even the root user runs in the non-privileged mode of the CPU, even though it has all the privileges in the OS. A user in the Unix sense is basically a compartment to keep processes and resources separated.

However, since all processes are executed by users, the name userspace is used to indicate the non-privileged mode. Notice that this maps only partially to the non-privileged mode of the CPU; the OS might decide that some operations are not available to some users. For instance, it is possible to say that a file is readable only by some users, and the OS will stop other users from reading the file.

Finally, the term userspace is usually connected to memory. As explained in a previous post (Computer memory from the ground up (part 3)), the OS is responsible for assigning enough memory to a process. Most modern CPUs have a memory management unit (MMU), which makes it impossible for a process to access memory that is not explicitly mapped in its memory space. Therefore, when a process wants to send data to the OS or read data from the OS, it needs to ask the operating system to do it. This is done by a couple of methods whose names vary between operating systems but whose functionality does not: copy_from_userspace and copy_to_userspace.
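
A hedged sketch of the idea follows. Linux calls these functions copy_from_user and copy_to_user; the validation helper below is an assumption for illustration, not a real kernel function:

#include <cstring>

// Assumed helper: checks that the range is mapped in the calling
// process' address space.
bool user_range_is_mapped(const void *ptr, unsigned long length);

long copy_to_userspace(void *user_dst, const void *kernel_src,
                       unsigned long length) {
  // The validation is the whole point: the kernel refuses to touch
  // a user pointer that is not properly mapped, instead of faulting.
  if (!user_range_is_mapped(user_dst, length))
    return -1;
  std::memcpy(user_dst, kernel_src, length);
  return 0;
}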

However, there are modern CPUs that do not have an MMU, and operating systems that run on them, for example uClinux on ColdFire processors. In this model of operation there is no difference between the kernel memory and the process memory; at least, there is no easy way for the CPU to detect that a process is trying to access restricted memory. In these cases the methods that move data from and to userspace are basically just plain memory copies.

What’s next

In my next post I will explore the interaction between a process, the kernel and an external device, in order to show how the different pieces work together. In the meantime feel free to share this post, like it, comment on it or contact me by using the contact form.

What you need to know about the C Library

Many people have heard about the C Library, and for most its name inspires awe and a magical aura. In short, the C Library is the library that implements the runtime for the C programming language.

The C Library also provides an interface to manage the interaction between a process and the system. This is known as the interaction between userspace and kernel space.

The C Runtime

The C programming language does not require a complex runtime, because the language in itself does not have any complex features. By this I do not mean that the C language is not powerful; what I am saying is that the power of the C programming language comes from its simplicity, as opposed to other programming languages, which require a very complex runtime in order to be able to perform any operation.

As explained in my post https://blog.carlosware.com/2021/01/11/computer-memory-from-the-ground-up-part-2/, the C programming language only requires a stack to run. Dynamic memory is usually managed by the C Library using the malloc/free functions, but this is not a requirement.

This is a big advantage when using C to write low level code, for example a boot loader or an operating system.

A bootloader is a program that is run right after a processor is powered up and it is usually used to diagnose the hardware and load the operating system. In the modern Linux world the bootloader of choice is called GRUB.

Because the bootloader runs early and in supervisor mode, it has complete control over the hardware, including the memory. Therefore the bootloader can implement a simple memory management scheme, in which memory is simply used directly instead of having an allocator, for example:

unsigned char *buffer = (unsigned char *)0x00FF0000;

There is no need to have a more complicated system to use the memory, since the bootloader will be reading and writing to known locations without any interference.

Of course, there is more to the C Library than memory management. The functionality provided by the C Library includes the I/O system, including printf/scanf, the streaming file operations, the math functions, the time functions and a very long list of other functions.

Functions in the C Library are described in section 3 of the Unix manual pages.

Interface between user and kernel space

It is technically feasible to interact with the kernel without using the C Library, although it is seldom done. The C Library implements the mechanism to call into the system, a.k.a. system calls, by which a program is able to perform restricted operations.

The mechanism by which a process issues a system call is specific to a particular architecture, but in general it consists of writing the code of the requested system call to a register, together with the parameters for the call, and then issuing a specialized instruction to switch to supervisor mode.

There are plenty of system calls that have an implementation in the C Library; however, there are a few that do not. For these cases there is the function syscall(2), which will perform the requested system call for you.
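
For example, getpid can be issued directly through syscall(2), bypassing its C Library wrapper:

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main() {
  /* Same effect as calling getpid() through the C Library wrapper */
  long pid = syscall(SYS_getpid);
  printf("my pid is %ld\n", pid);
  return 0;
}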

Many “day to day” operations are actually system calls and not strictly a part of the C Language, for example open(2), read(2), write(2) and close(2).

It is worth noting that memory allocation is mostly performed without the use of system calls. A system call, such as brk(2), is only required when the allocator needs to increase the data segment of the program.

The system calls implemented by the C Library can be found in section 2 of the Unix manual pages.

What’s next?

In my next post I will talk about userspace and kernel space, explaining, among other things, why the term userspace is a really bad name.