Operating System: Threads
Hello, this is Halil Ibrahim. Today's topic is threads, one of the most interesting parts of an operating system. Before I start, I want to share a quote for this post.
“The future belongs to those who believe in the beauty of their dreams.” — Eleanor Roosevelt
And... let’s start.
In operating systems, a thread is the smallest unit of execution within a process. A process is an instance of a program that is being executed by the operating system, and it can have one or more threads. Each thread has its own program counter, stack, and register set, which allow it to execute code independently of other threads in the same process.
Threads are used to achieve concurrency within a process, allowing multiple tasks to be in progress at the same time. By using threads, a program can perform multiple operations concurrently, such as listening for user input while processing data in the background. This can improve the performance and responsiveness of the program; even on a single processor core, interleaving threads keeps the program doing useful work while one task waits.
Let’s give a comprehensible example using threads. Almost everybody uses web servers. Have you thought about how web servers are managed?
A web server accepts client requests for web pages, images, sound, and so forth. A busy web server may have several (perhaps thousands of) clients concurrently accessing it. If the web server ran as a traditional single-threaded process, it would be able to service only one client at a time, and a client might have to wait a very long time for its request to be serviced.
One solution is to have the server run as a single process that accepts requests. When the server receives a request, it creates a separate process to service that request. In fact, this process-creation method was in common use before threads became popular. Process creation is time-consuming and resource intensive, however. If the new process will perform the same tasks as the existing process, why incur all that overhead? It is generally more efficient to use one process that contains multiple threads. If the web-server process is multithreaded, the server will create a separate thread that listens for client requests. When a request is made, rather than creating another process, the server creates a new thread to service the request and resumes listening for additional requests.
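To make this concrete, here is a minimal sketch (my own, not from the original post) of a thread-per-request TCP server in C: the main thread loops on accept() and hands each connection to a new detached thread, then immediately resumes listening. The port number and the one-line reply are arbitrary choices, and error handling is omitted for brevity.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Each request is serviced in its own thread. */
static void *handle_client(void *arg) {
    int client_fd = (int)(long)arg;
    const char *reply = "HTTP/1.0 200 OK\r\n\r\nhello\n"; /* placeholder response */
    write(client_fd, reply, strlen(reply));
    close(client_fd);
    return NULL;
}

int main(void) {
    int server_fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080); /* arbitrary port for the example */

    bind(server_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(server_fd, 16);

    for (;;) {
        int client_fd = accept(server_fd, NULL, NULL); /* wait for a request */
        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, (void *)(long)client_fd);
        pthread_detach(tid); /* no join needed; resume listening at once */
    }
}
```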
Benefits of Multithreaded Programming
- Responsiveness: Multithreading an interactive application may allow a program to continue running even if part of it is blocked or is performing a lengthy operation, thereby increasing responsiveness to the user.
- Resource sharing: Processes can share resources only through techniques such as shared memory and message passing. Such techniques must be explicitly arranged by the programmer. However, threads share the memory and the resources of the process to which they belong by default. The benefit of sharing code and data is that it allows an application to have several different threads of activity within the same address space.
- Economy: Allocating memory and resources for process creation is costly. Because threads share the resources of the process to which they belong, it is more economical to create and context-switch threads. In general, thread creation consumes less time and memory than process creation.
- Scalability: The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available. We explore this issue further in the following section.
Multicore Programming
Multicore programming refers to the process of developing software that can take advantage of the processing power provided by multiple processor cores within a single computer or device. In recent years, the number of processor cores in modern computers and devices has been steadily increasing, with some systems now having dozens or even hundreds of cores.
Multicore programming allows software to be written in a way that can distribute processing tasks across multiple cores, which can improve performance and reduce the time required to complete complex tasks. However, writing software that can effectively utilize multiple cores can be challenging, as it requires a different approach to programming than traditional single-threaded programs.
- Parallelism means a system can perform more than one task simultaneously, which requires multiple processing cores.
- Concurrency means more than one task is making progress; this is possible even on a single core by interleaving execution.
Amdahl’s Law
Amdahl’s Law is a formula that identifies potential performance gains from adding additional computing cores to an application that has both serial (nonparallel) and parallel components. If S is the portion of the application that must be performed serially on a system with N processing cores, the formula appears as follows:

speedup ≤ 1 / (S + (1 − S) / N)
As an example, assume we have an application that is 75 percent parallel and 25 percent serial. If we run this application on a system with two processing cores, we get a speedup of 1.6 times. If we add two additional cores (for a total of four), the speedup is about 2.28 times.
Plotting the formula for several serial fractions shows the key trend: applications with a larger sequential portion always sit lower on the speedup curve, because the sequential part takes constant time no matter how many cores are added. As N grows toward infinity, the speedup converges to 1/S.
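To make the arithmetic concrete, here is a small C sketch (my own, not from the original post) that evaluates Amdahl's formula for the scenario above:

```c
#include <stdio.h>

/* Amdahl's Law: speedup <= 1 / (S + (1 - S) / N),
 * where S is the serial fraction and N the core count. */
static double amdahl_speedup(double serial_fraction, int cores) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}

int main(void) {
    double s = 0.25; /* 25 percent serial, 75 percent parallel */
    int cores[] = {1, 2, 4, 8, 16};
    for (int i = 0; i < 5; i++)
        printf("N = %2d -> speedup = %.2f\n", cores[i], amdahl_speedup(s, cores[i]));
    return 0;
}
```

Running it prints a speedup of 1.60 for two cores and about 2.29 for four (the 2.28 above truncates rather than rounds the same value, 2.2857...).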
Types of Parallelism
- Data Parallelism: Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core (see the sketch after this list).
- Task Parallelism: Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. Different threads may be operating on the same data, or they may be operating on different data.
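As an illustration (my own sketch, not from the original post), the following C program splits an array between two threads that each perform the same summation on their half of the data. Task parallelism would instead give each thread a different operation, possibly on the same data.

```c
#include <pthread.h>
#include <stdio.h>

/* Data parallelism: each thread runs the SAME operation (summing)
 * on a DIFFERENT subset of the array. */
#define N 8
static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
static long partial[2];

struct range { int lo, hi, id; };

static void *sum_range(void *arg) {
    struct range *r = arg;
    long s = 0;
    for (int i = r->lo; i < r->hi; i++)
        s += data[i];
    partial[r->id] = s;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    struct range r[2] = { {0, N / 2, 0}, {N / 2, N, 1} };

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_range, &r[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("sum = %ld\n", partial[0] + partial[1]); /* prints 36 */
    return 0;
}
```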
User-Level and Kernel-Level Threads
User-level threads are managed entirely by the user-level thread library, without any support from the operating system kernel. This means that the thread library is responsible for managing thread creation, scheduling, and synchronization. User-level threads are typically lightweight and efficient, as they do not require kernel-level intervention for context switching. However, they can also be limited in their capabilities, as they may not have direct access to system resources such as I/O devices or the network.
Kernel-level threads, on the other hand, are managed directly by the operating system kernel. This means that the kernel is responsible for managing thread creation, scheduling, and synchronization. Kernel-level threads are typically more powerful and flexible, as they have direct access to system resources. However, they can also be less efficient than user-level threads, as they require more overhead for context switching.
Multithreading Models
1. One-to-One Model: The one-to-one model maps each user thread to a kernel thread. This means that many threads can run in parallel on multiprocessors, and other threads can run when one thread makes a blocking system call. The drawback is that creating a user thread requires creating the corresponding kernel thread, and the overhead of many kernel threads can burden performance.
2. Many-to-One Model: The many-to-one model maps many user threads to a single kernel thread. This model is efficient because thread management is done by the thread library in user space.
A disadvantage of the many-to-one model is that a blocking system call made by any one thread blocks the entire process. Also, multiple threads cannot run in parallel, as only one thread can access the kernel at a time.
3. Many-to-Many Model: The many-to-many model maps many user threads to a smaller or equal number of kernel threads. The number of kernel threads depends on the application or machine.
The many-to-many model does not have the disadvantages of the one-to-one model or the many-to-one model. There can be as many user threads as required, and their corresponding kernel threads can run in parallel on a multiprocessor.
Pthreads in Linux
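Below is a reconstruction of the program this section walks through (the well-known summation example from Silberschatz, Galvin, and Gagne; the usage message and error checks are filled in as reasonable assumptions):

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum;                   /* shared data, visible to both threads */
void *runner(void *param); /* the function where the child thread begins */

int main(int argc, char *argv[]) {
    pthread_t tid;        /* identifier of the summation thread */
    pthread_attr_t attr;  /* thread attributes (stack size, scheduling, ...) */

    if (argc != 2) {
        fprintf(stderr, "usage: a.out <non-negative integer>\n");
        return 1;
    }
    if (atoi(argv[1]) < 0) {
        fprintf(stderr, "argument must be >= 0\n");
        return 1;
    }

    pthread_attr_init(&attr);                      /* use default attributes */
    pthread_create(&tid, &attr, runner, argv[1]);  /* start the child thread */
    pthread_join(tid, NULL);                       /* wait for it to finish */
    printf("sum = %d\n", sum);
    return 0;
}

/* The child thread begins execution here. */
void *runner(void *param) {
    int upper = atoi(param);
    sum = 0;
    for (int i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}
```

Compile with `gcc -pthread sum.c` (the file name is arbitrary) and run, for example, `./a.out 10` to get `sum = 55`.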
The C program above demonstrates the basic Pthreads API by constructing a multithreaded program that calculates the summation of a non-negative integer in a separate thread. In a Pthreads program, separate threads begin execution in a specified function; in the code above, this is the runner() function. When this program begins, a single thread of control starts in main(). After some initialization, main() creates a second thread that begins control in the runner() function. Both threads share the global data sum.

Let's look more closely at this program. All Pthreads programs must include the pthread.h header file. The statement pthread_t tid declares the identifier of the thread we will create. Each thread has a set of attributes, including stack size and scheduling information. The pthread_attr_t attr declaration represents the attributes for the thread; we set them in the function call pthread_attr_init(&attr). Because we did not explicitly set any attributes, the default attributes are used.

A separate thread is created with the pthread_create() function call. In addition to passing the thread identifier and the attributes for the thread, we also pass the name of the function where the new thread will begin execution, in this case the runner() function. Last, we pass the integer parameter that was provided on the command line, argv[1].

At this point, the program has two threads: the initial (or parent) thread in main() and the summation (or child) thread performing the summation operation in the runner() function. This program follows the thread create/join strategy: after creating the summation thread, the parent thread waits for it to terminate by calling the pthread_join() function. The summation thread terminates when it calls the function pthread_exit(). Once the summation thread has returned, the parent thread outputs the value of the shared data sum.
Conclusion
- Threads are a fundamental concept in computer science that allows for concurrent execution of multiple tasks within a single process.
- They are lightweight, independent units of execution that share the same memory space.
- Threads enable parallelism and can improve the performance of applications by taking advantage of multiple CPU cores.
- They can communicate and synchronize with each other through various mechanisms like shared variables, locks, and semaphores.
- However, managing threads can be complex and prone to issues like race conditions and deadlocks, requiring careful design and synchronization techniques to ensure correct and efficient execution (a small locking sketch follows this list).
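As a closing illustration (my own sketch, not from the original post), here is how a Pthreads mutex prevents a race condition when two threads increment a shared counter:

```c
#include <pthread.h>
#include <stdio.h>

/* Two threads increment a shared counter. Without the mutex, the
 * read-modify-write of counter++ can interleave and lose updates. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);   /* enter the critical section */
        counter++;
        pthread_mutex_unlock(&lock); /* leave the critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter); /* always 2000000 with the lock */
    return 0;
}
```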
References
- https://www.tutorialspoint.com/multi-threading-models
- Silberschatz, Galvin, and Gagne, Operating System Concepts, 10th Edition.