Programming Linux Modules
Kernel Module
Implemented modules
//select_and_show.c
//show_all.c
obj-m := show_all.o
Related Source Code
License statement for modules
MODULE_LICENSE("GPL");
MODULE_LICENSE(_license) // _license is the license name string
From kernel version 2.4.10 on, a module should declare its license with the MODULE_LICENSE macro; otherwise, loading the module produces a "kernel tainted" warning. As the linux/module.h file shows, the license strings the kernel recognizes are "GPL", "GPL v2", "GPL and additional rights", "Dual BSD/GPL", "Dual MPL/GPL", and "Proprietary".
module_init (TODO)
Find the include/linux/init.h file in the kernel source code directory.
If this is a macro definition, then what is __initcall(x)?
initcalls
We can see a large number of xxx_initcall macro definitions, all of which are implemented by __define_initcall. __define_initcall takes two parameters: fn and id.
The function do_initcalls can be found in the init/main.c file.
static void __init do_initcalls(void)
do_initcalls is essentially a for loop that executes the registered functions level by level.
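The level-by-level loop can be pictured with a small user-space sketch. This is not the kernel's code: `initcall_entry`, `run_initcalls`, `sample_table`, and the three sample functions are hypothetical names, and the table is a plain array instead of linker sections.

```c
#include <stddef.h>

/* Hypothetical user-space model: each entry pairs an init function with a level. */
typedef int (*initcall_t)(void);

struct initcall_entry {
    int level;        /* lower levels run first */
    initcall_t fn;
};

#define MAX_LEVEL 7

/* Run every registered function, one level at a time,
 * and return how many functions were called. */
static int run_initcalls(const struct initcall_entry *table, size_t n)
{
    int called = 0;
    for (int level = 0; level <= MAX_LEVEL; level++) {
        for (size_t i = 0; i < n; i++) {
            if (table[i].level == level) {
                table[i].fn();
                called++;
            }
        }
    }
    return called;
}

/* Sample init functions that record the order in which they ran. */
static int order[4];
static int order_len;

static int early_init(void)  { order[order_len++] = 1; return 0; }
static int device_init(void) { order[order_len++] = 6; return 0; }
static int late_init(void)   { order[order_len++] = 7; return 0; }

/* Deliberately listed out of order: the level decides when each one runs. */
static const struct initcall_entry sample_table[] = {
    { 7, late_init },
    { 6, device_init },
    { 1, early_init },
};
```

Calling `run_initcalls(sample_table, 3)` runs the functions in level order regardless of their position in the array, which is the property the level number exists to provide.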
So the question arises: what is level, and which functions are executed? This goes back to the macro definitions above. First, a quick trace of the macro expansion:
module_init(fn)---> __initcall(fn) ---> device_initcall(fn) ---> __define_initcall(fn, 6)
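In the macro definition above, ## is the token-pasting operator: it joins its operands into a single token, so __initcall_##fn##id becomes __initcall_ followed by the values of fn and id. When fn is helloworld and id is 4, __initcall_##fn##id is __initcall_helloworld4.
A single # stringifies a macro argument: with id equal to 4, #id becomes the string "4".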
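Both operators can be tried in ordinary user-space C. The sketch below is only an illustration: `PASTE`, `STR`, `SECTION_NAME`, and the two helper functions are hypothetical names, and the section-name pattern is an assumption made for the example, not a quotation of the kernel header.

```c
/* PASTE joins tokens into one identifier; STR turns a token into a string. */
#define PASTE(fn, id)      __initcall_##fn##id
#define STR(x)             #x
#define SECTION_NAME(id)   ".initcall" #id ".init"

/* PASTE(helloworld, 4) expands to the identifier __initcall_helloworld4.
 * (Names starting with two underscores are reserved; used here only to
 * mirror the text.) */
static int PASTE(helloworld, 4) = 42;

/* Refers to the pasted identifier by its full name. */
static int read_pasted(void)
{
    return __initcall_helloworld4;
}

/* #id stringifies the argument 6; adjacent string literals are then
 * concatenated by the compiler into a single string. */
static const char *section_for_level_6(void)
{
    return SECTION_NAME(6);
}
```

Note that `STR(id)` yields the string "id" only because the literal token `id` is passed in; inside a macro whose parameter is named id, `#id` yields the text of whatever argument was supplied.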
TODO…
Parameters of printk
// Emergency event message, prompted before a system crash, indicating that the system is unavailable
task_struct
Status of the process
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
A Linux process is always in one of several states. While it runs, the scheduler moves it between these states, and the state information is what the scheduler relies on when deciding which process to switch in or out.
State | Meaning |
---|---|
TASK_RUNNING | Runnable |
TASK_INTERRUPTIBLE | Waiting |
TASK_UNINTERRUPTIBLE | Uninterruptible waiting |
TASK_ZOMBIE | Zombie |
TASK_STOPPED | Stopped |
TASK_SWAPPING | Switching in/out |
Flags of the process
unsigned int flags; /* per process flags, defined below */
The kernel uses these flags to record the current condition of the process and to decide what to do with it next.
Flag | Meaning |
---|---|
PF_FORKNOEXEC | The process has been forked but has not yet called exec |
PF_SUPERPRIV | The process has used super-user privileges |
PF_DUMPCORE | The process dumped core |
PF_SIGNALED | The process was killed by a signal |
PF_EXITING | The process is exiting |
Identifier of the process
pid_t pid; //Identifier of the process
Relatives between processes
struct task_struct *real_parent; /* real parent process */
Processes are created through inheritance: a process can create multiple child processes, it is the parent of each of them, and those children are siblings of one another.
When a child process is created, it inherits most of its information from the parent, i.e. it copies most of the parent's task_struct except for the pid. The system therefore needs to record these relationships so that processes can cooperate.
The task_struct of each process contains a number of pointers that link the task_struct structures of all processes into a process tree.
Relatives | Meaning |
---|---|
real_parent | real parent |
parent | parent process |
children | Head of a linked list whose elements are all of this process's children |
sibling | Links the current process into its parent's list of children (the sibling list) |
group_leader | Points to the leader of its thread group |
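How these pointers form a tree can be sketched in user space. The structure below is a hypothetical simplification: the kernel links children and sibling through struct list_head, while this sketch uses plain pointers, and `task_node`, `add_child`, and `count_descendants` are names invented for the example.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for the relationship fields of task_struct. */
struct task_node {
    int pid;
    struct task_node *parent;        /* parent process */
    struct task_node *first_child;   /* head of this task's child list */
    struct task_node *next_sibling;  /* next task with the same parent */
};

/* Link child at the front of parent's child list. */
static void add_child(struct task_node *parent, struct task_node *child)
{
    child->parent = parent;
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/* Count all descendants of t by walking each child and its subtree. */
static int count_descendants(const struct task_node *t)
{
    int n = 0;
    for (const struct task_node *c = t->first_child; c != NULL; c = c->next_sibling)
        n += 1 + count_descendants(c);
    return n;
}
```

Walking from any node through `parent` leads to the root, and walking through `first_child` and `next_sibling` reaches every descendant, which is how a tool such as pstree can draw the whole tree.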
ptrace system call
unsigned int ptrace;
The ptrace system call lets a parent process observe and control the execution of a child process, and lets the parent examine and change the child's core image (including registers).
Basic principle: while a child is being traced, a signal sent to it causes the child to stop, and the parent is notified. After the parent learns of the signal, it can inspect and modify the stopped child and then let it continue running. Note that our common debugging tool gdb is built on ptrace.
Scheduling information of the process
const struct sched_class *sched_class;
sched_class: Scheduling class
se: Scheduling entity for normal processes; every process has one
rt: Scheduling entity for real-time processes; every process has one
The scheduler uses this information to decide the order in which processes run, and combines it with the process state so that processes run in a reasonable and orderly manner. The scheduling policies are as follows.
Name | Meaning | Usage |
---|---|---|
SCHED_OTHER | Other scheduling methods | Normal process |
SCHED_FIFO | First in first out | Real-time processes |
SCHED_RR | Round-Robin | Real-time processes |
Priority of the process
int prio, static_prio, normal_prio;
Name | Priority |
---|---|
prio | Dynamic Priority |
static_prio | Static Priority |
normal_prio | Normal Priority |
rt_prio | Real-time Priority |
- prio is the final priority value used by the scheduler, i.e., the value it actually compares when selecting a process. The smaller prio is, the higher the process's priority. prio ranges from 0 to MAX_PRIO - 1, i.e., 0 to 139 inclusive, and is split into two intervals by scheduling policy: 0 to 99 for real-time processes and 100 to 139 for non-real-time processes.
- static_prio, the static priority, does not change over time. The kernel does not modify it on its own; it changes only through the nice system call. It is computed as static_prio = MAX_RT_PRIO + nice + 20. MAX_RT_PRIO is 100 and nice ranges from -20 to +19, so static_prio ranges from 100 to 139. The smaller static_prio is, the higher the static priority of the process.
- normal_prio depends on the static priority and the scheduling policy and can be set by the _setscheduler function. For non-real-time processes, normal_prio equals static_prio; for real-time processes, normal_prio = MAX_RT_PRIO - 1 - p->rt_priority.
- rt_priority ranges from 0 to 99 and is only meaningful for real-time processes. From the equation prio = MAX_RT_PRIO - 1 - p->rt_priority it can be seen that the larger rt_priority is, the smaller prio is, so a larger real-time priority (rt_priority) means a higher-priority process.
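The two formulas above are easy to check with a few lines of user-space C. This is a sketch built only from the formulas in the text; `nice_to_static_prio` and `rt_normal_prio` are hypothetical helper names, not kernel functions.

```c
#define MAX_RT_PRIO 100
#define MAX_PRIO    (MAX_RT_PRIO + 40)   /* 140, so valid prio values are 0..139 */

/* Static priority of a normal process from its nice value (-20..19). */
static int nice_to_static_prio(int nice)
{
    return MAX_RT_PRIO + nice + 20;
}

/* Normal priority of a real-time process from its rt_priority (0..99). */
static int rt_normal_prio(int rt_priority)
{
    return MAX_RT_PRIO - 1 - rt_priority;
}
```

A nice value of 0 maps to 120, the extremes -20 and 19 map to 100 and 139, and an rt_priority of 99 maps to prio 0, the highest priority in the system.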
Time data information
cputime_t utime, stime, utimescaled, stimescaled;
Name | Meaning |
---|---|
utime/stime | Ticks the process has spent in user/kernel mode |
utimescaled/stimescaled | Run time of the process in user/kernel mode |
gtime | Virtual machine (guest) time, counted in ticks |
prev_utime/prev_stime | Previous running time |
nvcsw/nivcsw | Voluntary/involuntary context switch counts |
start_time/real_start_time | Process creation time; the latter includes sleep time |
cputime_expires | Processor time tracked for a process or process group |
min_flt, maj_flt | Minor/major page fault counts |
Communication between processes
If multiple processes cooperate on a task, they need to be able to access each other's resources and communicate with each other.
The main process communication methods in Linux are:
- pipes
- semaphores
- shared memory
- signals
- message queues
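As an illustration of the first item, the following user-space sketch sends a string from a child process to its parent through a pipe. It uses only the standard POSIX calls pipe, fork, read, write, and waitpid; `pipe_roundtrip` is a hypothetical helper name chosen for the example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that writes msg into a pipe; the parent reads it into out.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    if (outlen == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: writer */
        close(fds[0]);
        ssize_t w = write(fds[1], msg, strlen(msg));
        close(fds[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(fds[1]);                        /* parent: reader */
    ssize_t total = 0;
    while ((size_t)total < outlen - 1) {
        ssize_t n = read(fds[0], out + total, outlen - 1 - (size_t)total);
        if (n < 0) { total = -1; break; } /* read error */
        if (n == 0) break;                /* writer closed its end */
        total += n;
    }
    if (total >= 0)
        out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);                /* reap the child */
    return total;
}
```

Each side closes the pipe end it does not use; the parent's read loop ends when the child closes its write end, which is how the reader learns that no more data is coming.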
File Information
/* file system info */
define | Meaning |
---|---|
struct fs_struct *fs | File system information of the process (its root and current working directory) |
struct files_struct *files | Files opened by the process |
Processes can open or close files, which are system resources, and the Linux kernel has to keep a record of how the process uses the files.
The task_struct structure contains two data structures that describe the file-related information of a process.
fs_struct describes two VFS inodes, called root and pwd, which point to the root directory and the current working directory of the process's executable image, respectively.
files_struct records the descriptors of the files opened by the process.
Signal processing information
struct signal_struct *signal;
name | Meaning |
---|---|
signal | Pointer to the process's signal descriptor |
sighand | Pointer to the process's signal handler descriptor |
blocked | Mask of blocked signals; real_blocked is a temporary mask |
pending | Data structure that stores private pending signals |
sas_ss_sp | Address of the alternate stack for signal handlers; sas_ss_size is the size of that stack |
notifier_data/notifier_mask | A device driver uses the function pointed to by notifier to block certain signals of the process. notifier_data is data that this function may use |
Virtual memory handling
struct mm_struct *mm, *active_mm;
define | Meaning |
---|---|
struct mm_struct *mm | Describe the address space of the process |
struct mm_struct *active_mm | Address space borrowed by kernel threads |
mm_struct describes the address space (virtual memory) of a process. active_mm exists for kernel threads, which have no address space of their own. So that kernel threads can be context-switched in the same way as ordinary processes, when a kernel thread is switched in, its active_mm is set to point to the active_mm of the process that has just been switched out.
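The borrowing rule can be sketched in user space. The types below are hypothetical reductions (`mm_sketch`, `vm_task`, `switch_mm_sketch` are invented names), and the function models only the pointer assignment described above, not the kernel's actual context-switch code.

```c
#include <stddef.h>

struct mm_sketch { int id; };   /* stand-in for mm_struct */

/* Hypothetical, reduced task holding only the two address-space pointers. */
struct vm_task {
    struct mm_sketch *mm;         /* NULL for a kernel thread */
    struct mm_sketch *active_mm;  /* address space actually in use */
};

/* Model of the rule: a kernel thread (mm == NULL) borrows the address
 * space that the outgoing task was using; a user process uses its own. */
static void switch_mm_sketch(const struct vm_task *prev, struct vm_task *next)
{
    if (next->mm == NULL)
        next->active_mm = prev->active_mm;
    else
        next->active_mm = next->mm;
}
```

Because the kernel thread never touches user addresses, any valid address space will do for it, so reusing the previous one avoids an unnecessary switch.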
Page management information
When there is not enough physical memory, the Linux memory management system needs to transfer some pages from memory to external memory, and the swap is done on a page-by-page basis.
define | Meaning |
---|---|
int swappable | Whether the memory pages occupied by the process can be swapped out |
unsigned long min_flt, maj_flt, nswap | Accumulated minor page faults, major page faults, and pages swapped in and out for the process |
unsigned long cmin_flt, cnswap | Accumulated minor page faults and pages swapped in for all descendant processes of this process, with this process as their ancestor |
Process Queue Pointer
struct task_struct *next_task, *prev_task;
// All processes (in the form of PCBs) form a doubly linked list. next_task and prev_task are its forward and backward pointers. The head and tail of the list are both init_task (i.e. process 0).
struct task_struct *next_run, *prev_run;
// The run_queue is a circular doubly linked list of the processes that are running or runnable, i.e. whose state is TASK_RUNNING. Its forward and backward pointers are next_run and prev_run, and its head and tail are both init_task (i.e. process 0).
struct task_struct *p_opptr, *p_pptr; and struct task_struct *p_cptr, *p_ysptr, *p_osptr;
// These point to the original parent, the parent, the youngest child, and the younger and older sibling processes respectively.
TODO…
init_task
init_task is the first process of the kernel, process number 0; it becomes the idle process once kernel initialization is complete.
init_task is the task_struct prototype for all processes and threads in the kernel. During kernel initialization, a task_struct instance named init_task is constructed by static definition; later in initialization, the rest_init() function creates the kernel init thread and the kthreadd kernel thread.
The kernel init thread eventually executes /sbin/init and becomes the root of all user-space programs (as the pstree command shows), i.e. the user-space init process.
This init starts out as a kernel thread created by kernel_thread; after initialization it moves to user space and becomes the ancestor of all user processes.
The kthreadd kernel thread becomes the parent of all other kernel threads.
Its task is to manage and schedule other kernel threads: it loops in the kthreadd function, running the kthreads queued on the global kthread_create_list. The kernel threads we request (for example through kthread_create) are added to this list, so all kernel threads are directly or indirectly children of kthreadd.
The kernel uses init_task as the task_struct descriptor of process 0 and schedules it when the system has nothing else to do. At that point it acts as the idle process, giving up the CPU and sleeping in a continuous loop.
Initialization of the stack
The process init_task is defined in init/init_task.c
/* Initial task structure */
The INIT_TASK macro is defined in include/linux/init_task.h
We can see that the stack of the init_task process points to init_thread_info. In the file arch/arm/include/asm/thread_info.h, init_thread_info is defined as follows
init_thread_info is the thread_info member of init_thread_union.
The variable init_thread_union is defined in init/init_task.c.
union thread_union init_thread_union __init_task_data =
- Declares the init_thread_union variable of type thread_union, then assigns values to the thread_info member of init_thread_union, mainly init_thread_union.thread_info.task = &init_task, pointing the task member of this variable to init_task.
- __attribute__((section(".data..init_task"))) specifies that the section name is .data..init_task, which is placed at the beginning of .data when vmlinux is linked.
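The layout described by these two points can be modelled in user space. Everything below is a hypothetical reduction: the `_sketch` names are invented, THREAD_SIZE is an assumed value, and the section attribute is omitted because it only matters to the kernel's linker script.

```c
#include <stddef.h>

#define THREAD_SIZE 8192   /* assumed stack size for this sketch */

struct task_sketch { int pid; };

/* Reduced thread_info: only the back-pointer to the owning task. */
struct thread_info_sketch {
    struct task_sketch *task;
};

/* thread_info and the stack share the same memory: thread_info
 * occupies the lowest addresses of the stack area. */
union thread_union_sketch {
    struct thread_info_sketch thread_info;
    unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
};

static struct task_sketch init_task_sketch = { 0 };

/* Statically initialized, with thread_info.task pointing at the task. */
static union thread_union_sketch init_thread_union_sketch = {
    .thread_info = { .task = &init_task_sketch },
};

/* Returns 1 if thread_info starts at the base address of the stack area. */
static int thread_info_at_stack_base(void)
{
    return (void *)&init_thread_union_sketch.thread_info ==
           (void *)init_thread_union_sketch.stack;
}
```

Because the two members of the union overlap, a pointer to the stack area is also a pointer to thread_info, and from there the task member leads back to the task structure.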
Stack compilation into vmlinux (TODO)
.data : AT(__data_loc) {
TODO
for_each_process(p)
Start with init_task and iterate through all processes.
Linux links the task structures of all processes into a circular doubly linked list. The loop starts at init_task and keeps following the next pointer until it arrives back at &init_task.
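The traversal pattern can be reproduced in user space with a minimal circular list. This is a sketch of the idea only: `proc`, `for_each_proc`, and `count_procs` are hypothetical names, and the list here is singly linked for brevity.

```c
struct proc {
    int pid;
    struct proc *next;   /* circular: the last element points back to the head */
};

/* Visit every element after head until the walk returns to head.
 * The head itself (the stand-in for init_task) is not visited. */
#define for_each_proc(p, head) \
    for ((p) = (head)->next; (p) != (head); (p) = (p)->next)

/* Count the elements on the list, excluding the head. */
static int count_procs(struct proc *head)
{
    struct proc *p;
    int n = 0;
    for_each_proc(p, head)
        n++;
    return n;
}
```

A list that contains only the head points to itself, so the loop body never runs and the count is zero.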
pid_task()
struct task_struct *pid_task(struct pid *pid, enum pid_type type)
module_param()
The three parameters of module_param are explained below.
module_param(worldNum,int,0644);
The first parameter is the name of the parameter, which you define yourself.
The second parameter is the type of the variable, such as int, long, charp, bool, etc.
The third parameter is the permission, similar to a file's permission bits. It controls which users may read or modify the value of this parameter.
* @perm is 0 if the variable is not to appear in sysfs, or 0444
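Since the permission argument is an ordinary octal file mode, it can be decoded the same way ls displays file permissions. The helper below is a user-space sketch; `perm_to_string` is a hypothetical name, not a kernel function.

```c
/* Render the low nine permission bits of perm as an "rwxrwxrwx"-style
 * string. out must have room for at least 10 bytes. */
static void perm_to_string(unsigned int perm, char out[10])
{
    static const char symbols[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++) {
        unsigned int bit = 1u << (8 - i);   /* owner-read is the highest bit */
        out[i] = (perm & bit) ? symbols[i] : '-';
    }
    out[9] = '\0';
}
```

For 0644 this gives rw-r--r--: the owner can read and write the parameter, while group and others can only read it.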
list_for_each()
/**
list_entry()
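The idea behind list_for_each() and list_entry() is that the list node is embedded in a larger structure, and the enclosing structure is recovered from the node pointer by subtracting the member's offset. The user-space sketch below re-creates that idea with hypothetical names (`list_node`, `entry_of`, `each_node`, `item`); it is not the kernel's list.h.

```c
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk every node after head until the walk returns to head. */
#define each_node(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

struct item {
    int value;
    struct list_node node;   /* embedded list linkage */
};

/* An empty list is a head that points to itself in both directions. */
static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_add_tail_sketch(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Sum the value field of every item on the list. */
static int sum_items(struct list_node *head)
{
    struct list_node *pos;
    int sum = 0;
    each_node(pos, head)
        sum += entry_of(pos, struct item, node)->value;
    return sum;
}
```

The list code never needs to know about struct item; only the caller, which knows the enclosing type and member name, converts a node back into its container.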