backup: Syscalls and problems encountered

In the previous article, we finished setting up the kernel environment. Now let’s write a system call that reads or modifies the nice value of a given process and returns the process's latest nice value and priority (prio).

Successful method

This method works, and it seems to be the one most people use.

First, go to include/linux/syscalls.h and add the following function prototype

image.png

After that, go to arch/x86/entry/syscalls/syscall_32.tbl and arch/x86/entry/syscalls/syscall_64.tbl respectively and add the system call number

image.png

PS: Make sure to use the __ia32_ prefix in syscall_32.tbl and the __x64_ prefix in syscall_64.tbl, just like the picture above; otherwise you may get an error like undefined reference to xxx.
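For reference, table entries of this kind generally take the shape sketched below. The numbers 387 and 335 are placeholders chosen for illustration, not the ones from the screenshot; pick the next unused number in each table. The name lab1 matches the SYSCALL_DEFINE5(lab1, ...) used in this post, and the column layout assumes a kernel whose tables still carry the __ia32_ and __x64_ prefixes.

```
# arch/x86/entry/syscalls/syscall_32.tbl
# <number> <abi> <name> <entry point> <compat entry point>
387	i386	lab1	sys_lab1	__ia32_sys_lab1

# arch/x86/entry/syscalls/syscall_64.tbl
# <number> <abi> <name> <entry point>
335	common	lab1	__x64_sys_lab1
```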

Add code in kernel/sys.c

image.png

After that, just run make -j8 bzImage and wait a few minutes

After compiling, write a demo to see if it works

image.png
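A user-space demo for a syscall like this usually goes through syscall(2). The following is a minimal sketch, not the program from the screenshot: LAB1_SYSCALL_NR is a placeholder that has to match whatever number was added to syscall_64.tbl, and the helper names lab1_call and lab1_demo are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Placeholder (assumption): must match the number added to syscall_64.tbl. */
#define LAB1_SYSCALL_NR 335L

/*
 * Thin wrapper around syscall(2).
 * flag == 0 only reads; flag == 1 first sets the nice value to nicevalue.
 * Returns 0 on success and fills *prio and *nice;
 * returns -1 with errno set on failure.
 */
long lab1_call(long nr, pid_t pid, int flag, int nicevalue, int *prio, int *nice)
{
	assert(prio != NULL && nice != NULL);
	return syscall(nr, pid, flag, nicevalue, prio, nice);
}

/* Read the current values, set nice to 5, and print what the kernel reports. */
int lab1_demo(pid_t pid)
{
	int prio = 0, nice = 0;

	if (lab1_call(LAB1_SYSCALL_NR, pid, 0, 0, &prio, &nice) == -1) {
		perror("lab1 (read)");
		return -1;
	}
	printf("before: nice=%d prio=%d\n", nice, prio);

	if (lab1_call(LAB1_SYSCALL_NR, pid, 1, 5, &prio, &nice) == -1) {
		perror("lab1 (set)");
		return -1;
	}
	printf("after:  nice=%d prio=%d\n", nice, prio);
	return 0;
}
```

Calling lab1_demo(getpid()) from main and linking the program statically makes it easy to drop into the rootfs. On a kernel that does not have the syscall, the wrapper returns -1 and perror reports the reason.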

Then use the script to build rootfs.img and launch qemu

image.png

Let’s talk about this syscall

SYSCALL_DEFINE5(lab1, pid_t, pid, int, flag, int, nicevalue, void __user *, prio, void __user *, nice)
{
	struct pid *mypid;
	struct task_struct *task;
	int nice_before;
	int nice_after;
	int cur_prio;

	// find_get_pid() looks up the struct pid for the requested pid and takes a reference on it
	mypid = find_get_pid(pid);
	if (!mypid)
		return -ESRCH;

	// pid_task() maps the struct pid to its task_struct, which task_nice() and set_user_nice() need.
	// It must be called under rcu_read_lock(); take a task reference so the task cannot be freed under us.
	rcu_read_lock();
	task = pid_task(mypid, PIDTYPE_PID);
	if (task)
		get_task_struct(task);
	rcu_read_unlock();
	put_pid(mypid);	// Drop the reference taken by find_get_pid()
	if (!task)
		return -ESRCH;

	nice_before = task_nice(task);	// Get the current nice value
	if (flag == 1)
		set_user_nice(task, nicevalue);	// Modify the nice value (silently ignored if out of range)
	nice_after = task_nice(task);
	cur_prio = task->prio;
	put_task_struct(task);

	if (flag == 1)
		printk(KERN_INFO "This is origin nice : %d\nThis is the nice now : %d\n", nice_before, nice_after);
	else if (flag == 0)
		printk(KERN_INFO "The nice is : %d\n", nice_before);

	/*
	 * copy_to_user(void __user *to, const void *from, unsigned long n)
	 * takes the user memory address, the kernel space address and the data length,
	 * and returns the number of bytes that could not be copied (0 on success).
	 * System calls report errors as negative errno values, hence -EFAULT.
	 * The same goes for the nice value returned afterwards.
	 */
	if (copy_to_user(prio, &cur_prio, sizeof(cur_prio)))
		return -EFAULT;
	if (copy_to_user(nice, &nice_after, sizeof(nice_after)))
		return -EFAULT;
	return 0;
}

There seem to be quite a few problems with it, though. I didn’t think about the range of nice values when I first wrote it (I didn’t even know nice values had a range).
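One way to handle the range is to check the requested value before it ever reaches set_user_nice(). The sketch below is plain user-space C, so MIN_NICE and MAX_NICE are redefined locally with the values the kernel uses (-20 and 19). The helper names check_nice and clamp_nice are hypothetical. It shows two possible policies: reject an out-of-range value with -EINVAL, or clamp it into range.

```c
#include <errno.h>

/* Same values as the kernel's MIN_NICE / MAX_NICE, redefined for user space. */
#define MIN_NICE (-20)
#define MAX_NICE 19

/* Policy 1: reject. Returns 0 for a valid nice value, -EINVAL otherwise. */
int check_nice(int nicevalue)
{
	if (nicevalue < MIN_NICE || nicevalue > MAX_NICE)
		return -EINVAL;
	return 0;
}

/* Policy 2: clamp. Pulls an out-of-range value back to the nearest bound. */
int clamp_nice(int nicevalue)
{
	if (nicevalue < MIN_NICE)
		return MIN_NICE;
	if (nicevalue > MAX_NICE)
		return MAX_NICE;
	return nicevalue;
}
```

Rejecting makes mistakes visible to the caller, while clamping is more forgiving. Either is arguably better than passing the value straight through, because set_user_nice() simply returns without doing anything when the value is out of range.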

A method that somehow goes wrong

At first, I wanted to follow a tutorial from the Internet and write the syscall as a function with parameters, similar to the one in my last blog post about adding system calls.

Create a lab1_v2 folder in the root of the source tree and add lab1_v2.c and a Makefile

image.png

Modify the Makefile in the root directory of the source tree

image.png

Complete the usual three steps in include/linux/syscalls.h, arch/x86/entry/syscalls/syscall_32.tbl and arch/x86/entry/syscalls/syscall_64.tbl

image.png
image.png

Then compile the kernel with make -j8 bzImage

Write a demo as usual

image.png

Then an error occurs when running

image.png

Probably something went wrong when passing parameters?

find_get_pid(pid_t nr)

struct pid
{
	atomic_t count;	// Reference count of this struct pid
	unsigned int level;
	struct hlist_head tasks[PIDTYPE_MAX];	// Lists of tasks using this pid, one list per pid type
	struct rcu_head rcu;
	struct upid numbers[1];
};

struct upid
{
	int nr;
	struct pid_namespace *ns;
	struct hlist_node pid_chain;
};

// find_get_pid(pid_t nr) looks up the struct pid for the process number nr and adds 1 to the count in the structure
struct pid *find_get_pid(pid_t nr)
{
	struct pid *pid;
	rcu_read_lock();
	pid = get_pid(find_vpid(nr));
	rcu_read_unlock();
	return pid;
}

// Here, find_vpid(int nr) returns the struct pid (or NULL if there is none), and get_pid(struct pid *pid) adds 1 to count

static inline struct pid *get_pid(struct pid *pid)
{
	if (pid)
		atomic_inc(&pid->count);
	return pid;
}

struct pid *find_vpid(int nr)
{
	return find_pid_ns(nr, task_active_pid_ns(current));
}
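Because get_pid() adds 1 to count, every successful find_get_pid() needs a matching put_pid() once the caller is done. Otherwise the struct pid can never be freed. The following user-space sketch models that get/put pairing with C11 atomics. It is only an illustration of the counting scheme, not kernel code, and the names obj_model, model_get, model_put and model_refs are invented for this example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for a reference-counted kernel object such as struct pid. */
struct obj_model {
	atomic_int count;
};

/* Like get_pid(): take one more reference; NULL is passed through unchanged. */
struct obj_model *model_get(struct obj_model *o)
{
	if (o)
		atomic_fetch_add(&o->count, 1);
	return o;
}

/*
 * Like put_pid() in spirit: drop one reference.
 * Returns 1 if this was the last reference (the point where the real
 * object would be freed), 0 otherwise.
 */
int model_put(struct obj_model *o)
{
	if (!o)
		return 0;
	return atomic_fetch_sub(&o->count, 1) == 1;
}

/* Current number of references, for inspection only. */
int model_refs(struct obj_model *o)
{
	return atomic_load(&o->count);
}
```

In this model, an object that starts with one reference and sees a get without a put never reaches zero, which is the leak that a find_get_pid() call without put_pid() would cause.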

pid_task()

// Find the task_struct attached to a struct pid for the given pid type
struct task_struct *pid_task(struct pid *pid, enum pid_type type)
{
	struct task_struct *result = NULL;
	if (pid) {
		struct hlist_node *first;
		first = rcu_dereference_check(hlist_first_rcu(&pid->tasks[type]),
					      lockdep_tasklist_lock_is_held());
		if (first)
			result = hlist_entry(first, struct task_struct, pids[(type)].node);
	}
	return result;
}

asmlinkage
Tells the compiler to take the function's parameters from the stack only, not from registers, because the syscall entry code has already pushed the parameter values passed in registers onto the kernel stack before calling the service routine. On 32-bit x86 this is implemented with the regparm(0) function attribute.
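On x86-64 kernels that use the __x64_ prefix, the entry point listed in the table does not receive the arguments directly. It receives a pointer to the saved registers, and a wrapper generated by SYSCALL_DEFINEx unpacks them before calling the real body. The user-space sketch below models that unpacking. The struct regs_model, the function names and the last_call variable are invented for illustration, and only the three integer arguments of lab1 are modelled.

```c
/* Stand-in for the argument registers saved in struct pt_regs on x86-64. */
struct regs_model {
	unsigned long di, si, dx, r10, r8, r9;
};

/* Records what the modelled syscall body was called with. */
struct lab1_args {
	int pid;
	int flag;
	int nicevalue;
};

struct lab1_args last_call;

/* The "real" body: sees ordinary typed parameters. */
long lab1_impl_model(int pid, int flag, int nicevalue)
{
	last_call.pid = pid;
	last_call.flag = flag;
	last_call.nicevalue = nicevalue;
	return 0;
}

/*
 * The generated wrapper: takes the saved registers and pulls the
 * arguments out in order (di, si, dx, then r10, r8, r9 for more).
 */
long lab1_wrapper_model(const struct regs_model *regs)
{
	return lab1_impl_model((int)regs->di, (int)regs->si, (int)regs->dx);
}
```

This may be relevant to the failed attempt above: a hand-written function with plain parameters that is not defined through SYSCALL_DEFINEx would not get this unpacking. That is only a guess, since the actual error output is not reproduced here.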

set_user_nice()

void set_user_nice(struct task_struct *p, long nice)
{
	bool queued, running;
	int old_prio, delta;
	struct rq_flags rf;
	struct rq *rq;
	// rq is the run queue; there is one run queue per CPU, and each CPU's runnable tasks are queued on its own run queue
	// If the task's nice value already equals the requested one, or the requested value is out of range, just return
	// From here we can see that nice values range from -20 (MIN_NICE) to 19 (MAX_NICE)
	if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
		return;
	/*
	 * We have to be careful, if called from sys_setpriority(),
	 * the task might be in the middle of scheduling on another CPU.
	 */
	rq = task_rq_lock(p, &rf);
	update_rq_clock(rq);
	/*
	 * The RT priorities are set via sched_setscheduler(), but we still
	 * allow the 'normal' nice value to be set - but as expected
	 * it wont have any effect on scheduling until the task is
	 * SCHED_DEADLINE, SCHED_FIFO or SCHED_RR:
	 */
	// If the task uses a deadline or real-time policy (SCHED_DEADLINE, SCHED_FIFO or SCHED_RR),
	// setting the nice value has no effect on how it is scheduled,
	// but the nice value is still converted to a priority and stored in p->static_prio
	if (task_has_dl_policy(p) || task_has_rt_policy(p)) {
		p->static_prio = NICE_TO_PRIO(nice);	// Record the nice value as a static priority
		goto out_unlock;
	}
	queued = task_on_rq_queued(p);
	running = task_current(rq, p);
	if (queued)
		dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
	if (running)
		put_prev_task(rq, p);
	// Convert the nice value to a priority and store it in static_prio: #define NICE_TO_PRIO(nice) ((nice) + DEFAULT_PRIO)
	// The DEFAULT_PRIO value here works out to 120.
	// From here you can also see that converting a priority back to a nice value subtracts DEFAULT_PRIO:
	// #define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)

	p->static_prio = NICE_TO_PRIO(nice);
	set_load_weight(p);
	old_prio = p->prio;
	p->prio = effective_prio(p);
	delta = p->prio - old_prio;
	// If the task whose nice value is being set was on the run queue, put it back
	if (queued) {
		enqueue_task(rq, p, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK);
		/*
		 * If the task increased its priority or is running and
		 * lowered its priority, then reschedule its CPU:
		 */
		// Reschedule if the priority was raised (delta < 0), or if it was lowered (delta > 0) while the task is running.
		if (delta < 0 || (delta > 0 && task_running(rq, p)))
			resched_curr(rq);
	}
	// If the task was running, set it as the current task of its rq again now that its priority has changed.
	if (running)
		set_curr_task(rq, p);
out_unlock:
	task_rq_unlock(rq, p, &rf);
}
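The arithmetic in set_user_nice() is easy to check in isolation. The sketch below is user-space C that copies the kernel's macro definitions (MAX_RT_PRIO is 100 and NICE_WIDTH is 40, which gives a DEFAULT_PRIO of 120) and restates the reschedule condition as a small function. The function name needs_resched_model is invented for this example. It models only the comparison itself, not the surrounding check that the task was queued.

```c
#include <stdbool.h>

/* Copied from the kernel's priority definitions for use in user space. */
#define MAX_RT_PRIO	100
#define NICE_WIDTH	40
#define DEFAULT_PRIO	(MAX_RT_PRIO + NICE_WIDTH / 2)

#define NICE_TO_PRIO(nice)	((nice) + DEFAULT_PRIO)
#define PRIO_TO_NICE(prio)	((prio) - DEFAULT_PRIO)

/*
 * Mirrors the test in set_user_nice():
 * a smaller prio number means a higher priority, so delta < 0 is a raise.
 * Reschedule on any raise, or on a drop if the task is currently running.
 */
bool needs_resched_model(int old_prio, int new_prio, bool running)
{
	int delta = new_prio - old_prio;

	return delta < 0 || (delta > 0 && running);
}
```

With these definitions the nice range -20 to 19 maps to priorities 100 to 139, and a nice value of 0 corresponds to priority 120.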

Author: ACce1er4t0r
Posted on: 2022-03-09
Updated on: 2023-04-22
