Programming Assignment

Approach

This lab asks us to implement deadlock detection for the mutex and semaphore mechanisms, together with a system call enable_deadlock_detect that turns detection on and off. With detection enabled, when a user thread tries to acquire a mutually exclusive resource via mutex_lock or semaphore_down and the system would end up in an unsafe state (i.e. a deadlock may occur), the resource request is rejected.

The detection algorithm described in the lab manual is the Banker's Algorithm, proposed by Dijkstra. The manual already walks through the algorithm step by step, so I will not repeat it here; my implementation is as follows:

/// Banker's Algorithm for deadlock check
fn deadlock_check(available: Vec<usize>, allocation: Vec<Vec<usize>>, need: Vec<Vec<usize>>) -> bool {
    // n: thread count, m: resource count
    let (n, m) = (allocation.len(), allocation[0].len());
    let mut work = available;
    let mut finish = vec![false; n];
    loop {
        let mut idx = usize::MAX;
        for i in 0..n {
            let mut flag = true;
            if finish[i] {
                continue;
            }
            for j in 0..m {
                if need[i][j] > work[j] {
                    flag = false;
                    break;
                }
            }
            if flag {
                idx = i;
                break;
            }
        }
        // found a thread whose remaining need can be satisfied by work
        if idx != usize::MAX {
            for j in 0..m {
                work[j] += allocation[idx][j];
            }
            finish[idx] = true;
        } else {
            break;
        }
    }
    finish.iter().all(|&x| x)
}

Running the detection itself, given Available, Allocation, and Need, is not hard. The real question is how to fit the algorithm into the existing thread-synchronization mechanisms — more concretely, what state the kernel has to maintain so that Available, Allocation, and Need can be constructed correctly right before each check.

Consider the simpler case first: what state must be maintained to support deadlock detection for mutexes? We need to record which lock the current thread is waiting for, and one observation makes this easy: a thread "needs" at most one resource at a time, because it only continues executing (and requests more resources) after its current request has been satisfied; otherwise it is blocked. So a usize field mutex_need in the task control block is enough to store the id of the lock the thread is currently waiting for. In addition, a vector mutex_allocation records the ids of the locks the thread has acquired but not yet released. The fields I added are shown below.
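Excerpted from the task.rs part of the diff at the end of this post, the new TaskControlBlockInner fields are:

/// id of the mutex this thread is waiting for (usize::MAX means none)
pub mutex_need: usize,
/// id of the semaphore this thread is waiting for (usize::MAX means none)
pub sem_need: usize,
/// ids of the mutexes this thread has acquired and not yet released
pub mutex_allocation: Vec<usize>,
/// (semaphore id, count) pairs this thread currently holds
pub sem_allocation: Vec<(usize, usize)>,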

When sys_mutex_lock(mutex_id) is called to acquire lock mutex_id, the current thread's mutex_need is set to mutex_id before mutex.lock() actually acquires the lock. Once the lock has been acquired, mutex_id is pushed onto the mutex_allocation vector and mutex_need is cleared (I treat usize::MAX as "empty"; an Option type would be a somewhat more elegant choice).
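Condensed from my sys_mutex_lock in the diff below, the bookkeeping around the actual acquisition looks like this (the deadlock check that sits in the middle is the code shown a bit further below):

// record what this thread is about to wait for
let task = current_task().unwrap();
let mut task_inner = task.inner_exclusive_access();
task_inner.mutex_need = mutex_id;
drop(task_inner);
drop(task);

// ... deadlock check goes here ...

drop(process_inner);
drop(process);
mutex.lock();

// the lock is now held: move it from "need" to "allocation"
let task = current_task().unwrap();
let mut task_inner = task.inner_exclusive_access();
task_inner.mutex_allocation.push(mutex_id);
task_inner.mutex_need = usize::MAX;
drop(task_inner);
drop(task);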

When sys_mutex_unlock(mutex_id) is called to release lock mutex_id, the element equal to mutex_id is removed from the current thread's mutex_allocation vector before mutex.unlock() actually releases the lock.
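The matching bookkeeping in sys_mutex_unlock, also condensed from the diff below:

// this thread no longer holds mutex_id
let task = current_task().unwrap();
let mut task_inner = task.inner_exclusive_access();
if let Some(index) = task_inner.mutex_allocation.iter().position(|&x| x == mutex_id) {
    task_inner.mutex_allocation.swap_remove(index);
}
drop(task_inner);
drop(task);

mutex.unlock();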

With this state maintained, Available, Allocation, and Need can be constructed right before detection and handed to the Banker's Algorithm to decide whether the system would enter an unsafe state. The construction code is shown below:

if process_inner.dlcheck_option {
    // initialize data structures for the Banker's Algorithm:
    // Available vector, Allocation matrix, Need matrix
    let n = process_inner.tasks.len();
    let m = process_inner.mutex_list.len();
    let mut available: Vec<usize> = vec![1; m];
    let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
    let mut need: Vec<Vec<usize>> = vec![vec![0; m]; n];
    for (i, task_opt) in process_inner.tasks.iter().enumerate() {
        match task_opt {
            Some(task) => {
                let task_inner = task.inner_exclusive_access();
                for mid in &task_inner.mutex_allocation {
                    allocation[i][*mid] += 1;
                    available[*mid] -= 1;
                }
                let nid = task_inner.mutex_need;
                if nid != usize::MAX {
                    need[i][nid] += 1;
                }
                drop(task_inner);
            }
            None => {}
        }
    }

    if !deadlock_check(available, allocation, need) {
        return -0xDEAD;
    }
}

The semaphore side is implemented in much the same way, but a few details need attention.

Because a semaphore is not binary (held or not held), the per-thread allocation vector must also record how many units of each semaphore the thread holds. Each element can be a pair <sem_id, cnt>, or equivalently the pair can be encoded as cnt repeated sem_id entries; I chose the former. The semaphore Allocation matrix is then built by summing these pairs, as in the sketch below.
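Excerpted (and slightly condensed) from the sys_semaphore_down code in the diff below:

// build the Allocation matrix from each thread's (sem_id, cnt) pairs
let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
for (i, task_opt) in process_inner.tasks.iter().enumerate() {
    if let Some(task) = task_opt {
        let task_inner = task.inner_exclusive_access();
        for (id, alloc) in &task_inner.sem_allocation {
            allocation[i][*id] += *alloc;
        }
        drop(task_inner);
    }
}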

Also, a semaphore's count can go negative; the absolute value of a negative count is the number of units that have been "overdrawn" in advance. In the Banker's Algorithm the available amount Available[j] cannot be negative, so a negative count is treated as 0:

available.push(max(sem_inner.count, 0) as usize);
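In context, the whole Available vector for semaphores is filled like this (excerpted from the diff below; a removed semaphore contributes 0):

let mut available: Vec<usize> = Vec::new();
for sem_opt in &process_inner.semaphore_list {
    match sem_opt {
        Some(sem) => {
            let sem_inner = sem.inner.exclusive_access();
            // a negative count means the semaphore has been overdrawn; treat it as 0
            available.push(max(sem_inner.count, 0) as usize);
            drop(sem_inner);
        }
        None => available.push(0),
    }
}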

Finally, when thread A has "overdrawn" a semaphore and gone to sleep, and thread B later releases the resource, thread A's sem_allocation vector must be updated before thread B calls wakeup_task(task) to wake A. Otherwise, because scheduling is nondeterministic, A may be scheduled while its sem_allocation has not yet been set, so a subsequent deadlock check works on stale data and the system can unexpectedly run into a deadlock.

I am not entirely sure why this ordering matters as much as it does: when I did not respect it, running ch8_deadlock_sem2 ended in a deadlock with very high probability (around 90%), which can hardly be explained by scheduling randomness alone.
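For reference, this is the ordering used in Semaphore::up (excerpted from the diff below): the woken thread's bookkeeping is updated before wakeup_task is called.

pub fn up(&self) {
    let mut inner = self.inner.exclusive_access();
    inner.count += 1;
    if inner.count <= 0 {
        if let Some(task) = inner.wait_queue.pop_front() {
            // move the semaphore from "need" to "allocation" for the woken
            // thread *before* it becomes runnable again
            let mut task_inner = task.inner_exclusive_access();
            let sem_id = task_inner.sem_need;
            match task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
                Some(index) => task_inner.sem_allocation[index].1 += 1,
                None => task_inner.sem_allocation.push((sem_id, 1)),
            }
            task_inner.sem_need = usize::MAX;
            drop(task_inner);

            wakeup_task(task);
        }
    }
}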

Code

diff --git a/os/src/sync/semaphore.rs b/os/src/sync/semaphore.rs
index 509b504..5743172 100644
--- a/os/src/sync/semaphore.rs
+++ b/os/src/sync/semaphore.rs
@@ -36,6 +36,15 @@ impl Semaphore {
inner.count += 1;
if inner.count <= 0 {
if let Some(task) = inner.wait_queue.pop_front() {
+ let mut task_inner = task.inner_exclusive_access();
+ let sem_id = task_inner.sem_need;
+ match task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
+ Some(index) => task_inner.sem_allocation[index].1 += 1,
+ None => task_inner.sem_allocation.push((sem_id, 1)),
+ }
+ task_inner.sem_need = usize::MAX;
+ drop(task_inner);
+
wakeup_task(task);
}
}
@@ -50,6 +59,17 @@ impl Semaphore {
inner.wait_queue.push_back(current_task().unwrap());
drop(inner);
block_current_and_run_next();
+ } else {
+ let task = current_task().unwrap();
+ let mut task_inner = task.inner_exclusive_access();
+ let sem_id = task_inner.sem_need;
+ match task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
+ Some(index) => task_inner.sem_allocation[index].1 += 1,
+ None => task_inner.sem_allocation.push((sem_id, 1)),
+ }
+ task_inner.sem_need = usize::MAX;
+ drop(task_inner);
+ drop(task);
}
}
}
diff --git a/os/src/syscall/process.rs b/os/src/syscall/process.rs
index 31fa22a..82c0991 100644
--- a/os/src/syscall/process.rs
+++ b/os/src/syscall/process.rs
@@ -1,11 +1,13 @@
+use core::mem::size_of;
+
use crate::{
config::MAX_SYSCALL_NUM,
fs::{open_file, OpenFlags},
- mm::{translated_ref, translated_refmut, translated_str},
+ mm::{translated_byte_buffer, translated_ref, translated_refmut, translated_str},
task::{
current_process, current_task, current_user_token, exit_current_and_run_next, pid2process,
suspend_current_and_run_next, SignalFlags, TaskStatus,
- },
+ }, timer::get_time_us,
};
use alloc::{string::String, sync::Arc, vec::Vec};

@@ -164,10 +166,24 @@ pub fn sys_kill(pid: usize, signal: u32) -> isize {
/// HINT: What if [`TimeVal`] is splitted by two pages ?
pub fn sys_get_time(_ts: *mut TimeVal, _tz: usize) -> isize {
trace!(
- "kernel:pid[{}] sys_get_time NOT IMPLEMENTED",
+ "kernel:pid[{}] sys_get_time",
current_task().unwrap().process.upgrade().unwrap().getpid()
);
- -1
+ let _us = get_time_us();
+ let time_val = TimeVal {
+ sec: _us / 1_000_000,
+ usec: _us % 1_000_000,
+ };
+ let buffers = translated_byte_buffer(
+ current_user_token(), _ts as *const u8, size_of::<TimeVal>());
+ let mut time_val_ptr = &time_val as *const _ as *const u8;
+ for buffer in buffers {
+ unsafe {
+ time_val_ptr.copy_to(buffer.as_mut_ptr(), buffer.len());
+ time_val_ptr = time_val_ptr.add(buffer.len());
+ }
+ }
+ 0
}

/// task_info syscall
diff --git a/os/src/syscall/sync.rs b/os/src/syscall/sync.rs
index 290ee6f..6d25e65 100644
--- a/os/src/syscall/sync.rs
+++ b/os/src/syscall/sync.rs
@@ -1,7 +1,11 @@
+use core::cmp::max;
+
use crate::sync::{Condvar, Mutex, MutexBlocking, MutexSpin, Semaphore};
use crate::task::{block_current_and_run_next, current_process, current_task};
use crate::timer::{add_timer, get_time_ms};
use alloc::sync::Arc;
+use alloc::vec;
+use alloc::vec::Vec;
/// sleep syscall
pub fn sys_sleep(ms: usize) -> isize {
trace!(
@@ -71,9 +75,75 @@ pub fn sys_mutex_lock(mutex_id: usize) -> isize {
let process = current_process();
let process_inner = process.inner_exclusive_access();
let mutex = Arc::clone(process_inner.mutex_list[mutex_id].as_ref().unwrap());
+
+ // set mutex_need to current
+ let task = current_task().unwrap();
+ let mut task_inner = task.inner_exclusive_access();
+ task_inner.mutex_need = mutex_id;
+ drop(task_inner);
+ drop(task);
+
+ if process_inner.dlcheck_option {
+ // initialize data structure for Banker's Algorithm:
+ // Available Vector, Allocation Matrix, Need Matrix
+ let n = process_inner.tasks.len();
+ let m = process_inner.mutex_list.len();
+ let mut available: Vec<usize> = vec![1; m];
+ let mut need: Vec<Vec<usize>> = vec![vec![0; m]; n];
+ let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
+ for (i, task_opt) in process_inner.tasks.iter().enumerate() {
+ match task_opt {
+ Some(task) => {
+ let task_inner = task.inner_exclusive_access();
+ for mid in &task_inner.mutex_allocation {
+ allocation[i][*mid] += 1;
+ available[*mid] -= 1;
+ }
+ let nid = task_inner.mutex_need;
+ if nid != usize::MAX {
+ need[i][nid] += 1;
+ }
+ drop(task_inner);
+ }
+ None => {}
+ }
+ }
+
+ if !deadlock_check(available, allocation, need) {
+ return -0xDEAD;
+ }
+ }
+
+
drop(process_inner);
drop(process);
mutex.lock();
+
+ let task = current_task().unwrap();
+ let mut task_inner = task.inner_exclusive_access();
+ task_inner.mutex_allocation.push(mutex_id);
+ task_inner.mutex_need = usize::MAX;
+ drop(task_inner);
+ drop(task);
+
0
}
/// mutex unlock syscall
@@ -89,11 +159,22 @@ pub fn sys_mutex_unlock(mutex_id: usize) -> isize {
.unwrap()
.tid
);
+
let process = current_process();
let process_inner = process.inner_exclusive_access();
let mutex = Arc::clone(process_inner.mutex_list[mutex_id].as_ref().unwrap());
drop(process_inner);
drop(process);
+
+ let task = current_task().unwrap();
+ let mut task_inner = task.inner_exclusive_access();
+ if let Some(index) = task_inner.mutex_allocation.iter().position(|&x| x == mutex_id) {
+ task_inner.mutex_allocation.swap_remove(index);
+ }
+ drop(task_inner);
+ drop(task);
+
mutex.unlock();
0
}
@@ -142,10 +223,21 @@ pub fn sys_semaphore_up(sem_id: usize) -> isize {
.unwrap()
.tid
);
+
let process = current_process();
let process_inner = process.inner_exclusive_access();
let sem = Arc::clone(process_inner.semaphore_list[sem_id].as_ref().unwrap());
drop(process_inner);
+
+ let task = current_task().unwrap();
+ let mut task_inner = task.inner_exclusive_access();
+ if let Some(index) = task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
+ task_inner.sem_allocation[index].1 -= 1;
+ if task_inner.sem_allocation[index].1 == 0 {
+ task_inner.sem_allocation.swap_remove(index);
+ }
+ }
+ drop(task_inner);
+ drop(task);
+
sem.up();
0
}
@@ -165,8 +257,89 @@ pub fn sys_semaphore_down(sem_id: usize) -> isize {
let process = current_process();
let process_inner = process.inner_exclusive_access();
let sem = Arc::clone(process_inner.semaphore_list[sem_id].as_ref().unwrap());
+
+ let task = current_task().unwrap();
+ let mut task_inner = task.inner_exclusive_access();
+ task_inner.sem_need = sem_id;
+ drop(task_inner);
+ drop(task);
+
+ if process_inner.dlcheck_option {
+ // initialize data structure for Banker's Algorithm:
+ // Available Vector, Allocation Matrix, Need Matrix
+ let n = process_inner.tasks.len();
+ let m = process_inner.semaphore_list.len();
+ let mut available:Vec<usize> = Vec::new();
+ for sem_opt in &process_inner.semaphore_list {
+ match sem_opt {
+ Some(sem) => {
+ let sem_inner = sem.inner.exclusive_access();
+ available.push(max(sem_inner.count, 0) as usize);
+ drop(sem_inner);
+ }
+ None => available.push(0),
+ }
+ }
+ let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
+ for (i, task_opt) in process_inner.tasks.iter().enumerate() {
+ match task_opt {
+ Some(task) => {
+ let task_inner = task.inner_exclusive_access();
+ for (id, alloc) in &task_inner.sem_allocation {
+ allocation[i][*id] += *alloc;
+ }
+ drop(task_inner);
+ }
+ None => {}
+ }
+ }
+ let mut need: Vec<Vec<usize>> = vec![vec![0; m]; n];
+ for (i, task_opt) in process_inner.tasks.iter().enumerate() {
+ match task_opt {
+ Some(task) => {
+ let task_inner = task.inner_exclusive_access();
+ let nid = task_inner.sem_need;
+ if nid != usize::MAX {
+ need[i][nid] += 1;
+ }
+ drop(task_inner);
+ }
+ None => {}
+ }
+ }
+
+ if !deadlock_check(available, allocation, need) {
+ return -0xDEAD;
+ }
+ }
+
+
drop(process_inner);
sem.down();

0
}
/// condvar create syscall
@@ -246,6 +419,81 @@ pub fn sys_condvar_wait(condvar_id: usize, mutex_id: usize) -> isize {
///
/// YOUR JOB: Implement deadlock detection, but might not all in this syscall
pub fn sys_enable_deadlock_detect(_enabled: usize) -> isize {
- trace!("kernel: sys_enable_deadlock_detect NOT IMPLEMENTED");
- -1
+ trace!("kernel: sys_enable_deadlock_detect");
+ let process = current_process();
+ let mut process_inner = process.inner_exclusive_access();
+ let mut flag = 0;
+ match _enabled {
+ 0 => process_inner.dlcheck_option = false,
+ 1 => process_inner.dlcheck_option = true,
+ _ => flag = -1,
+ }
+ drop(process_inner);
+ flag
}
+/// Banker's Algorithm for deadlock check
+fn deadlock_check(available: Vec<usize>, allocation: Vec<Vec<usize>>, need: Vec<Vec<usize>>) -> bool {
+ // n: thread count m: resources count
+ let (n, m) = (allocation.len(), allocation[0].len());
+ let mut work = available;
+ let mut finish = vec![false; n];
+ loop {
+ let mut idx = usize::MAX;
+ for i in 0..n {
+ let mut flag = true;
+ if finish[i] {
+ continue;
+ }
+ for j in 0..m {
+ if need[i][j] > work[j] {
+ flag = false;
+ break;
+ }
+ }
+ if flag {
+ idx = i;
+ break;
+ }
+ }
+ // has found a thread meet the requirement
+ if idx != usize::MAX {
+ for j in 0..m {
+ work[j] += allocation[idx][j];
+ }
+ finish[idx] = true;
+ } else {
+ break;
+ }
+ }
+ finish.iter().all(|&x| x)
+}
+
diff --git a/os/src/task/process.rs b/os/src/task/process.rs
index c2be1ce..471c63a 100644
--- a/os/src/task/process.rs
+++ b/os/src/task/process.rs
@@ -49,6 +49,8 @@ pub struct ProcessControlBlockInner {
pub semaphore_list: Vec<Option<Arc<Semaphore>>>,
/// condvar list
pub condvar_list: Vec<Option<Arc<Condvar>>>,
+ /// deadlock check option
+ pub dlcheck_option: bool,
}

impl ProcessControlBlockInner {
@@ -119,6 +121,7 @@ impl ProcessControlBlock {
mutex_list: Vec::new(),
semaphore_list: Vec::new(),
condvar_list: Vec::new(),
+ dlcheck_option: false,
})
},
});
@@ -245,6 +248,7 @@ impl ProcessControlBlock {
mutex_list: Vec::new(),
semaphore_list: Vec::new(),
condvar_list: Vec::new(),
+ dlcheck_option: false,
})
},
});
diff --git a/os/src/task/task.rs b/os/src/task/task.rs
index 0136098..6818590 100644
--- a/os/src/task/task.rs
+++ b/os/src/task/task.rs
@@ -5,7 +5,9 @@ use super::{kstack_alloc, KernelStack, ProcessControlBlock, TaskContext};
use crate::trap::TrapContext;
use crate::{mm::PhysPageNum, sync::UPSafeCell};
use alloc::sync::{Arc, Weak};
+use alloc::vec::Vec;
use core::cell::RefMut;
+use core::usize;

/// Task control block structure
pub struct TaskControlBlock {
@@ -41,6 +43,14 @@ pub struct TaskControlBlockInner {
pub task_status: TaskStatus,
/// It is set when active exit or execution error occurs
pub exit_code: Option<i32>,
+ /// The mutex resource this thread is waiting for
+ pub mutex_need: usize, // mutex id needed (usize::MAX means no mutex is needed)
+ /// The semaphore resource this thread is waiting for
+ pub sem_need: usize, // semaphore id needed (usize::MAX means no semaphore is needed)
+ /// The mutex resources allocated to this thread
+ pub mutex_allocation: Vec<usize>, // elem: id of an allocated mutex
+ /// The semaphore resources allocated to this thread
+ pub sem_allocation: Vec<(usize, usize)>, // elem: (sem id, count allocated)
}

impl TaskControlBlockInner {
@@ -75,6 +85,10 @@ impl TaskControlBlock {
task_cx: TaskContext::goto_trap_return(kstack_top),
task_status: TaskStatus::Ready,
exit_code: None,
+ mutex_need: usize::MAX,
+ sem_need: usize::MAX,
+ mutex_allocation: Vec::new(),
+ sem_allocation: Vec::new(),
})
},
}

Q&A Assignment

t1

In our multithreading implementation, when the main thread (thread 0) exits, the whole process is considered to have exited. At that point all threads managed by the process must be terminated and their resources reclaimed.

Q1: Which resources need to be reclaimed?

A1: Thread control blocks, thread stacks, and the process's mutexes, semaphores, condition variables, and so on.

Q2: Where else might another thread's TaskControlBlock be referenced? Do those references need to be reclaimed, and why?

A2: In the thread-synchronization primitives — for example, in the wait_queue of rCore's semaphores (and similarly mutexes and condvars). These references do need to be cleaned up: they are Arc references that keep the exited threads' control blocks alive, and waking a task that has already exited can lead to deadlocks or other concurrency problems.

t2

Q: Compare the following two implementations of Mutex.unlock. What are the differences between them, and what problems might those differences cause?

impl Mutex for Mutex1 {
    fn unlock(&self) {
        let mut mutex_inner = self.inner.exclusive_access();
        assert!(mutex_inner.locked);
        mutex_inner.locked = false;
        if let Some(waking_task) = mutex_inner.wait_queue.pop_front() {
            add_task(waking_task);
        }
    }
}

impl Mutex for Mutex2 {
    fn unlock(&self) {
        let mut mutex_inner = self.inner.exclusive_access();
        assert!(mutex_inner.locked);
        if let Some(waking_task) = mutex_inner.wait_queue.pop_front() {
            add_task(waking_task);
        } else {
            mutex_inner.locked = false;
        }
    }
}

A:

With Mutex1, locked is set to false before a waiter is woken, so the lock is momentarily free. If another thread calls lock() in that window, it can grab the lock before the woken thread gets to run. For example: thread A holds the lock, thread B sits in the wait queue, and thread C is about to call lock(). A unlocks, locked becomes false, and B is woken; before B runs, C acquires the lock. If lock() re-checks locked after being woken, B finds the lock held by C and has to go back to waiting, costing an extra wakeup and context switch; if lock() does not re-check (as in rCore's MutexBlocking::lock), B simply proceeds as if it held the lock, so B and C are both inside the critical section and mutual exclusion is broken.

With Mutex2, if there are waiters at unlock time, locked is never set to false: the lock is handed directly to the woken thread. Whether this causes trouble depends on how lock() is written. If lock() re-checks locked after being woken, the woken thread sees locked == true even though the lock was meant for it, goes back to sleep, and the lock is never released again — effectively a deadlock. If lock() simply continues after being woken (as rCore's MutexBlocking::lock does), then Mutex2 is the correct "direct hand-off" implementation and avoids the race described above for Mutex1.