Programming Assignment

Approach

This lab asks us to implement deadlock detection for the mutex and semaphore mechanisms, and to provide a syscall enable_deadlock_detect that turns detection on and off. With detection enabled, when a user thread tries to acquire a shared resource via mutex_lock or semaphore_down and the kernel finds that granting the request would leave the system in an unsafe state (i.e. a deadlock may occur), the request is rejected.

The detection algorithm described in the lab manual is the Banker's Algorithm, proposed by Dijkstra. Its steps are covered in the manual, so I will not repeat them here; the implementation is as follows:

/// Banker's Algorithm for deadlock check
fn deadlock_check(available: Vec<usize>, allocation: Vec<Vec<usize>>, need: Vec<Vec<usize>>) -> bool {
    // n: thread count, m: resource count
    let (n, m) = (allocation.len(), allocation[0].len());
    let mut work = available;
    let mut finish = vec![false; n];
    loop {
        let mut idx = usize::MAX;
        for i in 0..n {
            let mut flag = true;
            if finish[i] {
                continue;
            }
            for j in 0..m {
                if need[i][j] > work[j] {
                    flag = false;
                    break;
                }
            }
            if flag {
                idx = i;
                break;
            }
        }
        // found a thread that meets the requirement
        if idx != usize::MAX {
            for j in 0..m {
                work[j] += allocation[idx][j];
            }
            finish[idx] = true;
        } else {
            break;
        }
    }
    finish.iter().all(|&x| x)
}
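
As a quick sanity check (a hypothetical host-side unit test, not part of the lab code), the function can be exercised with a classic two-thread circular wait and a safe variant of it:

#[test]
fn banker_sanity_check() {
    // Unsafe: thread 0 holds mutex 0 and waits for mutex 1,
    //         thread 1 holds mutex 1 and waits for mutex 0 -> circular wait.
    assert!(!deadlock_check(
        vec![0, 0],                   // Available
        vec![vec![1, 0], vec![0, 1]], // Allocation
        vec![vec![0, 1], vec![1, 0]], // Need
    ));
    // Safe: thread 1 holds nothing, so thread 0 can finish first and release mutex 0.
    assert!(deadlock_check(
        vec![0, 1],
        vec![vec![1, 0], vec![0, 0]],
        vec![vec![0, 1], vec![1, 0]],
    ));
}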

Running the detection itself, given Available, Allocation and Need, is not hard. The key question is how to fit the algorithm into the existing thread synchronization mechanisms; more concretely, how to maintain the relevant state in the kernel so that Available, Allocation and Need can be built correctly right before each deadlock check.

Start with the simpler case: to add deadlock detection to the mutex mechanism, what state do we need to maintain? First we must record which lock the current thread is waiting for. One important observation: each thread "needs" at most one resource at a time, because a thread only continues executing (and possibly requests more resources) after its current request has been satisfied; otherwise it is blocked. So the task control block gets a usize field mutex_need that stores the id of the mutex the thread currently needs. In addition, a vector mutex_allocation stores the ids of the mutexes the thread has acquired but not yet released.
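
In isolation, this per-thread mutex bookkeeping boils down to two fields (the names match the fields added to TaskControlBlockInner in the diff at the end of this post):

use alloc::vec::Vec;

/// Sketch of the mutex-related bookkeeping carried by each thread
/// (in the kernel these are extra fields of TaskControlBlockInner, see the diff below).
pub struct MutexBookkeeping {
    /// id of the mutex this thread is currently waiting for; usize::MAX means "none"
    pub mutex_need: usize,
    /// ids of the mutexes this thread has acquired but not yet released
    pub mutex_allocation: Vec<usize>,
}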

When sys_mutex_lock(mutex_id) is called, before actually acquiring the lock with mutex.lock(), set the current thread's mutex_need to mutex_id. Once the thread has acquired the lock, push mutex_id into its mutex_allocation vector and reset mutex_need to "empty" (I treat usize::MAX as empty; an Option type would also work and is arguably more elegant).

When sys_mutex_unlock(mutex_id) is called, before actually releasing the lock with mutex.unlock(), search the current thread's mutex_allocation vector and remove the element equal to mutex_id.
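
Condensed, the bookkeeping around the actual lock and unlock calls looks like this (a sketch of the code in the diff below, with error handling and the deadlock check elided):

// In sys_mutex_lock(mutex_id), before the deadlock check and mutex.lock():
let task = current_task().unwrap();
task.inner_exclusive_access().mutex_need = mutex_id;
// ... build Available / Allocation / Need and run deadlock_check here ...
mutex.lock();
// After the lock has actually been acquired:
let mut task_inner = task.inner_exclusive_access();
task_inner.mutex_allocation.push(mutex_id);
task_inner.mutex_need = usize::MAX;
drop(task_inner);

// In sys_mutex_unlock(mutex_id), before mutex.unlock():
let task = current_task().unwrap();
let mut task_inner = task.inner_exclusive_access();
if let Some(i) = task_inner.mutex_allocation.iter().position(|&x| x == mutex_id) {
    task_inner.mutex_allocation.swap_remove(i);
}
drop(task_inner);
mutex.unlock();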

With this bookkeeping in place, right before a deadlock check we can build Available, Allocation and Need from the maintained state and pass them to the Banker's Algorithm to decide whether the system would enter an unsafe state. The construction code is shown below:

if process_inner.dlcheck_option {
    // initialize data structure for Banker's Algorithm:
    // Available Vector, Allocation Matrix, Need Matrix
    let n = process_inner.tasks.len();
    let m = process_inner.mutex_list.len();
    let mut available: Vec<usize> = vec![1; m];
    let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
    let mut need: Vec<Vec<usize>> = vec![vec![0; m]; n];
    for (i, task_opt) in process_inner.tasks.iter().enumerate() {
        match task_opt {
            Some(task) => {
                let task_inner = task.inner_exclusive_access();
                for mid in &task_inner.mutex_allocation {
                    allocation[i][*mid] += 1;
                    available[*mid] -= 1;
                }
                let nid = task_inner.mutex_need;
                if nid != usize::MAX {
                    need[i][nid] += 1;
                }
                drop(task_inner);
            }
            None => {}
        }
    }

    if !deadlock_check(available, allocation, need) {
        return -0xDEAD;
    }
}

The semaphore case is largely the same, but a few details deserve attention.

Since a semaphore's count is no longer binary (held or not held), the thread's allocation vector also has to record how many units of each semaphore it holds. Each entry can be represented either as a pair <sem_id, cnt>, or by simply storing cnt copies of sem_id in the vector; I chose the former.
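
The only extra bookkeeping this representation needs is an "increment or insert" update on the pair vector; the same pattern appears twice in the diff below. As a hypothetical standalone helper:

use alloc::vec::Vec;

/// Record that a thread has acquired one more unit of semaphore `sem_id`
/// (hypothetical helper; the diff below inlines this logic in up()/down()).
fn record_sem_alloc(sem_allocation: &mut Vec<(usize, usize)>, sem_id: usize) {
    match sem_allocation.iter().position(|&(id, _)| id == sem_id) {
        Some(i) => sem_allocation[i].1 += 1,
        None => sem_allocation.push((sem_id, 1)),
    }
}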

Also, a semaphore's count can go negative; the absolute value of a negative count is the number of units that have been "overdrawn" in advance. In the Banker's Algorithm, however, the available amount Available[j] must not be negative, so a negative count should be treated as 0:

available.push(max(sem_inner.count, 0) as usize);

Finally, when thread A has "overdrawn" a semaphore and gone to sleep, and thread B later releases the resource, thread A's sem_allocation vector must be updated before thread B calls wakeup_task(task) to wake A up. Otherwise, because scheduling order is nondeterministic, the woken thread may run while its sem_allocation has not yet been set, so a subsequent deadlock check works on inconsistent state, gives a wrong answer, and the system unexpectedly ends up deadlocked.

I am not entirely sure about the full explanation for why this ordering matters: when I did not follow it, running ch8_deadlock_sem2 ended up deadlocked with very high probability (around 90%), which should not be attributable to scheduling randomness alone.
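
For reference, the relevant part of Semaphore::up() ends up looking like this (condensed from the diff below): the "credit" is transferred to the woken thread before wakeup_task makes it schedulable again.

// Inside Semaphore::up(), when the count shows that someone is waiting:
if let Some(task) = inner.wait_queue.pop_front() {
    let mut task_inner = task.inner_exclusive_access();
    let sem_id = task_inner.sem_need;
    // move one unit of this semaphore into the woken thread's allocation
    match task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
        Some(i) => task_inner.sem_allocation[i].1 += 1,
        None => task_inner.sem_allocation.push((sem_id, 1)),
    }
    task_inner.sem_need = usize::MAX;
    drop(task_inner);
    wakeup_task(task); // only now does the woken thread become schedulable
}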

Code

diff --git a/os/src/sync/semaphore.rs b/os/src/sync/semaphore.rs
index 509b504..5743172 100644
--- a/os/src/sync/semaphore.rs
+++ b/os/src/sync/semaphore.rs
@@ -36,6 +36,15 @@ impl Semaphore {
         inner.count += 1;
         if inner.count <= 0 {
             if let Some(task) = inner.wait_queue.pop_front() {
+                let mut task_inner = task.inner_exclusive_access();
+                let sem_id = task_inner.sem_need;
+                match task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
+                    Some(index) => task_inner.sem_allocation[index].1 += 1,
+                    None => task_inner.sem_allocation.push((sem_id, 1)),
+                }
+                task_inner.sem_need = usize::MAX;
+                drop(task_inner);
+                
                 wakeup_task(task);
             }
         }
@@ -50,6 +59,17 @@ impl Semaphore {
             inner.wait_queue.push_back(current_task().unwrap());
             drop(inner);
             block_current_and_run_next();
+        } else {
+            let task = current_task().unwrap();
+            let mut task_inner = task.inner_exclusive_access();
+            let sem_id = task_inner.sem_need;
+            match task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
+                Some(index) => task_inner.sem_allocation[index].1 += 1,
+                None => task_inner.sem_allocation.push((sem_id, 1)),
+            }
+            task_inner.sem_need = usize::MAX;
+            drop(task_inner);
+            drop(task);
         }
     }
 }
diff --git a/os/src/syscall/process.rs b/os/src/syscall/process.rs
index 31fa22a..82c0991 100644
--- a/os/src/syscall/process.rs
+++ b/os/src/syscall/process.rs
@@ -1,11 +1,13 @@
+use core::mem::size_of;
+
 use crate::{
     config::MAX_SYSCALL_NUM,
     fs::{open_file, OpenFlags},
-    mm::{translated_ref, translated_refmut, translated_str},
+    mm::{translated_byte_buffer, translated_ref, translated_refmut, translated_str},
     task::{
         current_process, current_task, current_user_token, exit_current_and_run_next, pid2process,
         suspend_current_and_run_next, SignalFlags, TaskStatus,
-    },
+    }, timer::get_time_us,
 };
 use alloc::{string::String, sync::Arc, vec::Vec};
 
@@ -164,10 +166,24 @@ pub fn sys_kill(pid: usize, signal: u32) -> isize {
 /// HINT: What if [`TimeVal`] is splitted by two pages ?
 pub fn sys_get_time(_ts: *mut TimeVal, _tz: usize) -> isize {
     trace!(
-        "kernel:pid[{}] sys_get_time NOT IMPLEMENTED",
+        "kernel:pid[{}] sys_get_time",
         current_task().unwrap().process.upgrade().unwrap().getpid()
     );
-    -1
+    let _us = get_time_us();
+    let time_val = TimeVal {
+        sec: _us / 1_000_000,
+        usec: _us % 1_000_000,
+    };
+    let buffers = translated_byte_buffer(
+        current_user_token(), _ts as *const u8, size_of::<TimeVal>());
+    let mut time_val_ptr = &time_val as *const _ as *const u8;
+    for buffer in buffers {
+        unsafe {
+            time_val_ptr.copy_to(buffer.as_mut_ptr(), buffer.len());
+            time_val_ptr = time_val_ptr.add(buffer.len());
+        }
+    }
+    0
 }
 
 /// task_info syscall
diff --git a/os/src/syscall/sync.rs b/os/src/syscall/sync.rs
index 290ee6f..6d25e65 100644
--- a/os/src/syscall/sync.rs
+++ b/os/src/syscall/sync.rs
@@ -1,7 +1,11 @@
+use core::cmp::max;
+
 use crate::sync::{Condvar, Mutex, MutexBlocking, MutexSpin, Semaphore};
 use crate::task::{block_current_and_run_next, current_process, current_task};
 use crate::timer::{add_timer, get_time_ms};
 use alloc::sync::Arc;
+use alloc::vec;
+use alloc::vec::Vec;
 /// sleep syscall
 pub fn sys_sleep(ms: usize) -> isize {
     trace!(
@@ -71,9 +75,75 @@ pub fn sys_mutex_lock(mutex_id: usize) -> isize {
     let process = current_process();
     let process_inner = process.inner_exclusive_access();
     let mutex = Arc::clone(process_inner.mutex_list[mutex_id].as_ref().unwrap());
+
+    // set mutex_need to current
+    let task = current_task().unwrap();
+    let mut task_inner = task.inner_exclusive_access();
+    task_inner.mutex_need = mutex_id;
+    drop(task_inner);
+    drop(task);
+    
+    if process_inner.dlcheck_option {
+        // initialize data structure for Banker's Algorithm:
+        // Available Vector, Allocation Matrix, Need Matrix
+        let n = process_inner.tasks.len();
+        let m = process_inner.mutex_list.len();
+        let mut available: Vec<usize> = vec![1; m];
+        let mut need: Vec<Vec<usize>> = vec![vec![0; m]; n];
+        let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
+        for (i, task_opt) in process_inner.tasks.iter().enumerate() {
+            match task_opt {
+                Some(task) => {
+                    let task_inner = task.inner_exclusive_access();
+                    for mid in &task_inner.mutex_allocation {
+                        allocation[i][*mid] += 1;
+                        available[*mid] -= 1;
+                    }
+                    let nid = task_inner.mutex_need;
+                    if nid != usize::MAX {
+                        need[i][nid] += 1;
+                    }
+                    drop(task_inner);
+                }
+                None => {}
+            }
+        }
+
+        if !deadlock_check(available, allocation, need) {
+            return -0xDEAD;
+        }
+    }
+
+    
     drop(process_inner);
     drop(process);
     mutex.lock();
+
+    let task = current_task().unwrap();
+    let mut task_inner = task.inner_exclusive_access();
+    task_inner.mutex_allocation.push(mutex_id);
+    task_inner.mutex_need = usize::MAX;
+    drop(task_inner);
+    drop(task);
+
     0
 }
 /// mutex unlock syscall
@@ -89,11 +159,22 @@ pub fn sys_mutex_unlock(mutex_id: usize) -> isize {
             .unwrap()
             .tid
     );
+        
     let process = current_process();
     let process_inner = process.inner_exclusive_access();
     let mutex = Arc::clone(process_inner.mutex_list[mutex_id].as_ref().unwrap());
     drop(process_inner);
     drop(process);
+
+    let task = current_task().unwrap();
+    let mut task_inner = task.inner_exclusive_access();
+    if let Some(index) = task_inner.mutex_allocation.iter().position(|&x| x == mutex_id) {
+        task_inner.mutex_allocation.swap_remove(index);
+    }
+    drop(task_inner);
+    drop(task);
+    
     mutex.unlock();
     0
 }
@@ -142,10 +223,21 @@ pub fn sys_semaphore_up(sem_id: usize) -> isize {
             .unwrap()
             .tid
     );
+    
     let process = current_process();
     let process_inner = process.inner_exclusive_access();
     let sem = Arc::clone(process_inner.semaphore_list[sem_id].as_ref().unwrap());
     drop(process_inner);
+    
+    let task = current_task().unwrap();
+    let mut task_inner = task.inner_exclusive_access();
+    if let Some(index) = task_inner.sem_allocation.iter().position(|&x| x.0 == sem_id) {
+        task_inner.sem_allocation[index].1 -= 1;
+        if task_inner.sem_allocation[index].1 == 0 {
+        	task_inner.sem_allocation.swap_remove(index);
+        }
+    }
+    drop(task_inner);
+    drop(task);
+    
     sem.up();
     0
 }
@@ -165,8 +257,89 @@ pub fn sys_semaphore_down(sem_id: usize) -> isize {
     let process = current_process();
     let process_inner = process.inner_exclusive_access();
     let sem = Arc::clone(process_inner.semaphore_list[sem_id].as_ref().unwrap());
+
+    let task = current_task().unwrap();
+    let mut task_inner = task.inner_exclusive_access();
+    task_inner.sem_need = sem_id;
+    drop(task_inner);
+    drop(task);
+
+    if process_inner.dlcheck_option {
+        // initialize data structure for Banker's Algorithm:
+        // Available Vector, Allocation Matrix, Need Matrix
+        let n = process_inner.tasks.len();
+        let m = process_inner.semaphore_list.len();
+        let mut available:Vec<usize> = Vec::new();
+        for sem_opt in &process_inner.semaphore_list {
+            match sem_opt {
+                Some(sem) => {
+                    let sem_inner = sem.inner.exclusive_access();
+                    available.push(max(sem_inner.count, 0) as usize);
+                    drop(sem_inner);
+                }
+                None => available.push(0),
+            }
+        }
+        let mut allocation: Vec<Vec<usize>> = vec![vec![0; m]; n];
+        for (i, task_opt) in process_inner.tasks.iter().enumerate() {
+            match task_opt {
+                Some(task) => {
+                    let task_inner = task.inner_exclusive_access();
+                    for (id, alloc) in &task_inner.sem_allocation {
+                        allocation[i][*id] += *alloc;
+                    }
+                    drop(task_inner);
+                }
+                None => {}
+            }
+        }
+        let mut need: Vec<Vec<usize>> = vec![vec![0; m]; n];
+        for (i, task_opt) in process_inner.tasks.iter().enumerate() {
+            match task_opt {
+                Some(task) => {
+                    let task_inner = task.inner_exclusive_access();
+                    let nid = task_inner.sem_need;
+                    if nid != usize::MAX {
+                        need[i][nid] += 1;
+                    }
+                    drop(task_inner);
+                }
+                None => {}
+            }
+        }
+
+        if !deadlock_check(available, allocation, need) {
+            return -0xDEAD;
+        }
+    }
+
+    
     drop(process_inner);
     sem.down();

     0
 }
 /// condvar create syscall
@@ -246,6 +419,81 @@ pub fn sys_condvar_wait(condvar_id: usize, mutex_id: usize) -> isize {
 ///
 /// YOUR JOB: Implement deadlock detection, but might not all in this syscall
 pub fn sys_enable_deadlock_detect(_enabled: usize) -> isize {
-    trace!("kernel: sys_enable_deadlock_detect NOT IMPLEMENTED");
-    -1
+    trace!("kernel: sys_enable_deadlock_detect");
+    let process = current_process();
+    let mut process_inner = process.inner_exclusive_access();
+    let mut flag = 0;
+    match _enabled {
+        0 => process_inner.dlcheck_option = false,
+        1 => process_inner.dlcheck_option = true,
+        _ => flag = -1,
+    }
+    drop(process_inner);
+    flag
 }
+/// Banker's Algorithm for deadlock check
+fn deadlock_check(available: Vec<usize>, allocation: Vec<Vec<usize>>, need: Vec<Vec<usize>>) -> bool {
+    // n: thread count, m: resource count
+    let (n, m) = (allocation.len(), allocation[0].len());
+    let mut work = available;
+    let mut finish = vec![false; n];
+    loop {
+        let mut idx = usize::MAX;
+        for i in 0..n {
+            let mut flag = true;
+            if finish[i] {
+                continue;
+            }
+            for j in 0..m {
+                if need[i][j] > work[j] {
+                    flag = false;
+                    break;
+                }
+            }
+            if flag {
+                idx = i;
+                break;
+            }
+        }
+        // found a thread that meets the requirement
+        if idx != usize::MAX {
+            for j in 0..m {
+                work[j] += allocation[idx][j];
+            }
+            finish[idx] = true;
+        } else {
+            break;
+        }
+    }
+    finish.iter().all(|&x| x)
+}
+
diff --git a/os/src/task/process.rs b/os/src/task/process.rs
index c2be1ce..471c63a 100644
--- a/os/src/task/process.rs
+++ b/os/src/task/process.rs
@@ -49,6 +49,8 @@ pub struct ProcessControlBlockInner {
     pub semaphore_list: Vec<Option<Arc<Semaphore>>>,
     /// condvar list
     pub condvar_list: Vec<Option<Arc<Condvar>>>,
+    /// deadlock check option
+    pub dlcheck_option: bool,
 }
 
 impl ProcessControlBlockInner {
@@ -119,6 +121,7 @@ impl ProcessControlBlock {
                     mutex_list: Vec::new(),
                     semaphore_list: Vec::new(),
                     condvar_list: Vec::new(),
+                    dlcheck_option: false,
                 })
             },
         });
@@ -245,6 +248,7 @@ impl ProcessControlBlock {
                     mutex_list: Vec::new(),
                     semaphore_list: Vec::new(),
                     condvar_list: Vec::new(),
+                    dlcheck_option: false,
                 })
             },
         });
diff --git a/os/src/task/task.rs b/os/src/task/task.rs
index 0136098..6818590 100644
--- a/os/src/task/task.rs
+++ b/os/src/task/task.rs
@@ -5,7 +5,9 @@ use super::{kstack_alloc, KernelStack, ProcessControlBlock, TaskContext};
 use crate::trap::TrapContext;
 use crate::{mm::PhysPageNum, sync::UPSafeCell};
 use alloc::sync::{Arc, Weak};
+use alloc::vec::Vec;
 use core::cell::RefMut;
+use core::usize;
 
 /// Task control block structure
 pub struct TaskControlBlock {
@@ -41,6 +43,14 @@ pub struct TaskControlBlockInner {
     pub task_status: TaskStatus,
     /// It is set when active exit or execution error occurs
     pub exit_code: Option<i32>,
+    /// The resources need of mutex
+    pub mutex_need: usize, // mutex id need (usize::MAX represents donot need any mutex)
+    /// The resources need of semaphore
+    pub sem_need: usize,   // semaphore id need (usize::MAX represents donot need any semaphore)
+    /// The resources allocated of mutex
+    pub mutex_allocation: Vec<usize>,         // elem: mutex id allocated
+    /// The resources allocated of semaphore
+    pub sem_allocation: Vec<(usize, usize)>,  // elem: (sem id allocated, count allocated)
 }
 
 impl TaskControlBlockInner {
@@ -75,6 +85,10 @@ impl TaskControlBlock {
                     task_cx: TaskContext::goto_trap_return(kstack_top),
                     task_status: TaskStatus::Ready,
                     exit_code: None,
+                    mutex_need: usize::MAX,
+                    sem_need: usize::MAX,
+                    mutex_allocation: Vec::new(),
+                    sem_allocation: Vec::new(),
                 })
             },
         }

Short-Answer Questions

t1

In our multithreading implementation, when the main thread (thread 0) exits, the whole process is considered to have exited; at that point all threads managed by the process must be terminated and their resources reclaimed.

Q1: Which resources need to be reclaimed?

A1: Thread control blocks, thread stacks, mutexes, semaphores, condition variables, and so on.

Q2: Where else might other threads' TaskControlBlocks be referenced? Do those references need to be reclaimed, and why?

A2: In the thread synchronization primitives, for example in the wait_queue of rCore's Semaphore. These references do need to be reclaimed; otherwise they may cause deadlocks or other concurrency problems.
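
For example, rCore's Semaphore keeps strong references to blocked threads' TCBs in its wait queue (the struct has roughly the following shape):

// os/src/sync/semaphore.rs (roughly)
pub struct SemaphoreInner {
    pub count: isize,
    // each Arc here keeps a blocked thread's TCB alive until it is removed
    pub wait_queue: VecDeque<Arc<TaskControlBlock>>,
}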

t2

Q: Compare the two Mutex.unlock implementations below. What is the difference between them, and what problems might the difference cause?

impl Mutex for Mutex1 {
    fn unlock(&self) {
        let mut mutex_inner = self.inner.exclusive_access();
        assert!(mutex_inner.locked);
        mutex_inner.locked = false;
        if let Some(waking_task) = mutex_inner.wait_queue.pop_front() {
            add_task(waking_task);
        }
    }
}

impl Mutex for Mutex2 {
    fn unlock(&self) {
        let mut mutex_inner = self.inner.exclusive_access();
        assert!(mutex_inner.locked);
        if let Some(waking_task) = mutex_inner.wait_queue.pop_front() {
            add_task(waking_task);
        } else {
            mutex_inner.locked = false;
        }
    }
}

A:

With Mutex1, unlock first sets mutex_inner.locked to false and then wakes one waiting thread. If another thread calls lock() in the meantime, it will see locked == false and take the lock, so the woken thread may find the lock already held when it finally runs and (assuming lock() re-checks the flag after waking) has to go back to waiting. This causes unnecessary wakeups and context switches. For example, suppose thread A holds the lock, thread B is in the wait queue, and thread C is about to call lock(). A unlocks: locked becomes false and B is woken. Before B gets to run, C calls lock(), sees locked == false, and acquires the lock. When B runs, it finds the lock held by C and has to re-enter the wait queue.

With Mutex2, if there is a task waiting at unlock time, mutex_inner.locked is not set back to false; the lock is effectively handed to the woken task while locked stays true. If lock() re-checks the flag after the task wakes up, the woken task will see locked == true even though the lock has in fact been released for it, conclude that the lock is still held by another thread, and go back into the wait queue. That can lead to deadlock or an endless sleep/wake loop.