Simpler Scheduler Design
Goal of the Scheduler
Minimize the total delay of the gNB while considering the priority of each attached UE
- As analyzed before, one of the issues in the QoS scheduler is the unfairness of delay between DC-GBR and other types of traffic
- To address this issue, the designed scheduler focuses on minimizing the total delay
In Terms of Usability
The scheduler’s design should be simple and easy to understand, so that it can be readily reused
State (Input)
\[X= \begin{bmatrix} \overrightarrow{x_1} & \overrightarrow{x_2} & \cdots & \overrightarrow{x_n} \end{bmatrix} \text{, where } \overrightarrow{x_i} = [\text{RNTI}_i, P_i, \text{HOL}_i]\]
Each element of the $i$th UE’s state vector $\overrightarrow{x_i}$ is defined as follows:
- $\text{RNTI}_i$: Radio Network Temporary Identifier, which identifies the UE within the cell
- $P_i$: the default priority level of the UE’s traffic (as with 5QI priority levels, a lower value indicates a higher priority)
- $\text{HOL}_i$: Head-of-Line delay, i.e., the queuing delay of the UE’s oldest pending packet
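As a concrete illustration, here is a minimal Python (NumPy) sketch of how the state matrix $X$ could be assembled. The UE records and all of their values are made-up examples, not part of the design:

```python
import numpy as np

# Per-UE state vector: [RNTI, P (default priority level), HOL delay].
# The RNTIs, priority levels, and HOL delays are made-up examples.
ues = [
    {"rnti": 61, "priority": 10, "hol_ms": 2.0},   # e.g. a DC-GBR flow
    {"rnti": 62, "priority": 70, "hol_ms": 5.5},
    {"rnti": 63, "priority": 90, "hol_ms": 12.0},  # e.g. a best-effort flow
]

# Stack each x_i as a column of X, matching the definition above.
X = np.array([[ue["rnti"], ue["priority"], ue["hol_ms"]] for ue in ues]).T
print(X.shape)  # (3, n): one row per field, one column per UE
```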
Action (Output)
$W_t$ is the weight vector over the $n$ UEs; each element $w_i$ indicates the scheduling weight assigned to the $i$th UE:
\[W_t=[w_1, w_2, \cdots, w_n]\]
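As a sketch, assuming the policy emits one raw score per UE and the weights are normalized with a softmax (a common choice, not something the design mandates):

```python
import numpy as np

# Hypothetical raw policy outputs, one score per UE.
scores = np.array([2.1, 0.7, -1.3])

# Softmax normalization (an assumption here) turns the scores into
# positive weights that sum to 1; any monotone normalization would do.
W_t = np.exp(scores) / np.exp(scores).sum()
print(W_t.round(3), W_t.sum())  # per-UE weights w_i, summing to 1.0
```

Reward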
A reward $r_i$ is received for each UE’s weight $w_i$; two candidate definitions are considered below.
Option 1
\[r_i= -(100-P_i)\times\text{HOL}_i\]
✅ Advantages
- Clear Penalty: Provides a clear penalty for high delay on higher-priority UEs, encouraging the network to optimize performance.
- Intuitive Interpretation: Reflects that high delay for higher-priority traffic is undesirable, guiding the learning process toward performance optimization.
❌ Disadvantages
- Complexity of Negative Values: Negative rewards can make convergence difficult for some algorithms, complicating policy evaluation and updates.
- Impact of Scale: Large negative values can slow down or destabilize learning.
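A minimal sketch of Option 1, assuming $P_i \in [0, 100]$ with lower values meaning higher priority, and HOL delay measured in milliseconds (both assumptions consistent with the formula above):

```python
def reward_option1(priority: float, hol_ms: float) -> float:
    # Linear penalty: the (100 - P_i) factor is larger for
    # higher-priority UEs (lower P_i), amplifying their delay penalty.
    return -(100.0 - priority) * hol_ms

# Made-up example: the same 5 ms HOL delay costs a high-priority UE
# (P=10) nine times more than a low-priority one (P=90).
print(reward_option1(10, 5.0))  # -450.0
print(reward_option1(90, 5.0))  # -50.0
```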
Option 2
\[r_i= \cfrac{1}{(100-P_i)\times\text{HOL}_i}\]
✅ Advantages
- Limited Range of Values: Rewards stay in a small positive range, which can lead to more stable learning; better network states and actions yield larger rewards.
- Positive Reinforcement: All rewards are positive, promoting better actions through positive reinforcement.
❌ Disadvantages
- Non-linearity: Rewards change non-linearly, making them harder to interpret and less intuitive, especially in early learning stages.
- Impact of Small Values: Rewards can become very small, slowing down learning and making the system highly sensitive to minor changes.
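The same scenarios under Option 2. Note the reward is undefined at $\text{HOL}_i = 0$, so a small epsilon guard would be needed in practice; the guard below is an assumption, not part of the stated formula:

```python
def reward_option2(priority: float, hol_ms: float, eps: float = 1e-6) -> float:
    # Reciprocal reward: always positive, but compressed into a
    # narrow range; eps guards against division by zero at HOL = 0.
    return 1.0 / ((100.0 - priority) * hol_ms + eps)

# Same made-up scenarios as Option 1: the values are small, and the
# high-priority UE's larger penalty now appears as a *smaller* reward.
print(reward_option2(10, 5.0))  # ~0.00222
print(reward_option2(90, 5.0))  # ~0.02
```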
Total Reward
The total reward $R(X_t, W_t)$ of a gNB with the current input $X_t$ and output $W_t$ in time slot $t$ is as follows:
\[R(X_t,W_t) = \sum_{i=1}^n r_i\]
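Putting it together, a minimal sketch of the per-slot total reward, reusing the Option 1 form from above (the function names and example values are illustrative):

```python
def reward_option1(priority: float, hol_ms: float) -> float:
    return -(100.0 - priority) * hol_ms

def total_reward(priorities, hols, reward_fn=reward_option1):
    # R(X_t, W_t): sum of the per-UE rewards r_i for the current slot t.
    return sum(reward_fn(p, h) for p, h in zip(priorities, hols))

# Made-up snapshot of three UEs: priority levels and HOL delays in ms.
print(total_reward([10, 70, 90], [2.0, 5.5, 12.0]))  # -465.0
```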