
Simpler Scheduler Design

Goal of the Scheduler


Minimize the total delay of the gNB while considering the priority of each attached UE

  • As analyzed before, one of the issues with the QoS scheduler is the delay unfairness between DC-GBR and other types of traffic
  • To address this issue, the designed scheduler focuses on minimizing the total delay

In Terms of Usability

The scheduler’s design should be simple and easily understandable to facilitate reuse

State (Input)


\[\begin{aligned} X &= \begin{bmatrix} \overrightarrow{x_1} & \overrightarrow{x_2} & \cdots & \overrightarrow{x_n} \end{bmatrix}, \\ \text{where } \overrightarrow{x_i} &= [\text{RNTI}_i, P_i, \text{HOL}_i] \end{aligned}\]

Each element of the $i$-th UE’s state vector $\overrightarrow{x_i}$ is defined as follows:

  • $\text{RNTI}_i$: Radio Network Temporary Identifier
  • $P_i$: Default Priority Level
  • $\text{HOL}_i$: Head-of-Line Delay
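
As a concrete illustration, here is a minimal sketch of assembling the state matrix in Python; the per-UE field names (`rnti`, `priority`, `hol_ms`) and the values are hypothetical, not from this design:

```python
import numpy as np

# Hypothetical per-UE records; field names and values are illustrative only.
ues = [
    {"rnti": 0x4601, "priority": 20, "hol_ms": 4.0},
    {"rnti": 0x4602, "priority": 68, "hol_ms": 9.5},
    {"rnti": 0x4603, "priority": 55, "hol_ms": 1.2},
]

# Stack one column vector x_i = [RNTI_i, P_i, HOL_i] per UE to form X.
X = np.array([[ue["rnti"], ue["priority"], ue["hol_ms"]] for ue in ues]).T
print(X.shape)  # (3, n): three features per column, one column per attached UE
```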

Action (Output)


$W_t$ is the weight vector over the $n$ UEs; each element $w_i$ indicates the weight of the $i$-th UE

\[W_t=[w_1, w_2, \cdots, w_n]\]
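
The design does not fix how $W_t$ is consumed by the MAC layer; one plausible reading is that the weights set each UE’s share of radio resources. Below is a minimal sketch under that assumption (`allocate_prbs` and `total_prbs` are hypothetical names, and flooring is a simplification that may leave a few PRBs unassigned):

```python
import numpy as np

def allocate_prbs(weights: np.ndarray, total_prbs: int) -> np.ndarray:
    """Split the available PRBs among UEs in proportion to their weights."""
    shares = weights / weights.sum()
    # floor() keeps allocations integral; leftover PRBs are ignored here for brevity
    return np.floor(shares * total_prbs).astype(int)

W_t = np.array([0.5, 0.25, 0.25])
print(allocate_prbs(W_t, total_prbs=100))  # [50 25 25]
```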

Reward


For each UE’s weight $w_i$, a corresponding per-UE reward $r_i$ is received.

Option 1

\[r_i= -(100-P_i)\times\text{HOL}_i\]
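
A one-line implementation of this reward, assuming (consistent with the formula) that a lower $P_i$ denotes higher priority, so the $(100 - P_i)$ factor amplifies the delay penalty for high-priority traffic:

```python
def reward_option1(priority: float, hol: float) -> float:
    """Option 1: r_i = -(100 - P_i) * HOL_i."""
    # Lower P_i (higher priority) enlarges the factor, penalizing its delay more.
    return -(100 - priority) * hol

print(reward_option1(priority=20, hol=4.0))  # -320.0
```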

Advantages

  • Clear Penalty: Imposes a clear penalty when high-priority traffic suffers high delay, encouraging the scheduler to optimize performance.
  • Intuitive Interpretation: Directly reflects that high delay for high-priority traffic is undesirable, guiding the learning process toward performance optimization.

Disadvantages

  • Complexity of Negative Values: Negative rewards can make convergence difficult for some algorithms, complicating policy evaluation and updates.
  • Impact of Scale: Large negative values can slow down or destabilize learning.

Option 2

\[r_i= \cfrac{1}{(100-P_i)\times\text{HOL}_i}\]
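
The same per-UE quantities under Option 2; note that the `eps` guard against a zero HOL is a safeguard added in this sketch, not part of the formula above:

```python
def reward_option2(priority: float, hol: float, eps: float = 1e-6) -> float:
    """Option 2: r_i = 1 / ((100 - P_i) * HOL_i)."""
    # eps avoids division by zero when HOL_i = 0 (an addition, not in the formula)
    return 1.0 / ((100 - priority) * hol + eps)

print(reward_option2(priority=20, hol=4.0))  # ~0.003125
```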

Advantages

  • Limited Range of Values: Rewards stay small and positive, typically between 0 and 1, which can lead to more stable learning.
  • Positive Reinforcement: All rewards are positive, promoting better actions through positive reinforcement.

Disadvantages

  • Non-linearity: Rewards change non-linearly, making them harder to interpret and less intuitive, especially in early learning stages.
  • Impact of Small Values: Rewards can become very small, slowing down learning and making the system highly sensitive to minor changes.

Total Reward

The total reward $R(X_t, W_t)$ of a gNB with the current input $X_t$ and output $W_t$ at time slot $t$ is as follows:

\[R(X_t, W_t) = \sum_{i=1}^{n} r_i\]
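
Putting the pieces together, here is a sketch of the slot-level total reward using Option 1. Note that, as written, each $r_i$ depends only on the state $X_t$; the chosen weights $W_t$ affect the reward only indirectly, through the HOL values observed in later slots. The example values are illustrative:

```python
import numpy as np

def total_reward(X: np.ndarray) -> float:
    """R(X_t, W_t) = sum of per-UE rewards; Option 1 is used here."""
    _, priorities, hols = X  # rows of X: RNTI_i, P_i, HOL_i
    return float(np.sum(-(100.0 - priorities) * hols))

X = np.array([[0x4601, 0x4602, 0x4603],  # RNTI_i
              [20,     68,     55],      # P_i
              [4.0,    9.5,    1.2]])    # HOL_i (illustrative values)
print(total_reward(X))  # -(80*4.0 + 32*9.5 + 45*1.2) = -678.0
```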