[PATCH 15/15] sched: add sched_dl documentation.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


From: Dario Faggioli <raistlin@xxxxxxxx>

Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Signed-off-by: Dario Faggioli <raistlin@xxxxxxxx>
Signed-off-by: Juri Lelli <juri.lelli@xxxxxxxxx>
---
 Documentation/scheduler/sched-deadline.txt |  164 ++++++++++++++++++++++++++++
 kernel/sched/dl.c                          |    3 +-
 2 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/scheduler/sched-deadline.txt

diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
new file mode 100644
index 0000000..d4dcfc7
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,164 @@
+			Deadline Task and Group Scheduling
+			----------------------------------
+
+CONTENTS
+========
+
+0. WARNING
+1. Overview
+2. Task scheduling
+2. The Interface
+3. Bandwidth management
+  3.1 System-wide settings
+  2.2 Task interface
+  2.4 Default behavior
+3. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unpredictable or even unstable
+ system behavior. As for -rt (group) scheduling, it is assumed that root users
+ know what they're doing.
+
+
+1. Overview
+===========
+
+ The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
+ basically an implementation of the Earliest Deadline First (EDF) scheduling
+ algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
+ that makes it possible to isolate the behavior of tasks between each other.
+
+
+2. Task scheduling
+==================
+
+ The typical -deadline task is composed of a computation phase (instance)
+ which is activated on a periodic or sporadic fashion. The expected (maximum)
+ duration of such computation is called the task's runtime; the time interval
+ by which each instance needs to be completed is called the task's relative
+ deadline. The task's absolute deadline is dynamically calculated as the
+ time instant a task (or, more properly) activates plus the relative
+ deadline.
+
+ The EDF[1] algorithm selects the task with the smallest absolute deadline as
+ the one to be executed first, while the CBS[2,3] ensures that each task runs
+ for at most its runtime every period, avoiding any interference between
+ different tasks (bandwidth isolation).
+ Thanks to this feature, also tasks that do not strictly comply with the
+ computational model described above can effectively use the new policy.
+ IOW, there are no limitations on what kind of task can exploit this new
+ scheduling discipline, even if it must be said that it is particularly
+ suited for periodic or sporadic tasks that need guarantees on their
+ timing behavior, e.g., multimedia, streaming, control applications, etc.
+
+ References:
+  1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
+      ming in a hard-real-time environment. Journal of the Association for
+      Computing Machinery, 20(1), 1973.
+  2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
+      Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
+      Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
+  3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
+      Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
+
+3. Bandwidth management
+=======================
+
+ In order for the -deadline scheduling to be effective and useful, it is
+ important to have some method to keep the allocation of the available CPU
+ bandwidth to the tasks under control.
+ This is usually called "admission control" and if it is not performed at all,
+ no guarantee can be given on the actual scheduling of the -deadline tasks.
+
+ Since when RT-throttling has been introduced each task group has a bandwidth
+ associated, calculated as a certain amount of runtime over a period.
+ Moreover, to make it possible to manipulate such bandwidth, readable/writable
+ controls have been added to both procfs (for system wide settings) and cgroupfs
+ (for per-group settings).
+ Therefore, the same interface is being used for controlling the bandwidth
+ distrubution to -deadline tasks and task groups, i.e., new controls but with
+ similar names, equivalent meaning and with the same usage paradigm are added.
+
+ However, more discussion is needed in order to figure out how we want to manage
+ SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
+ uses (for now) a less sophisticated, but actually very sensible, mechanism to
+ ensure that a certain utilization cap is not overcome per each root_domain.
+
+ Another main difference between deadline bandwidth management and RT-throttling
+ is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
+ and thus we don't need an higher level throttling mechanism to enforce the
+ desired bandwidth.
+
+3.1 System wide settings
+------------------------
+
+The system wide settings are configured under the /proc virtual file system:
+
+ The per-group controls that are added to the cgroupfs virtual file system are:
+  * /proc/sys/kernel/sched_dl_runtime_us,
+  * /proc/sys/kernel/sched_dl_period_us,
+
+ They accept (if written) and provides (if read) the new runtime and period,
+ respectively, for each CPU in each root_domain.
+
+ This means that, for a root_domain comprising M CPUs, -deadline tasks
+ can be created until the sum of their bandwidths stay below:
+
+   M * (sched_dl_runtime_us / sched_dl_period_us)
+
+ It is also possible to disable this bandwidth management logic, and
+ be thus free of oversubscribing the system up to any arbitrary level.
+ This is done by writing -1 in /proc/sys/kernel/sched_dl_runtime_us.
+
+
+2.2 Task interface
+------------------
+
+ Specifying a periodic/sporadic task that executes for a given amount of
+ runtime at each instance, and that is scheduled according to the urgency of
+ its own timing constraints needs, in general, a way of declaring:
+  - a (maximum/typical) instance execution time,
+  - a minimum interval between consecutive instances,
+  - a time constraint by which each instance must be completed.
+
+ Therefore:
+  * a new struct sched_param2, containing all the necessary fields is
+    provided;
+  * the new scheduling related syscalls that manipulate it, i.e.,
+    sched_setscheduler2(), sched_setparam2() and sched_getparam2()
+    are implemented.
+
+
+2.4 Default behavior
+---------------------
+
+The default values for SCHED_DEADLINE bandwidth is to have dl_runtime and
+dl_period equal to 500000 and 1000000, respectively. This means -deadline
+tasks can use at most 5%, multiplied by the number of CPUs that compose the
+root_domain, for each root_domain.
+
+When a -deadline task fork a child, its dl_runtime is set to 0, which means
+someone must call sched_setscheduler2() on it, or it won't even start.
+
+
+3. Future plans
+===============
+
+Still missing:
+
+ - refinements to deadline inheritance, especially regarding the possibility
+   of retaining bandwidth isolation among non-interacting tasks. This is
+   being studied from both theoretical and practical point of views, and
+   hopefully we should be able to produce some demonstrative code soon;
+ - (c)group based bandwidth management, and maybe scheduling;
+ - access control for non-root users (and related security concerns to
+   address), which is the best way to allow unprivileged use of the mechanisms
+   and how to prevent non-root users "cheat" the system?
+
+As already discussed, we are planning also to merge this work with the EDF
+throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
+the preliminary phases of the merge and we really seek feedback that would help us
+decide on the direction it should take.
diff --git a/kernel/sched/dl.c b/kernel/sched/dl.c
index 941a83a..3e28ed3 100644
--- a/kernel/sched/dl.c
+++ b/kernel/sched/dl.c
@@ -346,7 +346,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
  * disrupting the schedulability of the system. Otherwise, we should
  * refill the runtime and set the deadline a period in the future,
  * because keeping the current (absolute) deadline of the task would
- * result in breaking guarantees promised to other tasks.
+ * result in breaking guarantees promised to other tasks (refer to
+ * Documentation/scheduler/sched-deadline.txt for more informations).
  *
  * This function returns true if:
  *
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Other Archives]     [Linux Kernel Newbies]     [Linux Driver Development]     [Fedora Kernel]     [Linux Kernel Testers]     [Linux SH]     [Linux Omap]     [Linux Kbuild]     [Linux Tape]     [Linux Input]     [Linux Kernel Janitors]     [Linux Kernel Packagers]     [Linux Doc]     [Linux Man Pages]     [Linux API]     [Linux Memory Management]     [Linux Modules]     [Linux Standards]     [Kernel Announce]     [Netdev]     [Git]     [Linux PCI]     Linux CAN Development     [Linux I2C]     [Linux RDMA]     [Linux NUMA]     [Netfilter]     [Netfilter Devel]     [SELinux]     [Bugtraq]     [FIO]     [Linux Perf Users]     [Linux Serial]     [Linux PPP]     [Linux ISDN]     [Linux Next]     [Kernel Stable Commits]     [Linux Tip Commits]     [Kernel MM Commits]     [Linux Security Module]     [AutoFS]     [Filesystem Development]     [Ext3 Filesystem]     [Linux bcache]     [Ext4 Filesystem]     [Linux BTRFS]     [Linux CEPH Filesystem]     [Linux XFS]     [XFS]     [Linux NFS]     [Linux CIFS]     [Ecryptfs]     [Linux NILFS]     [Linux Cachefs]     [Reiser FS]     [Initramfs]     [Linux FB Devel]     [Linux OpenGL]     [DRI Devel]     [Fastboot]     [Linux RT Users]     [Linux RT Stable]     [eCos]     [Corosync]     [Linux Clusters]     [LVS Devel]     [Hot Plug]     [Linux Virtualization]     [KVM]     [KVM PPC]     [KVM ia64]     [Linux Containers]     [Linux Hexagon]     [Linux Cgroups]     [Util Linux]     [Wireless]     [Linux Bluetooth]     [Bluez Devel]     [Ethernet Bridging]     [Embedded Linux]     [Barebox]     [Linux MMC]     [Linux IIO]     [Sparse]     [Smatch]     [Linux Arch]     [x86 Platform Driver]     [Linux ACPI]     [Linux IBM ACPI]     [LM Sensors]     [CPU Freq]     [Linux Power Management]     [Linmodems]     [Linux DCCP]     [Linux SCTP]     [ALSA Devel]     [Linux USB]     [Linux PA RISC]     [Linux Samsung SOC]     [MIPS Linux]     [IBM S/390 Linux]     [ARM Linux]     [ARM Kernel]     [ARM MSM]     [Tegra Devel]     [Sparc Linux]     [Linux Security]     [Linux Sound]     [Linux Media]     [Video 4 Linux]     [Linux IRDA Users]     [Linux for the blind]     [Linux RAID]     [Linux ATA RAID]     [Device Mapper]     [Linux SCSI]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Linux IDE]     [Linux SMP]     [Linux AXP]     [Linux Alpha]     [Linux M68K]     [Linux ia64]     [Linux 8086]     [Linux x86_64]     [Linux Config]     [Linux Apps]     [Linux MSDOS]     [Linux X.25]     [Linux Crypto]     [DM Crypt]     [Linux Trace Users]     [Linux Btrace]     [Linux Watchdog]     [Utrace Devel]     [Linux C Programming]     [Linux Assembly]     [Dash]     [DWARVES]     [Hail Devel]     [Linux Kernel Debugger]     [Linux gcc]     [Gcc Help]     [X.Org]     [Wine]

Add to Google Powered by Linux

[Older Kernel Discussion]     [Yosemite National Park Forum]     [Large Format Photos]     [Gimp]     [Yosemite Photos]     [Stuff]