ISOLATE(1)
==========

NAME
----
isolate - Isolate a process using Linux Containers

SYNOPSIS
--------
*isolate* 'options' *--init*

*isolate* 'options' *--run* +--+ 'program' 'arguments'

*isolate* 'options' *--cleanup*

DESCRIPTION
-----------
Run 'program' within a sandbox, so that it cannot communicate with the
outside world and its resource consumption is limited. This can be used
for example in a programming contest to run untrusted programs submitted
by contestants in a controlled environment.

The sandbox is used in the following way:

* Run *isolate --init*, which initializes the sandbox, creates its working directory and
prints its name to the standard output. Fails if the sandbox already existed.
* Populate the directory with the executable file of the program and its
input files.
* Call *isolate --run* to run the program. A single line describing the
status of the program is written to the standard error stream.
* Fetch the output of the program from the directory.
* Run *isolate --cleanup* to remove temporary files. Does nothing if the sandbox
was already cleaned up.

Please note that by default, the program is not allowed to start multiple
processes or threads. If you need that, turn on the control group mode
(see below).
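
The workflow above can be scripted. The following Python sketch only builds the three command lines, using options documented on this page. The function names and the program name `./solution` are hypothetical, and actually executing the commands requires an installed isolate.

```python
def init_command(box_id=0):
    """Command line that initializes sandbox `box_id`."""
    return ["isolate", f"--box-id={box_id}", "--init"]


def run_command(program, arguments=(), box_id=0, meta=None):
    """Command line that runs `program` inside sandbox `box_id`."""
    cmd = ["isolate", f"--box-id={box_id}"]
    if meta is not None:
        cmd.append(f"--meta={meta}")
    # "--" separates isolate's own options from the program and its arguments.
    return cmd + ["--run", "--", program, *arguments]


def cleanup_command(box_id=0):
    """Command line that removes the temporary files of sandbox `box_id`."""
    return ["isolate", f"--box-id={box_id}", "--cleanup"]
```

Each list can be passed to `subprocess.run`. With `capture_output=True, text=True`, the standard output of the *--init* call carries the name of the sandbox directory, and the standard error of the *--run* call carries the status line.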

OPTIONS
-------
*-M, --meta=*'file'::
Output meta-data on the execution of the program to a given file.
See below for syntax of the meta-files.
*-m, --mem=*'size'::
Limit address space of the program to 'size' kilobytes. If more processes
are allowed, this applies to each of them separately.
*-t, --time=*'time'::
Limit run time of the program to 'time' seconds. Fractional numbers are allowed.
Time in which the OS assigns the processor to different tasks is not counted.
*-w, --wall-time=*'time'::
Limit wall-clock time to 'time' seconds. Fractional values are allowed.
This clock measures the time from the start of the program to its exit,
so it does not stop when the program has lost the CPU or when it is waiting
for an external event. We recommend using *--time* as the main limit
and setting *--wall-time* to a much higher value as a precaution against
sleeping programs.
*-x, --extra-time=*'time'::
When a time limit is exceeded, wait for extra 'time' seconds before
killing the program. This has the advantage that the real execution time
is reported, even though it slightly exceeds the limit. Fractional
numbers are again allowed.
*-b, --box-id=*'id'::
When you run multiple sandboxes in parallel, you have to assign a unique
ID to each of them using this option. See the discussion on UIDs in the INSTALLATION
section. The ID defaults to 0.
*-k, --stack=*'size'::
Limit process stack to 'size' kilobytes. By default, the whole address
space is available for the stack, but it is subject to the *--mem* limit.
*-f, --fsize=*'size'::
Limit size of files created (or modified) by the program to 'size' kilobytes.
In most cases, it is better to restrict overall disk usage by a disk quota
(see below). This option can help in cases when quotas are not enabled
on the underlying filesystem.
*-q, --quota=*'blocks'*,*'inodes'::
Set disk quota to a given number of blocks and inodes. This requires the
filesystem to be mounted with support for quotas. Please note that this
currently works only on the ext family of filesystems (other filesystems
use other interfaces for setting quotas).
*-i, --stdin=*'file'::
Redirect standard input from 'file'. The 'file' has to be accessible
inside the sandbox. Otherwise, standard input is inherited from the
parent process.
*-o, --stdout=*'file'::
Redirect standard output to 'file'. The 'file' has to be accessible
inside the sandbox. Otherwise, standard output is inherited from the
parent process and the sandbox manager does not write anything to it.
*-r, --stderr=*'file'::
Redirect standard error output to 'file'. The 'file' has to be accessible
inside the sandbox. Otherwise, standard error output is inherited from the
parent process. See also *--stderr-to-stdout*.
*--stderr-to-stdout*::
Redirect standard error output to standard output. This is performed after
the standard output is redirected by *--stdout*. Mutually exclusive with *--stderr*.
*-c, --chdir=*'dir'::
Change directory to 'dir' before executing the program. This path must be
relative to the root of the sandbox.
*-p, --processes*[*=*'max']::
Permit the program to create up to 'max' processes and/or threads. Please
keep in mind that the time and memory limits do not work with multiple processes
unless you enable the control group mode. If 'max' is not given, an arbitrary
number of processes can be run. By default, only one process is permitted.
*--share-net*::
By default, isolate creates a new network namespace for its child process.
This namespace contains no network devices except for a per-namespace loopback.
This prevents the program from communicating with the outside world. If you want
to permit communication, you can use this switch to keep the child process
in parent's network namespace.
*--inherit-fds*::
By default, isolate closes all file descriptors passed from its parent
except for descriptors 0, 1, and 2.
This prevents unintentional descriptor leaks. In some cases, passing extra
descriptors to the sandbox can be desirable, so you can use this switch
to make them survive.
*-v, --verbose*::
Tell the sandbox manager to be verbose and report on what is going on.
Using *-v* multiple times produces even more jabber.
*-s, --silent*::
Tell the sandbox manager to stay silent. No status messages are printed
to stderr except for fatal errors of the sandbox itself. The combination of
*--verbose* and *--silent* has an undefined effect.
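
As an illustration of how the limit options combine, here is a minimal Python sketch that follows the recommendation to set *--wall-time* well above *--time*. The helper name and the defaults `wall_factor` and `extra_seconds` are assumptions made for this example, not values prescribed by isolate.

```python
def limit_options(cpu_seconds, mem_kb, wall_factor=3.0, extra_seconds=0.5):
    """Return isolate options for CPU time, wall-clock time and memory.

    wall_factor and extra_seconds are illustrative defaults chosen for this
    sketch; pick values that suit your own workload.
    """
    if cpu_seconds <= 0 or mem_kb <= 0:
        raise ValueError("limits must be positive")
    # Wall-clock limit is a multiple of the CPU limit, so that it only
    # triggers for programs that sleep or wait instead of computing.
    wall_seconds = cpu_seconds * wall_factor
    return [
        f"--time={cpu_seconds}",
        f"--wall-time={wall_seconds}",
        f"--extra-time={extra_seconds}",
        f"--mem={mem_kb}",  # kilobytes, as documented for --mem
    ]
```

The returned list is meant to be placed before *--run* on the isolate command line.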

ENVIRONMENT RULES
-----------------
UNIX processes normally inherit all environment variables from their parent. The
sandbox however passes only those variables which are explicitly requested by
environment rules:

*-E, --env=*'var'::
Inherit the variable 'var' from the parent.
*-E, --env=*'var'*=*'value'::
Set the variable 'var' to 'value'. When the 'value' is empty, the
variable is removed from the environment.
*-e, --full-env*::
Inherit all variables from the parent.

The rules are applied in the order in which they were given, except for
*--full-env*, which is applied first.

The list of rules is automatically initialized with *-ELIBC_FATAL_STDERR_=1*.
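
The rule semantics described above can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not isolate's actual implementation. The function name is hypothetical, and it assumes that inheriting a variable the parent does not have simply leaves it unset.

```python
def apply_env_rules(parent_env, rules, full_env=False):
    """Model the environment seen by the sandboxed program.

    `rules` holds strings in the form accepted by --env:
    "VAR" (inherit) or "VAR=value" (set; an empty value removes VAR).
    """
    # --full-env is applied first, regardless of where it was given.
    env = dict(parent_env) if full_env else {}
    # The rule list starts with the built-in LIBC_FATAL_STDERR_ rule.
    for rule in ["LIBC_FATAL_STDERR_=1", *rules]:
        name, sep, value = rule.partition("=")
        if not sep:
            # Inherit from the parent (assumption: skipped if absent there).
            if name in parent_env:
                env[name] = parent_env[name]
        elif value == "":
            env.pop(name, None)
        else:
            env[name] = value
    return env
```

For example, with a parent environment containing `PATH` and `HOME`, the rules `["PATH", "HOME=/box"]` yield an environment with the inherited `PATH`, the overridden `HOME` and the built-in `LIBC_FATAL_STDERR_`.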

DIRECTORY RULES
---------------
The sandboxed process gets its own filesystem namespace, which contains only subtrees
requested by directory rules:

*-d, --dir=*'in'*=*'out'[*:*'options']::
Bind the directory 'out' as seen by the caller to the path 'in' inside the sandbox.
If there already was a directory rule for 'in', it is replaced.
*-d, --dir=*'dir'[*:*'options']::
Bind the directory +/+'dir' to 'dir' inside the sandbox.
If there already was a directory rule for 'dir', it is replaced.
*-d, --dir=*'in'*=*::
Remove a directory rule for the path 'in' inside the sandbox.

By default, all directories are bound read-only and restricted (no devices,
no setuid binaries). This behavior can be modified using the 'options':

*rw*::
Allow read-write access.
*dev*::
Allow access to character and block devices.
*noexec*::
Disallow execution of binaries.
*maybe*::
Silently ignore the rule if the directory to be bound does not exist.
*fs*::
Instead of binding a directory, mount a device-less filesystem called 'in'.
For example, this can be 'proc' or 'sysfs'.

Unless *--no-default-dirs* is specified, the default set of directory rules binds +/bin+,
+/dev+ (with devices allowed), +/lib+, +/lib64+ (if it exists), and +/usr+. It also binds
the working directory to +/box+ (read-write) and mounts the proc filesystem at +/proc+.

*-D, --no-default-dirs*::
Do not bind the default set of directories. Care has to be taken to specify
the correct set of rules (using *--dir*) for the executed program to run
correctly. In particular, +/box+ has to be bound.
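
To make the three forms of *--dir* concrete, the following Python sketch parses rule strings into a table keyed by the path inside the sandbox. It models the description above rather than isolate's own parser. The names `DirRule` and `apply_dir_rule` are hypothetical, and the sketch assumes that several options are separated by colons and that paths contain neither `=` nor `:`.

```python
from dataclasses import dataclass


@dataclass
class DirRule:
    inside: str         # path inside the sandbox ('in')
    outside: str        # path as seen by the caller ('out')
    options: frozenset  # e.g. {"rw", "noexec"}


def apply_dir_rule(rules, spec):
    """Apply one --dir argument to `rules`, a dict keyed by the inside path."""
    if "=" in spec:
        inside, rest = spec.split("=", 1)
        if rest == "":
            # The form 'in=' removes an existing rule.
            rules.pop(inside, None)
            return rules
        # Assumption: options are colon-separated after the outside path.
        outside, *opts = rest.split(":")
    else:
        inside, *opts = spec.split(":")
        # The form 'dir' binds /dir to dir inside the sandbox.
        outside = "/" + inside.lstrip("/")
    # A later rule for the same inside path replaces the earlier one.
    rules[inside] = DirRule(inside, outside, frozenset(opts))
    return rules
```

Starting from an empty dict and applying `"etc"`, then `"tmp=/var/tmp/box:rw"`, then `"etc="` leaves a single read-write rule for `tmp`.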

CONTROL GROUPS
--------------
Isolate can make use of system control groups provided by the kernel
to constrain programs consisting of multiple processes. Please note
that this feature needs special system setup described in the INSTALLATION
section.

*--cg*::
Enable use of control groups. This should be specified with *--init*,
*--run* and *--cleanup*.
*--cg-mem=*'size'::
Limit total memory usage by the whole control group to 'size' kilobytes.
This should be specified with *--run*.
*--cg-timing*::
Use control groups for timing, so that the *--time* switch affects the
total run time of all processes and threads in the control group.
This should be specified with *--run*.
This option is turned on by default; use *--no-cg-timing* to turn it off.
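
Because *--cg* has to accompany all three phases while *--cg-mem* belongs only to *--run*, it can help to build the command lines together. A minimal Python sketch, with a hypothetical function name and program path:

```python
def cg_commands(program, box_id=0, cg_mem_kb=None, max_processes=None):
    """Return (init, run, cleanup) command lines with control groups enabled."""
    # --cg is given to every phase, as the option description asks.
    base = ["isolate", "--cg", f"--box-id={box_id}"]
    run = list(base)
    if cg_mem_kb is not None:
        # Memory limit for the whole control group, in kilobytes; run phase only.
        run.append(f"--cg-mem={cg_mem_kb}")
    if max_processes is not None:
        run.append(f"--processes={max_processes}")
    run += ["--run", "--", program]
    return base + ["--init"], run, base + ["--cleanup"]
```

The three lists are intended to be executed in order, for instance with `subprocess.run`.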

META-FILES
----------
The meta-file contains miscellaneous meta-information on execution of the
program within the sandbox. It is a textual file consisting of lines
of format 'key'*:*'value'. The following keys are defined:

*cg-mem*::
When control groups are enabled, this is the total memory use
by the whole control group (in kilobytes).
*cg-oom-killed*::
Present when the program was killed by the out-of-memory killer
(e.g., because it has exceeded the memory limit of its control group).
This is reported only on Linux 4.13 and later.
*csw-forced*::
Number of context switches forced by the kernel.
*csw-voluntary*::
Number of context switches caused by the process giving up the CPU
voluntarily.
*exitcode*::
The program has exited normally with this exit code.
*exitsig*::
The program has exited after receiving this fatal signal.
*killed*::
Present when the program was terminated by the sandbox
(e.g., because it has exceeded the time limit).
*max-rss*::
Maximum resident set size of the process (in kilobytes).
*message*::
Status message, not intended for machine processing.
E.g., "Time limit exceeded."
*status*::
Two-letter status code:
* *RE* -- run-time error, i.e., exited with a non-zero exit code
* *SG* -- program died on a signal
* *TO* -- timed out
* *XX* -- internal error of the sandbox
*time*::
Run time of the program in fractional seconds.
*time-wall*::
Wall clock time of the program in fractional seconds.

Please note that not all keys have to be present.
For example, neither *status* nor *message* is reported upon normal termination.
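
Since the meta-file is a list of 'key'*:*'value' lines, it can be read with a short parser. The Python sketch below is one way to do it. The function name is hypothetical and the sample contents are invented for illustration, not captured from a real run.

```python
def parse_meta(text):
    """Parse meta-file contents into a dict mapping keys to string values."""
    meta = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed meta-file line: {line!r}")
        meta[key] = value
    return meta


# Hypothetical meta-file of a program that hit its time limit.
SAMPLE = """time:2.105
time-wall:2.310
max-rss:1204
killed:1
status:TO
message:Time limit exceeded.
"""

meta = parse_meta(SAMPLE)
timed_out = meta.get("status") == "TO"
cpu_seconds = float(meta["time"])
```

Using `meta.get` for optional keys such as *status* matters because not every key is present in every meta-file.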

RETURN VALUE
------------
When the program inside the sandbox finishes correctly, the sandbox returns 0.
If it finishes incorrectly, it returns 1.
All other return codes signal an internal error.
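
A caller can branch on these return codes directly. A minimal Python sketch follows. The function name and the wording of the returned strings are choices made for this example.

```python
def describe_return_code(code):
    """Classify the exit status of isolate itself, following RETURN VALUE."""
    if code == 0:
        return "program finished correctly"
    if code == 1:
        return "program finished incorrectly"
    # Every other value signals an internal error of the sandbox.
    return "internal error of the sandbox"
```

When the code is 1, the meta-file written via *--meta* is the place to look for the reason.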

INSTALLATION
------------
Isolate depends on several advanced features of the Linux kernel. Please
make sure that your kernel supports
PID namespaces (+CONFIG_PID_NS+),
IPC namespaces (+CONFIG_IPC_NS+), and
network namespaces (+CONFIG_NET_NS+).
If you want to use control groups, you need
the cpusets (+CONFIG_CPUSETS+),
CPU accounting controller (+CONFIG_CGROUP_CPUACCT+), and
memory resource controller (+CONFIG_MEMCG+). If your machine has swap enabled,
you should also enable the swap controller (+CONFIG_MEMCG_SWAP+).
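
The kernel options listed above can be checked against a kernel configuration file. The Python sketch below works on the text of such a file. Where that file lives (for example under /boot or as /proc/config.gz) depends on the distribution and is an assumption of this example, as is the function name. The swap controller is left out because it is needed only on machines with swap.

```python
REQUIRED = ("CONFIG_PID_NS", "CONFIG_IPC_NS", "CONFIG_NET_NS")
CGROUP = ("CONFIG_CPUSETS", "CONFIG_CGROUP_CPUACCT", "CONFIG_MEMCG")


def missing_options(config_text, use_cgroups=False):
    """Return the wanted options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        # Enabled built-in options look like "CONFIG_FOO=y"; disabled ones
        # appear as comments ("# CONFIG_FOO is not set") and are skipped.
        name, sep, value = line.strip().partition("=")
        if sep and value == "y":
            enabled.add(name)
    wanted = REQUIRED + (CGROUP if use_cgroups else ())
    return [name for name in wanted if name not in enabled]
```

An empty result means that every option the sketch knows about is built into the kernel.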

Debian 7.x and newer require enabling the memory and swap cgroup controllers by
adding the parameters "cgroup_enable=memory swapaccount=1" to the kernel
command-line, which can be set using +GRUB_CMDLINE_LINUX_DEFAULT+ in
/etc/default/grub.

Isolate is designed to run setuid to root. The sub-process inside the sandbox
then switches to a non-privileged user ID (different for each *--box-id*).
The range of UIDs available and several filesystem paths are set in a configuration
file, by default located in /usr/local/etc/isolate.

Before you run isolate with control groups, you need to ensure that the cgroup
filesystem is enabled and mounted. Most modern Linux distributions already
provide cgroup support through a tmpfs mounted at /sys/fs/cgroup, with
individual controllers mounted within subdirectories.

REPRODUCIBILITY
---------------
The reproducibility of results can be improved by tuning some kernel
parameters, listed below. Some of these parameters can be checked using the
program isolate-check-environment.

* Disable address space randomization: +sysctl kernel.randomize_va_space=0+.
Address space randomization can affect timing, memory usage, and program
behavior. This setting can be made persistent through /etc/sysctl.d/.
* Disable dynamic CPU frequency scaling. This requires setting the cpufreq
scaling governor to +performance+. The process for doing this varies between
distributions.
* Consider disabling Turboboost on CPUs that might support it (most i3/i5/i7
Intel CPUs). Approach this one with caution. Disabling Turboboost on a CPU that
boosts from 2.3 GHz to 2.6 GHz would have minimal impact on run-times in exchange
for determinism, but the same on a CPU that boosts from 1.6 GHz to 2.8
GHz will incur a much more dramatic slowdown. If the ambient
temperature is controlled and only one single-threaded task is keeping the
CPU busy at 100%, Turboboost's behaviour may be reasonably deterministic, but
this requires further experimentation to confirm.
* Run evaluations on a single CPU (core). The Linux scheduler has a tendency to randomly
migrate tasks between CPUs, incurring cache migration costs. You can use isolate's
configuration file to pin the process to a specified CPU.
* Disable automatic kernel support for transparent huge pages. Both /sys/kernel/mm/transparent_hugepage/enabled
and /sys/kernel/mm/transparent_hugepage/defrag should be set to "madvise" or "never", and
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag to 0.
* Disable swapping. If you really need swap space and you are using cgroups,
make sure that you have the memsw controller enabled, so that swap space is
properly accounted for.
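
Two of the settings above, address space randomization and transparent huge pages, can be verified by reading the corresponding files. The Python sketch below is not the isolate-check-environment program; it is a simplified illustration with hypothetical function names. It takes the file contents as a dict so that the logic can be exercised without touching the system, and it treats a missing file as a failed check.

```python
def thp_selected(text):
    """Return the bracketed choice from a transparent_hugepage file.

    These files list all choices and mark the active one with brackets,
    e.g. "always [madvise] never".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def reproducibility_warnings(files):
    """`files` maps a path to its contents; return a list of warnings."""
    warnings = []
    aslr = files.get("/proc/sys/kernel/randomize_va_space", "").strip()
    if aslr != "0":
        warnings.append("address space randomization is not disabled")
    thp = "/sys/kernel/mm/transparent_hugepage/"
    for name in ("enabled", "defrag"):
        if thp_selected(files.get(thp + name, "")) not in ("madvise", "never"):
            warnings.append(f"transparent_hugepage/{name} is not madvise or never")
    if files.get(thp + "khugepaged/defrag", "").strip() != "0":
        warnings.append("khugepaged/defrag is not 0")
    return warnings
```

On a live system, the dict can be filled by reading each of the four paths with `open(path).read()`.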

LICENSE
-------
Isolate was written by Martin Mares and Bernard Blackham.
It can be distributed and used under the terms of the GNU
General Public License version 2 or any later version.