  Analysis of the Linux kernel namespace mechanism
     
  Add Date : 2017-08-31      
         
         
         
  1. Linux kernel namespace mechanism

Linux namespaces provide a mechanism for isolating resources. PIDs, IPC objects, network resources and other system resources are no longer global; each belongs to a specific namespace. The resources of one namespace are transparent (invisible) to the other namespaces. As a result, several processes with the same PID can exist at the operating-system level: the system can hold two sets of processes with IDs 0, 1 and 2 at the same time, and they do not conflict because they belong to different namespaces. At user level, a process sees only the resources of its own namespace; the ps command, for example, lists only the processes in its own namespace. Each namespace therefore looks like a standalone Linux system.

2. Linux kernel namespace structure

The Linux kernel provides several namespaces, including fs (mount), uts, network, sysvipc and so on. A process can belong to several namespaces at once. Because namespaces are tied to processes, the task_struct structure carries a namespace-related member: a pointer to an nsproxy structure.

struct task_struct {
        ......
        /* namespaces */
        struct nsproxy *nsproxy;
        ......
}

nsproxy is defined in include/linux/nsproxy.h. It holds five pointers, one to each type of namespace structure. Because several processes can use the same namespaces, an nsproxy can be shared; the count field is the reference count of the structure.

/*
 * 'count' is the number of tasks holding a reference.
 * The count for each namespace, then, will be the number
 * of nsproxies pointing to it, not the number of tasks.
 *
 * The nsproxy is shared by tasks which share all namespaces.
 * As soon as a single namespace is cloned or unshared, the
 * nsproxy is copied.
 */
struct nsproxy {
        atomic_t count;
        struct uts_namespace *uts_ns;
        struct ipc_namespace *ipc_ns;
        struct mnt_namespace *mnt_ns;
        struct pid_namespace *pid_ns_for_children;
        struct net *net_ns;
};

(1) The UTS namespace contains the name and version of the running kernel, the underlying architecture type and similar information. UTS is short for UNIX Timesharing System.

(2) All information related to inter-process communication (IPC) is stored in struct ipc_namespace.

(3) The view of the mounted file systems is given by struct mnt_namespace.

(4) Information about process IDs is provided by struct pid_namespace.

(5) struct net (the net_ns member) contains all network-related namespace parameters.

The system has a default nsproxy, init_nsproxy, which is assigned when the initial task is set up (excerpt from the INIT_TASK macro):

#define INIT_TASK(tsk) \
{
        .nsproxy = &init_nsproxy,
}

init_nsproxy is defined as:

static struct kmem_cache *nsproxy_cachep;

struct nsproxy init_nsproxy = {
        .count = ATOMIC_INIT(1),
        .uts_ns = &init_uts_ns,
#if defined(CONFIG_POSIX_MQUEUE) || defined(CONFIG_SYSVIPC)
        .ipc_ns = &init_ipc_ns,
#endif
        .mnt_ns = NULL,
        .pid_ns_for_children = &init_pid_ns,
#ifdef CONFIG_NET
        .net_ns = &init_net,
#endif
};

Every namespace except mnt_ns gets a default initial value here; mnt_ns is left as NULL.

3. Use clone to create your own namespace

To create your own namespace, use the clone() system call. Its user-space prototype is:

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg)

Here fn is a pointer to the function the child process executes, child_stack is the stack space allocated for the child process, flags describes which resources the child inherits from the parent process, and arg is the argument passed to fn. Only the flag values related to namespaces are covered here.

CLONE_FS: the child process and the parent process share the same file system information, including the root, the current directory and the umask.

CLONE_NEWNS: set this flag when the clone needs its own namespace. CLONE_NEWNS and CLONE_FS cannot be set at the same time.

clone() is a wrapper function defined in the libc library. It sets up the stack of the new lightweight process and invokes the clone system call, which is hidden from the programmer. The sys_clone() service routine that implements the system call has no fn or arg parameters. The wrapper stores the fn pointer in the child's stack at the position of the wrapper's own return address, and stores the arg pointer just below fn. When the wrapper returns in the child, the CPU pops the return address from the stack and executes fn(arg).

/* Prototype for the glibc wrapper function */

#include <sched.h>

int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */);

/* Prototype for the raw system call */

long clone(unsigned long flags, void *child_stack,
           void *ptid, void *ctid,
           struct pt_regs *regs);

The function implemented in the Linux kernel is the one that the libc library wraps. The kernel's fork.c contains the following definitions; the function that is ultimately called is do_fork().

#ifdef __ARCH_WANT_SYS_CLONE
#ifdef CONFIG_CLONE_BACKWARDS
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
                int __user *, parent_tidptr,
                int, tls_val,
                int __user *, child_tidptr)
#elif defined(CONFIG_CLONE_BACKWARDS2)
SYSCALL_DEFINE5(clone, unsigned long, newsp, unsigned long, clone_flags,
                int __user *, parent_tidptr,
                int __user *, child_tidptr,
                int, tls_val)
#elif defined(CONFIG_CLONE_BACKWARDS3)
SYSCALL_DEFINE6(clone, unsigned long, clone_flags, unsigned long, newsp,
                int, stack_size,
                int __user *, parent_tidptr,
                int __user *, child_tidptr,
                int, tls_val)
#else
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
                int __user *, parent_tidptr,
                int __user *, child_tidptr,
                int, tls_val)
#endif
{
        return do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr);
}
#endif

3.1 do_fork function

The clone() system call hands the real work to do_fork(), which in turn calls copy_process().

long do_fork(unsigned long clone_flags,
             unsigned long stack_start,
             unsigned long stack_size,
             int __user *parent_tidptr,
             int __user *child_tidptr)
{
        struct task_struct *p;
        int trace = 0;
        long nr;

        /*
         * Determine whether and which event to report to ptracer.  When
         * called from kernel_thread or CLONE_UNTRACED is explicitly
         * requested, no event is reported; otherwise, report if the event
         * for the type of forking is enabled.
         */
        if (!(clone_flags & CLONE_UNTRACED)) {
                if (clone_flags & CLONE_VFORK)
                        trace = PTRACE_EVENT_VFORK;
                else if ((clone_flags & CSIGNAL) != SIGCHLD)
                        trace = PTRACE_EVENT_CLONE;
                else
                        trace = PTRACE_EVENT_FORK;

                if (likely(!ptrace_event_enabled(current, trace)))
                        trace = 0;
        }

        p = copy_process(clone_flags, stack_start, stack_size,
                         child_tidptr, NULL, trace);
        /*
         * Do this prior waking up the new thread - the thread pointer
         * might get invalid after that point, if the thread exits quickly.
         */
        if (!IS_ERR(p)) {
                struct completion vfork;
                struct pid *pid;

                trace_sched_process_fork(current, p);

                pid = get_task_pid(p, PIDTYPE_PID);
                nr = pid_vnr(pid);

                if (clone_flags & CLONE_PARENT_SETTID)
                        put_user(nr, parent_tidptr);

                if (clone_flags & CLONE_VFORK) {
                        p->vfork_done = &vfork;
                        init_completion(&vfork);
                        get_task_struct(p);
                }

                wake_up_new_task(p);

                /* forking complete and child started to run, tell ptracer */
                if (unlikely(trace))
                        ptrace_event_pid(trace, pid);

                if (clone_flags & CLONE_VFORK) {
                        if (!wait_for_vfork_done(p, &vfork))
                                ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
                }

                put_pid(pid);
        } else {
                nr = PTR_ERR(p);
        }
        return nr;
}

3.2 copy_process function

copy_process() calls copy_namespaces().

static struct task_struct *copy_process(unsigned long clone_flags,
                                        unsigned long stack_start,
                                        unsigned long stack_size,
                                        int __user *child_tidptr,
                                        struct pid *pid,
                                        int trace)
{
        int retval;
        struct task_struct *p;

        /* The following code checks clone_flags; some flags are mutually
         * exclusive, for example CLONE_NEWNS and CLONE_FS */
        if ((clone_flags & (CLONE_NEWNS | CLONE_FS)) == (CLONE_NEWNS | CLONE_FS))
                return ERR_PTR(-EINVAL);

        if ((clone_flags & (CLONE_NEWUSER | CLONE_FS)) == (CLONE_NEWUSER | CLONE_FS))
                return ERR_PTR(-EINVAL);

        if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
                return ERR_PTR(-EINVAL);

        if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
                return ERR_PTR(-EINVAL);

        if ((clone_flags & CLONE_PARENT) &&
                                current->signal->flags & SIGNAL_UNKILLABLE)
                return ERR_PTR(-EINVAL);

        ......

        retval = copy_namespaces(clone_flags, p);
        if (retval)
                goto bad_fork_cleanup_mm;
        retval = copy_io(clone_flags, p);
        if (retval)
                goto bad_fork_cleanup_namespaces;
        retval = copy_thread(clone_flags, stack_start, stack_size, p);
        if (retval)
                goto bad_fork_cleanup_io;

        /* do_fork() calls copy_process() with pid set to NULL, so the
         * condition below holds and a pid is allocated in the pid namespace
         * of the process. Kernels before 3.0 had one more key step here,
         * which set up the cgroup relationship once the namespace had been
         * created:
         *
         *      if (current->nsproxy != p->nsproxy) {
         *              retval = ns_cgroup_clone(p, pid);
         *              if (retval)
         *                      goto bad_fork_free_pid;
         *      }
         *
         * It was removed in 3.0; for details, see the patch that removes
         * ns_cgroup. */
        if (pid != &init_struct_pid) {
                retval = -ENOMEM;
                pid = alloc_pid(p->nsproxy->pid_ns_for_children);
                if (!pid)
                        goto bad_fork_cleanup_io;
        }

        ......
}

3.3 copy_namespaces function

copy_namespaces() is defined in kernel/nsproxy.c.

int copy_namespaces(unsigned long flags, struct task_struct *tsk)
{
        struct nsproxy *old_ns = tsk->nsproxy;
        struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
        struct nsproxy *new_ns;

        /* Check the flags first: if none of the five flags below is set,
         * call get_nsproxy() to increment the reference count of old_ns
         * and return 0 right away */
        if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
                              CLONE_NEWPID | CLONE_NEWNET)))) {
                get_nsproxy(old_ns);
                return 0;
        }

        /* Creating a new namespace requires the CAP_SYS_ADMIN capability */
        if (!ns_capable(user_ns, CAP_SYS_ADMIN))
                return -EPERM;

        /*
         * CLONE_NEWIPC must detach from the undolist: after switching
         * to a new ipc namespace, the semaphore arrays from the old
         * namespace are unreachable.  In clone parlance, CLONE_SYSVSEM
         * means share undolist with parent, so we must forbid using
         * it along with CLONE_NEWIPC.
         *
         * This is the special check for CLONE_NEWIPC.
         */
        if ((flags & (CLONE_NEWIPC | CLONE_SYSVSEM)) ==
                (CLONE_NEWIPC | CLONE_SYSVSEM))
                return -EINVAL;

        /* Create the new namespaces for the process */
        new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
        if (IS_ERR(new_ns))
                return PTR_ERR(new_ns);

        tsk->nsproxy = new_ns;
        return 0;
}

3.4 create_new_namespaces function

create_new_namespaces() creates the new namespaces.

static struct nsproxy *create_new_namespaces(unsigned long flags,
        struct task_struct *tsk, struct user_namespace *user_ns,
        struct fs_struct *new_fs)
{
        struct nsproxy *new_nsp;
        int err;

        /* Allocate memory for the new nsproxy and set its reference
         * count to an initial value of 1 */
        new_nsp = create_nsproxy();
        if (!new_nsp)
                return ERR_PTR(-ENOMEM);

        /* For each namespace type, call the matching copy function, which
         * creates a new namespace if the corresponding flag is set */
        new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, user_ns, new_fs);
        if (IS_ERR(new_nsp->mnt_ns)) {
                err = PTR_ERR(new_nsp->mnt_ns);
                goto out_ns;
        }

        new_nsp->uts_ns = copy_utsname(flags, user_ns, tsk->nsproxy->uts_ns);
        if (IS_ERR(new_nsp->uts_ns)) {
                err = PTR_ERR(new_nsp->uts_ns);
                goto out_uts;
        }

        new_nsp->ipc_ns = copy_ipcs(flags, user_ns, tsk->nsproxy->ipc_ns);
        if (IS_ERR(new_nsp->ipc_ns)) {
                err = PTR_ERR(new_nsp->ipc_ns);
                goto out_ipc;
        }

        new_nsp->pid_ns_for_children =
                copy_pid_ns(flags, user_ns, tsk->nsproxy->pid_ns_for_children);
        if (IS_ERR(new_nsp->pid_ns_for_children)) {
                err = PTR_ERR(new_nsp->pid_ns_for_children);
                goto out_pid;
        }

        new_nsp->net_ns = copy_net_ns(flags, user_ns, tsk->nsproxy->net_ns);
        if (IS_ERR(new_nsp->net_ns)) {
                err = PTR_ERR(new_nsp->net_ns);
                goto out_net;
        }

        return new_nsp;

out_net:
        if (new_nsp->pid_ns_for_children)
                put_pid_ns(new_nsp->pid_ns_for_children);
out_pid:
        if (new_nsp->ipc_ns)
                put_ipc_ns(new_nsp->ipc_ns);
out_ipc:
        if (new_nsp->uts_ns)
                put_uts_ns(new_nsp->uts_ns);
out_uts:
        if (new_nsp->mnt_ns)
                put_mnt_ns(new_nsp->mnt_ns);
out_ns:
        kmem_cache_free(nsproxy_cachep, new_nsp);
        return ERR_PTR(err);
}

3.4.1 create_nsproxy function

static inline struct nsproxy *create_nsproxy(void)
{
        struct nsproxy *nsproxy;

        nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
        if (nsproxy)
                atomic_set(&nsproxy->count, 1);
        return nsproxy;
}

Example 1: PID namespace

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <unistd.h>
#include <sched.h>
#include <string.h>

static int fork_child(void *arg)
{
        int a = (int)(long)arg;
        int i;
        pid_t pid;
        char *cmd = "ps -el";

        printf("In the container, my pid is: %d\n", getpid());

        /* ps gets its results by parsing procfs, and the per-process pid
         * directories under the root of a procfs mount depend on the pid
         * namespace at mount time (this is reflected in the get_sb callback
         * of procfs). So proc only needs to be mounted again:
         * mount -t proc proc /proc */
        mount("proc", "/proc", "proc", 0, "");

        /* Create as many children as requested on the command line */
        for (i = 0; i < a; i++) {
                pid = fork();
                if (pid < 0)
                        return pid;
                else if (pid)
                        printf("pid of my child is %d\n", pid);
                else if (pid == 0) {
                        sleep(30);
                        exit(0);
                }
        }
        execl("/bin/bash", "/bin/bash", "-c", cmd, NULL);
        return 0;
}

int main(int argc, char *argv[])
{
        int cpid;
        char *childstack, *stack;
        int flags;
        int ret = 0;
        int stacksize = getpagesize() * 4;

        if (argc != 2) {
                fprintf(stderr, "Wrong usage.\n");
                return -1;
        }
        stack = malloc(stacksize);
        if (stack == NULL)
        {
                return -1;
        }
        printf("Out of the container, my pid is: %d\n", getpid());

        /* The stack grows downward, so pass the top of the buffer */
        childstack = stack + stacksize;
        flags = CLONE_NEWPID | CLONE_NEWNS;
        cpid = clone(fork_child, childstack, flags, (void *)(long)atoi(argv[1]));
        printf("cpid: %d\n", cpid);
        if (cpid < 0) {
                perror("clone");
                ret = -1;
                goto out;
        }
        fprintf(stderr, "Parent sleeping 20 seconds\n");
        sleep(20);
        ret = 0;
out:
        free(stack);
        return ret;
}

The result:

root@Ubuntu:~/c_program# ./namespace 7
Out of the container, my pid is: 8684
cpid: 8685
Parent sleeping 20 seconds
In the container, my pid is: 1
pid of my child is 2
pid of my child is 3
pid of my child is 4
pid of my child is 5
pid of my child is 6
pid of my child is 7
pid of my child is 8
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 R     0     1     0  0  80   0 -  1085 -      pts/0    00:00:00 ps
1 S     0     2     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace
1 S     0     3     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace
1 S     0     4     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace
1 S     0     5     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace
1 S     0     6     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace
1 S     0     7     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace
1 S     0     8     1  0  80   0 -   458 hrtime pts/0    00:00:00 namespace

Example 2: UTS namespace

#define _GNU_SOURCE
#include <sys/wait.h>
#include <sys/utsname.h>
#include <sched.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static int              /* Start function for cloned child */
childFunc(void *arg)
{
        struct utsname uts;

        /* Change hostname in UTS namespace of child */
        if (sethostname(arg, strlen(arg)) == -1)
                errExit("sethostname");

        /* Retrieve and display hostname */
        if (uname(&uts) == -1)
                errExit("uname");
        printf("uts.nodename in child: %s\n", uts.nodename);

        /* Keep the namespace open for a while, by sleeping.
         * This allows some experimentation - for example, another
         * process might join the namespace. */
        sleep(200);

        return 0;       /* Child terminates now */
}

#define STACK_SIZE (1024 * 1024)        /* Stack size for cloned child */

int
main(int argc, char *argv[])
{
        char *stack;            /* Start of stack buffer */
        char *stackTop;         /* End of stack buffer */
        pid_t pid;
        struct utsname uts;

        if (argc < 2) {
                fprintf(stderr, "Usage: %s <child-hostname>\n", argv[0]);
                exit(EXIT_SUCCESS);
        }

        /* Allocate stack for child */
        stack = malloc(STACK_SIZE);
        if (stack == NULL)
                errExit("malloc");
        stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

        /* Create child that has its own UTS namespace;
         * child commences execution in childFunc() */
        pid = clone(childFunc, stackTop, CLONE_NEWUTS | SIGCHLD, argv[1]);
        if (pid == -1)
                errExit("clone");
        printf("clone() returned %ld\n", (long)pid);

        /* Parent falls through to here */
        sleep(1);       /* Give child time to change its hostname */

        /* Display hostname in parent's UTS namespace. This will be
         * different from hostname in child's UTS namespace. */
        if (uname(&uts) == -1)
                errExit("uname");
        printf("uts.nodename in parent: %s\n", uts.nodename);

        if (waitpid(pid, NULL, 0) == -1)        /* Wait for child */
                errExit("waitpid");
        printf("child has terminated\n");

        exit(EXIT_SUCCESS);
}

root@ubuntu:~/c_program# ./namespace_1 test
clone() returned 4101
uts.nodename in child: test
uts.nodename in parent: ubuntu
     
         
         
         
  CopyRight 2002-2022 newfreesoft.com, All Rights Reserved.