Re: [PATCH] [RFC] syscalls,x86: Add execveat() system call (v2)

On Wed, Aug 01, 2012 at 04:30:22PM -0700, H. Peter Anvin wrote:
> On 08/01/2012 04:09 PM, Meredydd Luff wrote:
> >>>  #
> >>>  # x32-specific system call numbers start at 512 to avoid cache impact
> >>
> >> I think that should be common, not 64 (as should kcmp be).
> > 
> > I copied the original execve, which is 64.
> > 
> 
> Sorry, you're right.  The argument vector needs compatibility support.
> 
> This means you need an x32 version of the function -- execve
> unfortunately is one of the few system calls which require a special x32
> version (although it's a simple wrapper around sys32_execve).  See
> sys_x32_execve.

I *really* strongly object to doing that thing before we sanitize the
situation with sys_execve().  As it is, the damn thing is defined
separately on each architecture, with spectacularly ugly kludges used
in these implementations.  Adding a parallel pile of kludges (and
due to their nature, they'll need to be changed in non-trivial
ways in a lot of cases) is simply wrong.

The thing is, there's essentially no reason to have more than one
implementation.  What they are (badly) doing is "we need to find
pt_regs to pass to do_execve(), the thing we are after has to be near
our stack frame, so let's try to get to it that way".  With a really
ugly set of kludges trying to do just that.

What we should use instead is task_pt_regs(); maybe introduce
current_pt_regs(), defaulting to task_pt_regs(current) and letting
architectures that can do it better (on some it's simply
available in a dedicated register, on some it's better to work
from current_thread_info(), etc.) override the default.
With that we have a fairly good chance to merge most of those
guys; probably not all of them, due to e.g. mips weirdness,
but enough to make it worth doing.
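
To make the override scheme concrete, here is a minimal sketch of the
default, written as a userland model so it compiles on its own.  The
struct layouts and the init_task/current stand-ins are simplified
assumptions, not the kernel's definitions; only the #ifndef fallback
pattern is the point.

```c
/* Userland model only: these types are simplified stand-ins. */
struct pt_regs {
	unsigned long ip, sp;
};

struct task_struct {
	struct pt_regs regs;	/* stand-in for the frame saved on kernel entry */
};

static struct task_struct init_task;
static struct task_struct *current = &init_task;

/* Generic accessor, playing the role of the kernel's task_pt_regs(). */
#define task_pt_regs(tsk)	(&(tsk)->regs)

/*
 * An architecture that can reach the frame more cheaply (dedicated
 * register, current_thread_info(), ...) would define current_pt_regs()
 * in its own header before this point; everyone else gets the default.
 */
#ifndef current_pt_regs
#define current_pt_regs()	task_pt_regs(current)
#endif
```

With no override defined, current_pt_regs() is just task_pt_regs(current).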

The obstacle is in lazy kernel_execve() implementations;
ones that simply issue a trap/whatever is used to enter
the system call.  Directly from kernel space.  It doesn't
have to be done that way; see what e.g. arm does there.
Note that doing it without a syscall instruction avoids another
headache; namely, we don't have to worry about returning
from a *failed* execve (i.e. return to kernel mode) through
the codepath that is normally taken only when returning
to userland.

FWIW, I would try to pull the asm tail of arm kernel_execve()
into something that would look to C side as
	ret_from_kernel_exec(&regs);	/* never returns */
and start converting architectures to that primitive.  It should
copy the provided pt_regs to the normal location (keeping in mind
that there really might be an overlap), set registers (including
stack pointer) for the normal return-to-user path and jump there.
Essentially, that's the real arch-dependent part of kernel_execve() -
transition from kernel thread to userland process.
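
The copy step is the only part of that primitive that can be shown in
portable C, so here is a rough userland model of it.  Everything named
below (kstack, normal_regs_slot(), place_regs(), the three-word pt_regs)
is hypothetical and only illustrates why the copy has to tolerate
overlap; loading the registers and jumping to the return-to-user path
is asm and is left out.

```c
#include <string.h>

/* Simplified stand-in; a real pt_regs is arch-specific and larger. */
struct pt_regs {
	unsigned long ip, sp, flags;
};

#define STACK_WORDS 1024
static unsigned long kstack[STACK_WORDS];	/* models the kernel stack */

/* Hypothetical: the slot where the return-to-user path expects the frame. */
static struct pt_regs *normal_regs_slot(void)
{
	return (struct pt_regs *)(kstack + STACK_WORDS) - 1;
}

/*
 * Copy the caller's pt_regs into the normal slot.  The source may live
 * on the same stack and overlap the destination, so this uses memmove()
 * rather than memcpy().
 */
static struct pt_regs *place_regs(const struct pt_regs *src)
{
	struct pt_regs *dst = normal_regs_slot();

	memmove(dst, src, sizeof(*dst));
	return dst;
}
```

A source frame sitting one word below the slot overlaps it by two words
in this model, and the copy still has to come out intact.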

It can be done architecture-by-architecture; there's no need to make
it a flagday conversion.  Once an arch is handled, we define
something like __ARCH_HAS_RET_FROM_KERNEL_EXEC and get the common
implementations of kernel_execve() and sys_execve() for that -
those could simply live in fs/exec.c under the matching ifdef.
Along with your sys_execveat().  I can probably throw alpha,
arm and x86 conversions into the pile, but it really needs to
be handled on linux-arch, with arch maintainers at least agreeing
in principle with that scheme.
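
For illustration only, the common sys_execve() under that ifdef could
look roughly like the following userland model.  do_execve() here is a
stub that just records the pt_regs pointer it is handed, the user
pointer annotations and the getname() step are omitted, and saved_frame
is a made-up stand-in for the real register frame; none of this is
meant as the final code.

```c
/* In the kernel an arch header would define this once converted. */
#define __ARCH_HAS_RET_FROM_KERNEL_EXEC 1

struct pt_regs {
	unsigned long ip, sp;
};

static struct pt_regs saved_frame;		/* stand-in for the saved frame */
#define current_pt_regs()	(&saved_frame)

/* Stub in place of do_execve(): remembers which regs it was given. */
static struct pt_regs *seen_regs;

static int do_execve(const char *filename, const char *const *argv,
		     const char *const *envp, struct pt_regs *regs)
{
	(void)filename;
	(void)argv;
	(void)envp;
	seen_regs = regs;
	return 0;
}

#ifdef __ARCH_HAS_RET_FROM_KERNEL_EXEC
/* One shared implementation: no digging around the stack for pt_regs. */
static long sys_execve(const char *filename, const char *const *argv,
		       const char *const *envp)
{
	return do_execve(filename, argv, envp, current_pt_regs());
}
#endif
```

The point is that nothing in the body is arch-specific any more; the
arch only has to provide current_pt_regs() and the return primitive.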
