web developer & system programmer

coder . cl

ramblings and thoughts on programming...


foreign system call emulations on freebsd

published: 19-07-2009 / updated: 19-07-2009
posted in: c, freebsd, programming
by Daniel Molina Wegener

Call emulation is a common technique nowadays for running foreign applications, built for other operating systems such as M$ Windows, on the FreeBSD platform. The usual approach is to build an interface over the real system calls, replacing the foreign system calls with wrappers around them, and a single assembler instruction makes this possible. Examples of call emulation are the win32 codecs, Wine and valgrind; valgrind does not emulate foreign system calls, but it replaces standard library routines. Underneath all those system calls and library routines, most of them rely (through the kernel) on one common assembler instruction: lldt. lldt stands for "Load Local Descriptor Table" and it is related to segment descriptor tables. This article is a lightweight introduction to the use of the lldt instruction.

LLDT loads the Local Descriptor Table register (LDTR). The word operand (memory or register) to LLDT should contain a selector to the Global Descriptor Table (GDT). The GDT entry should be a Local Descriptor Table. If so, then the LDTR is loaded from the entry. The descriptor registers DS, ES, SS, FS, GS, and CS are not affected. The LDT field in the task state segment does not change.

The selector operand can be 0; if so, the LDTR is marked invalid. All descriptor references (except by the LAR, VERR, VERW or LSL instructions) cause a #GP fault.

LLDT is used in operating system software; it is not used in application programs.
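The selector that LLDT takes, and the selectors that segment registers use to reach LDT entries, share one 16-bit format: a 13-bit table index, a table-indicator bit (0 for the GDT, 1 for the LDT) and a 2-bit requested privilege level. Below is a minimal sketch of encoding and decoding that format; the helper names are hypothetical, although the GSEL macro in the kernel code later in this post builds the GDT case the same way.

```c
#include <stdint.h>

/* i386 selector layout (hypothetical helper names):
 * bits 15..3 = table index, bit 2 = table indicator (1 = LDT),
 * bits 1..0 = requested privilege level (RPL). */
#define SEL_TI_LDT 0x4u

static uint16_t make_selector(uint16_t index, int in_ldt, uint16_t rpl)
{
    return (uint16_t)((index << 3) | (in_ldt ? SEL_TI_LDT : 0u) | (rpl & 0x3u));
}

/* Index into the GDT or LDT. */
static uint16_t selector_index(uint16_t sel) { return (uint16_t)(sel >> 3); }

/* Non-zero when the selector refers to the LDT. */
static int selector_is_ldt(uint16_t sel) { return (sel & SEL_TI_LDT) != 0; }

/* Requested privilege level, 0 (kernel) .. 3 (user). */
static uint16_t selector_rpl(uint16_t sel) { return (uint16_t)(sel & 0x3u); }
```

For example, a ring-3 selector for LDT entry 1 is `make_selector(1, 1, 3)`, that is `0x0F`.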

Thanks to this instruction, which is specific to the Intel architecture, each task can have its own LDT. The instruction only works in protected mode and seems to have been designed for segmented virtual addressing. This means that each task with an LDT in protected mode can have its own memory segments, call gates, and so on.

usage on valgrind under freebsd

valgrind makes a call to i386_set_ldt(2) on FreeBSD.

The i386_get_ldt() system call returns a list of the i386 descriptors in the current process’ LDT. The i386_set_ldt() system call sets a list of i386 descriptors in the current process’ LDT. For both routines, start_sel specifies the index of the selector in the LDT at which to begin and descs points to an array of num_sels descriptors to be set or returned.

/* valgrind excerpt; on FreeBSD/i386 it needs <string.h>,
   <machine/segments.h> and <machine/sysarch.h> */

/* the descriptor union: the same 8-byte entry can be viewed as a
   segment descriptor (sd) or as a gate descriptor (gd) */
union descriptor ldt;

/* calculate the page count from the client's base and end
   addresses, where VKI_BYTES_PER_PAGE is defined as (1 << 12) */
UInt limit = (VG_(client_end)-VG_(client_base)) / VKI_BYTES_PER_PAGE;

Int ret;

/* apply zeroes to the descriptor ;) */
memset(&ldt, 0, sizeof(ldt));

/* low 16 bits of the segment limit */
ldt.sd.sd_lolimit = limit & 0xffff;

/* low 24 bits of the segment base address */
ldt.sd.sd_lobase = VG_(client_base) & 0xffffff;

/* this is important! set the segment type to read/write data
   (accessed), because the client memory covered by this segment
   must be both readable and writable. */
ldt.sd.sd_type = SDT_MEMRWA;

/* set the descriptor privilege level to 3 (user) */
ldt.sd.sd_dpl = SEL_UPL;

/* mark the segment present; the kernel rejects present descriptors
   whose privilege level is not ring 3 (SEL_UPL):
    if ((dp->sd.sd_p != 0) && (dp->sd.sd_dpl != SEL_UPL))
        return (EACCES); */
ldt.sd.sd_p = 1;      /* present */

/* upper 4 bits of the segment limit */
ldt.sd.sd_hilimit = (limit >> 16) & 0xf;

/* use 32-bit default operand size */
ldt.sd.sd_def32 = 1;  /* 32 bit */

/* page granularity: the limit counts 4 KiB pages */
ldt.sd.sd_gran = 1;   /* limit in pages */

/* upper 8 bits of the segment base address */
ldt.sd.sd_hibase = (VG_(client_base) >> 24) & 0xff;

/* and finally do the system call i386_set_ldt(2) */
ret = i386_set_ldt(VG_POINTERCHECK_SEGIDX, &ldt, 1);
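To see where those bit-fields end up, here is a sketch that packs the same fields into the raw 8-byte descriptor and reads the base and limit back. The function names are hypothetical; the bit positions assume the FreeBSD struct segment_descriptor layout (a 5-bit sd_type that includes the code/data bit, and 2 unused sd_xx bits), and 19 is the value FreeBSD's headers use for SDT_MEMRWA.

```c
#include <stdint.h>

/* Pack descriptor fields into the raw 64-bit i386 segment descriptor
 * (hypothetical helper; layout follows struct segment_descriptor). */
static uint64_t pack_segment(uint32_t base, uint32_t limit, unsigned type,
                             unsigned dpl, unsigned present,
                             unsigned def32, unsigned gran)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xffff);               /* sd_lolimit, bits 0..15  */
    d |= (uint64_t)(base & 0xffffff) << 16;        /* sd_lobase,  bits 16..39 */
    d |= (uint64_t)(type & 0x1f) << 40;            /* sd_type,    bits 40..44 */
    d |= (uint64_t)(dpl & 0x3) << 45;              /* sd_dpl,     bits 45..46 */
    d |= (uint64_t)(present & 0x1) << 47;          /* sd_p,       bit 47      */
    d |= (uint64_t)((limit >> 16) & 0xf) << 48;    /* sd_hilimit, bits 48..51 */
    /* bits 52..53 (sd_xx) stay zero */
    d |= (uint64_t)(def32 & 0x1) << 54;            /* sd_def32,   bit 54      */
    d |= (uint64_t)(gran & 0x1) << 55;             /* sd_gran,    bit 55      */
    d |= (uint64_t)((base >> 24) & 0xff) << 56;    /* sd_hibase,  bits 56..63 */
    return d;
}

/* Reassemble the 32-bit base from its two pieces. */
static uint32_t segment_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xffffff) | (uint32_t)((d >> 56) & 0xff) << 24;
}

/* Highest valid byte offset: with granularity set, the 20-bit limit
 * counts 4 KiB pages, so the low 12 bits are filled with ones. */
static uint32_t segment_byte_limit(uint64_t d)
{
    uint32_t limit = (uint32_t)(d & 0xffff) | (uint32_t)((d >> 48) & 0xf) << 16;
    if ((d >> 55) & 1)
        limit = (limit << 12) | 0xfff;
    return limit;
}
```

A flat 4 GiB ring-3 read/write segment, `pack_segment(0, 0xfffff, 19, 3, 1, 1, 1)`, gives the familiar value `0x00CFF3000000FFFF`. Note that the limit field holds the highest valid unit, so with page granularity a limit of N covers N + 1 pages.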

And then, what does i386_set_ldt(2) do? The libc function is a thin wrapper around the sysarch(2) system call. It takes three arguments: the starting LDT index start, a pointer to the descriptor array descs, and the number of descriptors to process num. Inside, it packs them into p, a struct i386_ldt_args, and calls sysarch(I386_SET_LDT, &p);. The kernel side of sysarch(2) switches on I386_SET_LDT and copies the arguments from user space to kernel space; then the magic happens, and the kernel interface i386_set_ldt is called.

Two locks are involved here. First, after the arguments are copied into kernel space, the Giant lock is acquired (:( yes, that's true on FreeBSD versions prior to 7.X). Then, in the kernel interface i386_set_ldt, the array of LDT descriptors is validated: system descriptor types, such as TSS, LDT and gate descriptors, are rejected, because they are designed for operating system purposes and must not be created from user space. Next comes the present and privilege check, if ((dp->sd.sd_p != 0) && (dp->sd.sd_dpl != SEL_UPL)), which rejects any descriptor that has its one-bit present flag set and a privilege level other than ring 3 (user level). If the checks do not fail, the scheduler spin lock is acquired and the LDT is grown through the i386_ldt_grow kernel interface. Then the set_user_ldt kernel interface is called. What does set_user_ldt do? It executes the lldt instruction!
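The two checks described above can be modeled in a few lines. This is a simplified sketch with a hypothetical struct and function name, not the kernel's code: it rejects every system type (values below 16 in the 5-bit type field, where bit 4 is the code/data bit) and every present descriptor whose privilege level is not ring 3. The real kernel code is more detailed; for example, it treats the null descriptor type specially.

```c
#include <errno.h>

/* Hypothetical, simplified view of a descriptor: only the fields
 * the checks look at, not the kernel's union descriptor. */
struct desc_view {
    unsigned type;    /* 5-bit type; values below 16 are system types */
    unsigned dpl;     /* descriptor privilege level, 0..3 */
    unsigned present; /* the P bit */
};

#define MODEL_SEL_UPL 3u  /* user privilege level, like SEL_UPL */

/* Returns 0 if a user process may install the descriptor,
 * otherwise the errno value the model reports. */
static int check_user_descriptor(const struct desc_view *dp)
{
    /* System descriptors (TSS, LDT, gates) are for the OS only. */
    if (dp->type < 16)
        return EACCES;
    /* Only user (ring-3) descriptors may be present. */
    if (dp->present != 0 && dp->dpl != MODEL_SEL_UPL)
        return EACCES;
    return 0;
}
```

With this model, valgrind's descriptor (type 19, DPL 3, present) passes, while a 386 TSS descriptor (type 9) or a present ring-0 data segment is refused.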

/* sys/i386/include/cpufunc.h */
static __inline void
lldt(u_short sel)
{
    __asm __volatile("lldt %0" : : "r" (sel));
}

And how is it called?

/*
 * Update the GDT entry pointing to the LDT to point to the LDT of the
 * current process.
 *
 * This must be called with sched_lock held.  Unfortunately, we can't use a
 * mtx_assert() here because cpu_switch() calls this function after changing
 * curproc but before sched_lock's owner is updated in mi_switch().
 */
void
set_user_ldt(struct mdproc *mdp)
{
    struct proc_ldt *pldt;

    pldt = mdp->md_ldt;
#ifdef SMP
    gdt[PCPU_GET(cpuid) * NGDT + GUSERLDT_SEL].sd = pldt->ldt_sd;
#else
    gdt[GUSERLDT_SEL].sd = pldt->ldt_sd;
#endif
    lldt(GSEL(GUSERLDT_SEL, SEL_KPL));
    PCPU_SET(currentldt, GSEL(GUSERLDT_SEL, SEL_KPL));
}

Here, set_user_ldt is called through smp_rendezvous on SMP kernels, so that every CPU runs it, and directly on non-SMP kernels. On SMP kernels each CPU has its own block of NGDT entries in the GDT: the CPU ID selects that block, and the LDT segment descriptor of the process is copied into the per-CPU GUSERLDT_SEL slot. Then lldt is executed with the selector of that GDT slot, and the per-CPU variable currentldt is updated through the PCPU_SET macro to record which LDT is loaded, and the magic is done. On a single-CPU system there is only one block, so the slot is used directly. lldt is also called on task switching, but task switching is something for another post, mainly because it involves the process scheduler and similar topics.
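The per-CPU indexing in set_user_ldt is plain arithmetic, sketched below. The constants are hypothetical placeholders: the real NGDT and GUSERLDT_SEL come from the kernel headers and differ between FreeBSD versions. The selector passed to lldt is the same on every CPU, which works because each CPU's GDTR points at its own block of NGDT entries (as the indexing in set_user_ldt implies).

```c
/* Hypothetical placeholder values, not the real kernel constants. */
#define MODEL_NGDT         19
#define MODEL_GUSERLDT_SEL 11
#define MODEL_SEL_KPL      0

/* Global GDT array index of this CPU's copy of the user-LDT slot. */
static unsigned user_ldt_gdt_index(unsigned cpuid)
{
    return cpuid * MODEL_NGDT + MODEL_GUSERLDT_SEL;
}

/* Selector given to lldt: relative to the CPU's own GDT block,
 * computed like GSEL(GUSERLDT_SEL, SEL_KPL). */
static unsigned short user_ldt_selector(void)
{
    return (unsigned short)((MODEL_GUSERLDT_SEL << 3) | MODEL_SEL_KPL);
}
```

With these placeholder values, CPU 2 writes array entry 49, but every CPU loads selector 0x58.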

For now, here is the relevant piece of the task-switching code. In short, it loads the default LDT only if it differs from the LDT currently loaded on this CPU:

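movl    _default_ldt,%eax       /* selector of the default LDT */
cmpl    PCPU(CURRENTLDT),%eax   /* already loaded on this CPU? */
je  2f                          /* yes: skip the reload */
lldt    _default_ldt            /* no: load it into LDTR */
movl    %eax,PCPU(CURRENTLDT)   /* remember what is loaded */
jmp 2f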
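The same "reload only if it changed" logic, written as a small C model with hypothetical names: a per-CPU variable caches the loaded selector, and a counter stands in for the lldt instruction so that skipped reloads are visible.

```c
#include <stdint.h>

/* Hypothetical per-CPU state: the selector currently in LDTR and a
 * counter standing in for executions of the lldt instruction. */
struct model_cpu {
    uint16_t currentldt;
    unsigned lldt_count;
};

/* Stand-in for lldt followed by the PCPU(CURRENTLDT) update. */
static void model_lldt(struct model_cpu *cpu, uint16_t sel)
{
    cpu->lldt_count++;
    cpu->currentldt = sel;
}

/* Mirrors the assembly: reload LDTR only when the wanted selector
 * differs from the one this CPU already has loaded. */
static void switch_to_ldt(struct model_cpu *cpu, uint16_t wanted)
{
    if (cpu->currentldt == wanted)
        return;
    model_lldt(cpu, wanted);
}
```

Switching twice to the same selector executes the stand-in lldt only once.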

the usage on wine under freebsd

Wine's usage does not differ much from valgrind's. The main difference is that Wine defines its own LDT entry structure, as required by the Win32 API; it describes the same descriptor with a different declaration:

typedef struct _LDT_ENTRY {
    WORD    LimitLow;
    WORD    BaseLow;
    union {
        struct {
            BYTE    BaseMid;
            BYTE    Flags1;
            BYTE    Flags2;
            BYTE    BaseHi;
        } Bytes;
        struct {
            /* this comment is mine: WTF?!?! Pascal case mixed
               with underscore!!! The code looks really ugly. */
            unsigned    BaseMid: 8;
            unsigned    Type : 5;
            unsigned    Dpl : 2;
            unsigned    Pres : 1;
            unsigned    LimitHi : 4;
            unsigned    Sys : 1;
            unsigned    Reserved_0 : 1;
            unsigned    Default_Big : 1;
            unsigned    Granularity : 1;
            unsigned    BaseHi : 8;
        } Bits;
    } HighWord;
#ifdef _WIN64  /* FIXME: 64-bit code should not be using the LDT */
    DWORD BaseHigh;
#endif
} LDT_ENTRY, *PLDT_ENTRY;
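Because the Bits view depends on compiler bit-field ordering, the base and limit can also be read through the Bytes view. Here is a sketch with a minimal stand-in struct and hypothetical helper names (not Wine's own helpers); the flag bit positions follow the Bits declaration above on a little-endian x86 compiler.

```c
#include <stdint.h>

/* Minimal stand-in for LDT_ENTRY's Bytes view (hypothetical type). */
typedef struct {
    uint16_t LimitLow;
    uint16_t BaseLow;
    uint8_t  BaseMid;
    uint8_t  Flags1;   /* Type:5, Dpl:2, Pres:1 */
    uint8_t  Flags2;   /* LimitHi:4, Sys:1, Reserved_0:1, Default_Big:1, Granularity:1 */
    uint8_t  BaseHi;
} ldt_bytes;

/* 32-bit base from its three pieces. */
static uint32_t ldt_entry_base(const ldt_bytes *e)
{
    return (uint32_t)e->BaseLow | (uint32_t)e->BaseMid << 16 | (uint32_t)e->BaseHi << 24;
}

/* Highest valid byte offset, expanding page-granular limits. */
static uint32_t ldt_entry_limit(const ldt_bytes *e)
{
    uint32_t limit = (uint32_t)e->LimitLow | (uint32_t)(e->Flags2 & 0x0f) << 16;
    if (e->Flags2 & 0x80)             /* Granularity: limit counts 4 KiB pages */
        limit = (limit << 12) | 0xfff;
    return limit;
}

static unsigned ldt_entry_dpl(const ldt_bytes *e)     { return (e->Flags1 >> 5) & 0x3; }
static unsigned ldt_entry_present(const ldt_bytes *e) { return (e->Flags1 >> 7) & 0x1; }
```

A flat ring-3 data entry such as `{0xffff, 0x0000, 0x00, 0xf3, 0xcf, 0x00}` decodes to base 0, limit 0xffffffff, DPL 3, present.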

Then the call is made after looking, just as the kernel does, at the present flag and the privilege level; instead of failing, Wine forces the privilege level of a present entry to ring 3. Before the call, it makes a copy of the LDT entry required to run the Windows process.

{
    LDT_ENTRY entry_copy = *entry;
    /* The kernel will only let us set LDTs with user priority level */
    if (entry_copy.HighWord.Bits.Pres
        && entry_copy.HighWord.Bits.Dpl != 3)
        entry_copy.HighWord.Bits.Dpl = 3;
    ret = i386_set_ldt(index, (union descriptor *)&entry_copy, 1);
    if (ret < 0)
    {
        perror("i386_set_ldt");
        fprintf( stderr, "Did you reconfigure the kernel with \"options USER_LDT\"?\n" );
        exit(1);
    }
}

I remember a change needed for programs built by some compilers that used a fixed user-space address on Windows to load the entry point instead of computing it: before the LDT was set, they crashed with a segmentation fault and an "Invalid address" message. The problem is possibly better explained in this thread:

The main reason the reservation code is still disabled on FreeBSD is because mmap(NULL) only tries addresses after the executable + some malloc heap space. Wine is located at 0x7bf00000. The heap size is currently set to 0x02000000 by the wine-freebsd loader, which is thought to be the absolute minimum required to support FreeBSD 6.

linux emulation layer on freebsd

This emulation layer implements the modify_ldt(2) system call through the linux_modify_ldt kernel interface. Here the code is simpler: it just acquires the Giant lock, calls the kernel interface i386_set_ldt (not the system call) and releases the lock. The segment type is built on top of SDT_MEMRO, a read-only data segment, but further bits are ORed in from the Linux arguments: the writable bit is set unless read_exec_only is given, and contents selects an expand-down data segment or a code segment. So the memory referenced by the descriptor is read-only only when the caller asks for it.

desc.sd.sd_lolimit = (ld.limit & 0x0000ffff);
desc.sd.sd_hilimit = (ld.limit & 0x000f0000) >> 16;
desc.sd.sd_lobase = (ld.base_addr & 0x00ffffff);
desc.sd.sd_hibase = (ld.base_addr & 0xff000000) >> 24;
desc.sd.sd_type = SDT_MEMRO | ((ld.read_exec_only ^ 1) << 1) | (ld.contents << 2);
desc.sd.sd_dpl = 3;
desc.sd.sd_p = (ld.seg_not_present ^ 1);
desc.sd.sd_xx = 0;
desc.sd.sd_def32 = ld.seg_32bit;
desc.sd.sd_gran = ld.limit_in_pages;
mtx_lock(&Giant);
error = i386_set_ldt(td, &ldt, &desc);
mtx_unlock(&Giant);
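The sd_type expression above can be checked in isolation. This sketch reuses the same expression with a hypothetical function name; 16 is FreeBSD's SDT_MEMRO, and the results map onto FreeBSD's other SDT_MEM* values (18 is SDT_MEMRW, 22 is SDT_MEMRWD, 24 is SDT_MEME, 26 is SDT_MEMER). Linux's contents field is 0 for data, 1 for an expand-down (stack) data segment and 2 for code.

```c
/* FreeBSD value of SDT_MEMRO: read-only data segment (code/data bit set). */
#define MODEL_SDT_MEMRO 16u

/* Same expression as linux_modify_ldt (hypothetical wrapper):
 * bit 1 = writable (data) or readable (code),
 * bit 2 = expand-down (data) or conforming (code),
 * bit 3 = code segment. */
static unsigned linux_ldt_type(unsigned read_exec_only, unsigned contents)
{
    return MODEL_SDT_MEMRO | ((read_exec_only ^ 1u) << 1) | (contents << 2);
}
```

An ordinary data request (`read_exec_only = 0`, `contents = 0`) therefore yields 18, a read/write data segment, not a read-only one.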

The manual page for modify_ldt(2) on Linux says that the C library provides no wrapper for it, so you must invoke it through syscall(2). Unlike on FreeBSD, it is a system call of its own, not a request made through an architecture-specific call such as sysarch(2).

modify_ldt() reads or writes the local descriptor table (ldt) for a process. The ldt is a per-process memory management table used by the i386 processor. For more information on this table, see an Intel 386 processor handbook.


2 comments to “foreign system call emulations on freebsd”

  1. [...] them. This command line will generate a large trace on your command and as I’ve posted in a previous topic, valgrind replaces the allocation system calls with his own group of system calls, wrapping the [...]
