|
|
|
Re: [Xen-devel] [PATCH 5/7] xen/p2m: Add logic to revector a P2M tree to use __va leafs. | |
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
|
|
On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> During bootup Xen supplies us with a P2M array. It sticks
> it right after the ramdisk, as can be seen with a 128GB PV guest:
>
> (certain parts removed for clarity):
> xc_dom_build_image: called
> xc_dom_alloc_segment: kernel : 0xffffffff81000000 -> 0xffffffff81e43000 (pfn 0x1000 + 0xe43 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
> xc_dom_alloc_segment: ramdisk : 0xffffffff81e43000 -> 0xffffffff925c7000 (pfn 0x1e43 + 0x10784 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
> xc_dom_alloc_segment: phys2mach : 0xffffffff925c7000 -> 0xffffffffa25c7000 (pfn 0x125c7 + 0x10000 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
> xc_dom_alloc_page : start info : 0xffffffffa25c7000 (pfn 0x225c7)
> xc_dom_alloc_page : xenstore : 0xffffffffa25c8000 (pfn 0x225c8)
> xc_dom_alloc_page : console : 0xffffffffa25c9000 (pfn 0x225c9)
> nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
> nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
> xc_dom_alloc_segment: page tables : 0xffffffffa25ca000 -> 0xffffffffa26e1000 (pfn 0x225ca + 0x117 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
> xc_dom_alloc_page : boot stack : 0xffffffffa26e1000 (pfn 0x226e1)
> xc_dom_build_image : virt_alloc_end : 0xffffffffa26e2000
> xc_dom_build_image : virt_pgtab_end : 0xffffffffa2800000
>
> So the physical memory and virtual (using __START_KERNEL_map addresses)
> layout looks as so:
>
> phys __ka
> /------------\ /-------------------\
> | 0 | empty | 0xffffffff80000000|
> | .. | | .. |
> | 16MB | <= kernel starts | 0xffffffff81000000|
> | .. | | |
> | 30MB | <= kernel ends => | 0xffffffff81e43000|
> | .. | & ramdisk starts | .. |
> | 293MB | <= ramdisk ends=> | 0xffffffff925c7000|
> | .. | & P2M starts | .. |
> | .. | | .. |
> | 549MB | <= P2M ends => | 0xffffffffa25c7000|
> | .. | start_info | 0xffffffffa25c7000|
> | .. | xenstore | 0xffffffffa25c8000|
> | .. | cosole | 0xffffffffa25c9000|
> | 549MB | <= page tables => | 0xffffffffa25ca000|
> | .. | | |
> | 550MB | <= PGT end => | 0xffffffffa26e1000|
> | .. | boot stack | |
> \------------/ \-------------------/
>
> As can be seen, the ramdisk, P2M and pagetables are taking
> a bit of __ka addresses space. Which is a problem since the
> MODULES_VADDR starts at 0xffffffffa0000000 - and P2M sits
> right in there! This results during bootup with the inability to
> load modules, with this error:
>
> ------------[ cut here ]------------
> WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
> Call Trace:
> [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
> [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
> [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
> [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
> [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
> [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
> [<ffffffff810c6186>] ? load_module+0x66/0x19c0
> [<ffffffff8105cadc>] module_alloc+0x5c/0x60
> [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
> [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
> [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
> [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
> [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
> [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
> ---[ end trace fd8f7704fdea0291 ]---
> vmalloc: allocation failure, allocated 16384 of 20480 bytes
> modprobe: page allocation failure: order:0, mode:0xd2
>
> Since the __va and __ka are 1:1 up to MODULES_VADDR and
> cleanup_highmap rids __ka of the ramdisk mapping, what
> we want to do is similar - get rid of the P2M in the __ka
> address space. There are two ways of fixing this:
>
> 1) All P2M lookups instead of using the __ka address would
> use the __va address. This means we can safely erase from
> __ka space the PMD pointers that point to the PFNs for
> P2M array and be OK.
> 2). Allocate a new array, copy the existing P2M into it,
> revector the P2M tree to use that, and return the old
> P2M to the memory allocate. This has the advantage that
> it sets the stage for using XEN_ELF_NOTE_INIT_P2M
> feature. That feature allows us to set the exact virtual
> address space we want for the P2M - and allows us to
> boot as initial domain on large machines.
>
> So we pick option 2).
1) looks like a decent option that requires less code.
Is the problem with 1) that we might want to access the P2M before we
have __va addresses ready?
> This patch only lays the groundwork in the P2M code. The patch
> that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
> arch/x86/xen/p2m.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
> arch/x86/xen/xen-ops.h | 1 +
> 2 files changed, 71 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 6a2bfa4..bbfd085 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -394,7 +394,77 @@ void __init xen_build_dynamic_phys_to_machine(void)
> * Xen provided pagetable). Do it later in xen_reserve_internals.
> */
> }
> +#ifdef CONFIG_X86_64
> +#include <linux/bootmem.h>
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> + unsigned long va_start;
> + unsigned long va_end;
> + unsigned long pfn;
> + unsigned long *mfn_list = NULL;
> + unsigned long size;
> +
> + va_start = xen_start_info->mfn_list;
> + /*We copy in increments of P2M_PER_PAGE * sizeof(unsigned long),
> + * so make sure it is rounded up to that */
> + size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> + va_end = va_start + size;
> +
> + /* If we were revectored already, don't do it again. */
> + if (va_start <= __START_KERNEL_map && va_start >= __PAGE_OFFSET)
> + return 0;
> +
> + mfn_list = alloc_bootmem_align(size, PAGE_SIZE);
> + if (!mfn_list) {
> + pr_warn("Could not allocate space for a new P2M tree!\n");
> + return xen_start_info->mfn_list;
> + }
> + /* Fill it out with INVALID_P2M_ENTRY value */
> + memset(mfn_list, 0xFF, size);
> +
> + for (pfn = 0; pfn < ALIGN(MAX_DOMAIN_PAGES, P2M_PER_PAGE); pfn += P2M_PER_PAGE) {
> + unsigned topidx = p2m_top_index(pfn);
> + unsigned mididx;
> + unsigned long *mid_p;
> +
> + if (!p2m_top[topidx])
> + continue;
> +
> + if (p2m_top[topidx] == p2m_mid_missing)
> + continue;
> +
> + mididx = p2m_mid_index(pfn);
> + mid_p = p2m_top[topidx][mididx];
> + if (!mid_p)
> + continue;
> + if ((mid_p == p2m_missing) || (mid_p == p2m_identity))
> + continue;
> +
> + if ((unsigned long)mid_p == INVALID_P2M_ENTRY)
> + continue;
> +
> + /* The old va. Rebase it on mfn_list */
> + if (mid_p >= (unsigned long *)va_start && mid_p <= (unsigned long *)va_end) {
> + unsigned long *new;
> +
> + new = &mfn_list[pfn];
> +
> + copy_page(new, mid_p);
> + p2m_top[topidx][mididx] = &mfn_list[pfn];
> + p2m_top_mfn_p[topidx][mididx] = virt_to_mfn(&mfn_list[pfn]);
>
> + }
> + /* This should be the leafs allocated for identity from _brk. */
> + }
> + return (unsigned long)mfn_list;
> +
> +}
> +#else
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> + return 0;
> +}
> +#endif
> unsigned long get_phys_to_machine(unsigned long pfn)
> {
> unsigned topidx, mididx, idx;
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 2230f57..bb5a810 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -45,6 +45,7 @@ void xen_hvm_init_shared_info(void);
> void xen_unplug_emulated_devices(void);
>
> void __init xen_build_dynamic_phys_to_machine(void);
> +unsigned long __init xen_revector_p2m_tree(void);
>
> void xen_init_irq_ops(void);
> void xen_setup_timer(int cpu);
> --
> 1.7.7.6
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Other Archives] [Linux Kernel Newbies] [Linux Driver Development] [Linux Kbuild] [Fedora Kernel] [Linux Kernel Testers] [Linux SH] [Linux Omap] [Linux Tape] [Linux Input] [Linux Kernel Janitors] [Linux Kernel Packagers] [Linux Doc] [Linux Man Pages] [Linux API] [Linux Memory Management] [Linux Modules] [Linux Standards] [Kernel Announce] [Netdev] [Git] [Linux PCI] Linux CAN Development [Linux I2C] [Linux RDMA] [Linux NUMA] [Netfilter] [Netfilter Devel] [SELinux] [Bugtraq] [FIO] [Linux Perf Users] [Linux Serial] [Linux PPP] [Linux ISDN] [Linux Next] [Kernel Stable Commits] [Linux Tip Commits] [Kernel MM Commits] [Linux Security Module] [AutoFS] [Filesystem Development] [Ext3 Filesystem] [Linux bcache] [Ext4 Filesystem] [Linux BTRFS] [Linux CEPH Filesystem] [Linux XFS] [XFS] [Linux NFS] [Linux CIFS] [Ecryptfs] [Linux NILFS] [Linux Cachefs] [Reiser FS] [Initramfs] [Linux FB Devel] [Linux OpenGL] [DRI Devel] [Fastboot] [Linux RT Users] [Linux RT Stable] [eCos] [Corosync] [Linux Clusters] [LVS Devel] [Hot Plug] [Linux Virtualization] [KVM] [KVM PPC] [KVM ia64] [Linux Containers] [Linux Hexagon] [Linux Cgroups] [Util Linux] [Wireless] [Linux Bluetooth] [Bluez Devel] [Ethernet Bridging] [Embedded Linux] [Barebox] [Linux MMC] [Linux IIO] [Sparse] [Smatch] [Linux Arch] [x86 Platform Driver] [Linux ACPI] [Linux IBM ACPI] [LM Sensors] [CPU Freq] [Linux Power Management] [Linmodems] [Linux DCCP] [Linux SCTP] [ALSA Devel] [Linux USB] [Linux PA RISC] [Linux Samsung SOC] [MIPS Linux] [IBM S/390 Linux] [ARM Linux] [ARM Kernel] [ARM MSM] [Tegra Devel] [Sparc Linux] [Linux Security] [Linux Sound] [Linux Media] [Video 4 Linux] [Linux IRDA Users] [Linux for the blind] [Linux RAID] [Linux ATA RAID] [Device Mapper] [Linux SCSI] [SCSI Target Devel] [Linux SCSI Target Infrastructure] [Linux IDE] [Linux SMP] [Linux AXP] [Linux Alpha] [Linux M68K] [Linux ia64] [Linux 8086] [Linux x86_64] [Linux Config] [Linux Apps] [Linux MSDOS] [Linux X.25] [Linux Crypto] [DM Crypt] [Linux Trace Users] [Linux Btrace] [Linux Watchdog] [Utrace Devel] [Linux C Programming] [Linux Assembly] [Dash] [DWARVES] [Hail Devel] [Linux Kernel Debugger] [Linux gcc] [Gcc Help] [X.Org] [Wine]
![]() |
![]() |
[Older Kernel Discussion] [Yosemite National Park Forum] [Large Format Photos] [Gimp] [Yosemite Photos] [Stuff]