Re: [Xen-devel] [PATCH 5/7] xen/p2m: Add logic to revector a P2M tree to use __va leafs.


On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> During bootup Xen supplies us with a P2M array. It sticks
> it right after the ramdisk, as can be seen with a 128GB PV guest:
> 
> (certain parts removed for clarity):
> xc_dom_build_image: called
> xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
> xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
> xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
> xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
> xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
> xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
> nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
> nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
> xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
> xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
> xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
> xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
> 
> So the physical and virtual (__START_KERNEL_map, i.e. __ka)
> memory layout looks like this:
> 
>   phys                             __ka
> /------------\                   /-------------------\
> | 0          | empty             | 0xffffffff80000000|
> | ..         |                   | ..                |
> | 16MB       | <= kernel starts  | 0xffffffff81000000|
> | ..         |                   |                   |
> | 30MB       | <= kernel ends => | 0xffffffff81e43000|
> | ..         |  & ramdisk starts | ..                |
> | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
> | ..         |  & P2M starts     | ..                |
> | ..         |                   | ..                |
> | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
> | ..         | start_info        | 0xffffffffa25c7000|
> | ..         | xenstore          | 0xffffffffa25c8000|
> | ..         | console           | 0xffffffffa25c9000|
> | 549MB      | <= page tables => | 0xffffffffa25ca000|
> | ..         |                   |                   |
> | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
> | ..         | boot stack        |                   |
> \------------/                   \-------------------/
> 
> As can be seen, the ramdisk, P2M and pagetables take up a
> large chunk of the __ka address space. That is a problem since
> MODULES_VADDR starts at 0xffffffffa0000000, and the P2M sits
> right in there! As a result, modules cannot be loaded during
> bootup; loading fails with this error:
> 
> ------------[ cut here ]------------
> WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
> Call Trace:
>  [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
>  [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
>  [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
>  [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
>  [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
>  [<ffffffff810c6186>] ? load_module+0x66/0x19c0
>  [<ffffffff8105cadc>] module_alloc+0x5c/0x60
>  [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
>  [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
>  [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
>  [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
>  [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
>  [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
> ---[ end trace fd8f7704fdea0291 ]---
> vmalloc: allocation failure, allocated 16384 of 20480 bytes
> modprobe: page allocation failure: order:0, mode:0xd2
> 
> Since __va and __ka are 1:1 up to MODULES_VADDR, and
> cleanup_highmap already rids __ka of the ramdisk mapping,
> we want to do something similar: get rid of the P2M in the
> __ka address space. There are two ways of fixing this:
> 
>  1) All P2M lookups would use the __va address instead of
>     the __ka address. This means we can safely erase from
>     __ka space the PMD entries that point to the PFNs of the
>     P2M array and be OK.
>  2) Allocate a new array, copy the existing P2M into it,
>     revector the P2M tree to use that, and return the old
>     P2M to the memory allocator. This has the advantage that
>     it sets the stage for using the XEN_ELF_NOTE_INIT_P2M
>     feature. That feature allows us to set the exact virtual
>     address space we want for the P2M, and allows us to
>     boot as the initial domain on large machines.
> 
> So we pick option 2).

1) looks like a decent option that requires less code.
Is the problem with 1) that we might want to access the P2M before we
have __va addresses ready?



> This patch only lays the groundwork in the P2M code. The patch
> that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
>  arch/x86/xen/p2m.c     |   70 ++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/xen/xen-ops.h |    1 +
>  2 files changed, 71 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 6a2bfa4..bbfd085 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -394,7 +394,77 @@ void __init xen_build_dynamic_phys_to_machine(void)
>  	 * Xen provided pagetable). Do it later in xen_reserve_internals.
>  	 */
>  }
> +#ifdef CONFIG_X86_64
> +#include <linux/bootmem.h>
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> +	unsigned long va_start;
> +	unsigned long va_end;
> +	unsigned long pfn;
> +	unsigned long *mfn_list = NULL;
> +	unsigned long size;
> +
> +	va_start = xen_start_info->mfn_list;
> +	/*We copy in increments of P2M_PER_PAGE * sizeof(unsigned long),
> +	 * so make sure it is rounded up to that */
> +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +	va_end = va_start + size;
> +
> +	/* If we were revectored already, don't do it again. */
> +	if (va_start <= __START_KERNEL_map && va_start >= __PAGE_OFFSET)
> +		return 0;
> +
> +	mfn_list = alloc_bootmem_align(size, PAGE_SIZE);
> +	if (!mfn_list) {
> +		pr_warn("Could not allocate space for a new P2M tree!\n");
> +		return xen_start_info->mfn_list;
> +	}
> +	/* Fill it out with INVALID_P2M_ENTRY value */
> +	memset(mfn_list, 0xFF, size);
> +
> +	for (pfn = 0; pfn < ALIGN(MAX_DOMAIN_PAGES, P2M_PER_PAGE); pfn += P2M_PER_PAGE) {
> +		unsigned topidx = p2m_top_index(pfn);
> +		unsigned mididx;
> +		unsigned long *mid_p;
> +
> +		if (!p2m_top[topidx])
> +			continue;
> +
> +		if (p2m_top[topidx] == p2m_mid_missing)
> +			continue;
> +
> +		mididx = p2m_mid_index(pfn);
> +		mid_p = p2m_top[topidx][mididx];
> +		if (!mid_p)
> +			continue;
> +		if ((mid_p == p2m_missing) || (mid_p == p2m_identity))
> +			continue;
> +
> +		if ((unsigned long)mid_p == INVALID_P2M_ENTRY)
> +			continue;
> +
> +		/* The old va. Rebase it on mfn_list */
> +		if (mid_p >= (unsigned long *)va_start && mid_p < (unsigned long *)va_end) {
> +			unsigned long *new;
> +
> +			new = &mfn_list[pfn];
> +
> +			copy_page(new, mid_p);
> +			p2m_top[topidx][mididx] = &mfn_list[pfn];
> +			p2m_top_mfn_p[topidx][mididx] = virt_to_mfn(&mfn_list[pfn]);
>  
> +		}
> +		/* This should be the leafs allocated for identity from _brk. */
> +	}
> +	return (unsigned long)mfn_list;
> +
> +}
> +#else
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> +	return 0;
> +}
> +#endif
>  unsigned long get_phys_to_machine(unsigned long pfn)
>  {
>  	unsigned topidx, mididx, idx;
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 2230f57..bb5a810 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -45,6 +45,7 @@ void xen_hvm_init_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> +unsigned long __init xen_revector_p2m_tree(void);
>  
>  void xen_init_irq_ops(void);
>  void xen_setup_timer(int cpu);
> -- 
> 1.7.7.6
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

