RE: [DRM] drm_get_connector_name internal static string buffer causes a race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,
will you accept bash scripts for reloading driver/X/FLR for intel-gpu-tools to uncover exists and future bugs besides those patches?
________________________________________
From: Daniel Vetter <daniel.vetter@xxxxxxxx> on behalf of Daniel Vetter <daniel@xxxxxxxx>
Sent: Tuesday, March 25, 2014 2:43 PM
To: Dmitry Malkin
Cc: Daniel Vetter; dri-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes a race

On Tue, Mar 25, 2014 at 08:08:23AM +0000, Dmitry Malkin wrote:
> Hello, Daniel,
>
> Thank you for response. I've got a couple questions for you:
>
> 0. What do you think about making integration test with continuous reloading i915 driver and X server (with FLR between iteration)?
> Simplified example for ubuntu (root required):

Module reloading is know to be horribly racy atm. Don't do that in any
kind of stress-test situation before fixing up piles of bugs ;-)

Will mean: You'll uncover at _lot_ more than just this. Patches for all
this highly welcome though.

Thanks, Daniel

>
> #!/bin/bash
> service lightdm stop
> rmmod snd_hda_intel
> echo -n "0000:00:02.0" > /sys/bus/pci/devices/0000\:00\:02.0/driver/unbind
> rmmod i915
> echo 1 > /sys/bus/pci/devices/0000\:00\:02.0/reset
> modprobe i915
> service lightdm start
>
> This will uncover next problems:
> * sysfs add/remove i2c connectors (parent/child warning)
> * drm static buffer races
> * NX bit violation crash (see dump below)
> * and bunch of DMAR problems
>
>
> 1. Could you point me out git/branch with FIXME comments?
>
> 2. About kfree() problem: what if put string buffer onto stack of caller for drm_get_connector_name and drm_get_encoder_name?
> It will be auto-removed and there is will be the patch about adding new param for these functions.
> (yes the patch will be big and awful to read)
>
> ======================= NX bit violation crash ========================================
> Mar 20 21:53:29 haswell01 kernel: [13447.100849] Console: switching to colour dummy device 80x25
> Mar 20 21:53:29 haswell01 kernel: [13447.100950] drm_kms_helper: drm: unregistered panic notifier
> Mar 20 21:53:29 haswell01 kernel: [13447.117785] waiting module removal not supported: please upgrade
> Mar 20 21:53:29 haswell01 kernel: [13447.117880] [drm] Module unloaded
> Mar 20 21:53:29 haswell01 kernel: [13447.118244] waiting module removal not supported: please upgrade
> Mar 20 21:53:29 haswell01 kernel: [13447.118360] waiting module removal not supported: please upgradewaiting module removal not supported: please upgrade
> Mar 20 21:53:39 haswell01 kernel: [13447.118590] waiting module removal not supported: please upgrade<6>[13457.263808] [drm] Initialized drm 1.1.0 20060810
> Mar 20 21:53:39 haswell01 kernel: [13457.333992] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> Mar 20 21:53:39 haswell01 kernel: [13457.333996] BUG: unable to handle kernel paging request at ffff8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.333998] IP: [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334001] PGD 2fd9067 PUD 3e68c3063 PMD 404902063 PTE 80000003eb27f163
> Mar 20 21:53:39 haswell01 kernel: [13457.334004] Oops: 0011 [#1] SMP
> Mar 20 21:53:39 haswell01 kernel: [13457.334006] Modules linked in: i915(+) video drm_kms_helper drm i2c_algo_bit snd_hda_codec_hdmi rfcomm bnep bluetooth nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq hid_generic crct10dif_pclmul snd_seq_device usbhid crc32_pclmul snd_timer ghash_clmulni_intel snd aesni_intel psmouse hid mei_me aes_x86_64 lrw ppdev gf128mul mei glue_helper parport_pc ablk_helper lp cryptd soundcore parport serio_raw lpc_ich mac_hid e1000e ahci ptp libahci pps_core [last unloaded: video]
> Mar 20 21:53:39 haswell01 kernel: [13457.334031] CPU: 0 PID: 4974 Comm: modprobe Not tainted 3.13.6 #10
> Mar 20 21:53:39 haswell01 kernel: [13457.334032] Hardware name:                  /DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014
> Mar 20 21:53:39 haswell01 kernel: [13457.334034] task: ffff88002e6c5fc0 ti: ffff880406394000 task.ti: ffff880406394000
> Mar 20 21:53:39 haswell01 kernel: [13457.334035] RIP: 0010:[<ffff8803eb27fcb0>]  [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334037] RSP: 0018:ffff880406395af0  EFLAGS: 00010282
> Mar 20 21:53:39 haswell01 kernel: [13457.334038] RAX: ffff8803eb27f4b0 RBX: ffff8803f6016000 RCX: ffffffffa0238630
> Mar 20 21:53:39 haswell01 kernel: [13457.334039] RDX: ffff8803eb27fcb0 RSI: ffffffffa03ba3c4 RDI: ffff8803f6016000
> Mar 20 21:53:39 haswell01 kernel: [13457.334040] RBP: ffff880406395b10 R08: 0000000000000000 R09: ffff88041ea172e0
> Mar 20 21:53:39 haswell01 kernel: [13457.334042] R10: ffffea00101d7200 R11: 0000000000000000 R12: 0000000000000000
> Mar 20 21:53:39 haswell01 kernel: [13457.334043] R13: ffffffffa03ba3c4 R14: ffffffffa03dd100 R15: 0000000000000000
> Mar 20 21:53:39 haswell01 kernel: [13457.334044] FS:  00007f3f392d0740(0000) GS:ffff88041ea00000(0000) knlGS:0000000000000000
> Mar 20 21:53:39 haswell01 kernel: [13457.334046] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Mar 20 21:53:39 haswell01 kernel: [13457.334047] CR2: ffff8803eb27fcb0 CR3: 00000003e7358000 CR4: 00000000001407f0
> Mar 20 21:53:39 haswell01 kernel: [13457.334048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Mar 20 21:53:39 haswell01 kernel: [13457.334050] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Mar 20 21:53:39 haswell01 kernel: [13457.334051] Stack:
> Mar 20 21:53:39 haswell01 kernel: [13457.334052]  ffffffffa0213cb2 ffff8803f6016000 ffff880408199000 ffff880408199098
> Mar 20 21:53:39 haswell01 kernel: [13457.334054]  ffff880406395b60 ffffffffa0215b92 0000000000000004 ffff880408199140
> Mar 20 21:53:39 haswell01 kernel: [13457.334057]  ffffffffa03b9920 ffff880408199000 0000000000000000 ffffffffa03dd000
> Mar 20 21:53:39 haswell01 kernel: [13457.334059] Call Trace:
> Mar 20 21:53:39 haswell01 kernel: [13457.334067]  [<ffffffffa0213cb2>] ? drm_dev_register+0xa2/0x1e0 [drm]
> Mar 20 21:53:39 haswell01 kernel: [13457.334073]  [<ffffffffa0215b92>] drm_get_pci_dev+0x92/0x140 [drm]
> Mar 20 21:53:39 haswell01 kernel: [13457.334082]  [<ffffffffa033d67c>] i915_pci_probe+0x3c/0x90 [i915]
> Mar 20 21:53:39 haswell01 kernel: [13457.334086]  [<ffffffff813982b5>] local_pci_probe+0x45/0xa0
> Mar 20 21:53:39 haswell01 kernel: [13457.334088]  [<ffffffff81399555>] ? pci_match_device+0xc5/0xd0
> Mar 20 21:53:39 haswell01 kernel: [13457.334090]  [<ffffffff81399679>] pci_device_probe+0xd9/0x130
> Mar 20 21:53:39 haswell01 kernel: [13457.334093]  [<ffffffff81484795>] driver_probe_device+0x125/0x3b0
> Mar 20 21:53:39 haswell01 kernel: [13457.334095]  [<ffffffff81484af3>] __driver_attach+0x93/0xa0
> Mar 20 21:53:39 haswell01 kernel: [13457.334098]  [<ffffffff81484a60>] ? __device_attach+0x40/0x40
> Mar 20 21:53:39 haswell01 kernel: [13457.334100]  [<ffffffff81482703>] bus_for_each_dev+0x63/0xa0
> Mar 20 21:53:39 haswell01 kernel: [13457.334102]  [<ffffffff8148414e>] driver_attach+0x1e/0x20
> Mar 20 21:53:39 haswell01 kernel: [13457.334104]  [<ffffffff81483d30>] bus_add_driver+0x180/0x250
> Mar 20 21:53:39 haswell01 kernel: [13457.334106]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
> Mar 20 21:53:39 haswell01 kernel: [13457.334109]  [<ffffffff81485174>] driver_register+0x64/0xf0
> Mar 20 21:53:39 haswell01 kernel: [13457.334110]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
> Mar 20 21:53:39 haswell01 kernel: [13457.334113]  [<ffffffff81397c4c>] __pci_register_driver+0x4c/0x50
> Mar 20 21:53:39 haswell01 kernel: [13457.334117]  [<ffffffffa0215d5a>] drm_pci_init+0x11a/0x130 [drm]
> Mar 20 21:53:39 haswell01 kernel: [13457.334119]  [<ffffffffa03fe000>] ? 0xffffffffa03fdfff
> Mar 20 21:53:39 haswell01 kernel: [13457.334127]  [<ffffffffa03fe066>] i915_init+0x66/0x68 [i915]
> Mar 20 21:53:39 haswell01 kernel: [13457.334130]  [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
> Mar 20 21:53:39 haswell01 kernel: [13457.334133]  [<ffffffff810589e3>] ? set_memory_nx+0x43/0x50
> Mar 20 21:53:39 haswell01 kernel: [13457.334137]  [<ffffffff810e0630>] load_module+0x2050/0x26f0
> Mar 20 21:53:39 haswell01 kernel: [13457.334139]  [<ffffffff810dbfa0>] ? store_uevent+0x40/0x40
> Mar 20 21:53:39 haswell01 kernel: [13457.334141]  [<ffffffff810e0e46>] SyS_finit_module+0x86/0xb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334144]  [<ffffffff8171823f>] tracesys+0xe1/0xe6
> Mar 20 21:53:39 haswell01 kernel: [13457.334145] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 <f0> b1 ca 81 ff ff ff ff b0 f4 27 eb 03 88 ff ff ff ff ff 7f 00
> Mar 20 21:53:39 haswell01 kernel: [13457.334164] RIP  [<ffff8803eb27fcb0>] 0xffff8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334166]  RSP <ffff880406395af0>
> Mar 20 21:53:39 haswell01 kernel: [13457.334167] CR2: ffff8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334168] ---[ end trace e2598b1fc83a65fd ]---
>
>
> ________________________________________
> From: Daniel Vetter <daniel.vetter@xxxxxxxx> on behalf of Daniel Vetter <daniel@xxxxxxxx>
> Sent: Tuesday, March 25, 2014 12:31 AM
> To: Dmitry Malkin
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes a race
>
> On Mon, Mar 24, 2014 at 12:06:21PM +0000, Dmitry Malkin wrote:
> >
> > Hello guys,
> >
> > I've been playing with reloading intel gfx driver (i915) in a cycle, for a while,
> > and at some point I've found a non-deterministic kernel crash with a highly-variable
> > iteration dependency -- 2 to 200 driver reload iterations.
> >
> > The apparent race is over the shared internal string buffer in drm_get_connector_name().
> > It is mostly harmless, due to its results being mostly used for log output, but in at least one
> > case  -- drm_sysfs_connector_add() -- this leads to a more critical error.
> >
> > Race scenario:
> >
> > - drm_sysfs_connector_add()
> >    - drm_get_connector_name()
> > vs.
> > - anything that generates log messages involving DRM connectors
> >    - drm_get_connector_name()
> >
> >  (and many other from drm_crtc.c) shares with caller const char* to internal static char buffer.
> > If something call it from other thread, while main thread strore and use returned pointer it may overwrite connector name.
> >
> > Here are we go: registering HDMI connecter  (drm_sysfs_connector_add store and use pointer from drm_get_connector_name)
> > and the same time got VGA connector name down through the stack. (the second thread is upowerd who watch continuously sysfs)
>
> Yeah, in my recent kerneldoc series I've noticed this too and added FIXME
> comments. There's a lot more than just those you've pointed out. The
> problem is that fixing these will be an awful lot of work since you need
> to add proper cleanup code (calling kfree()) to all the required places.
>
> So a full subsystem wide code audit for every single use-site of these.
> -Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux