PROBLEM: Possible bug in AMDGPU DC code?

Hello,

I have not posted to LKML before, so I apologize if this is the wrong place for this message.

I purchased the recently released HP Envy x360 laptop, which has a Ryzen 2500U APU with a Vega 10 GPU. After setting up Slackware on the laptop, I compiled kernel 4.15-rc2 with the AMDGPU DC code enabled to test the current functionality. Most of the time, the boot process hangs at

"Switching to amdgpudrmfb from EFI VGA"

Very rarely, the boot succeeds and everything seems to run smoothly. Adding "nomodeset" to the kernel parameters makes the boot succeed every time, but at the cost of amdgpu not working correctly, since it requires modesetting.

I have also tried the same process under Ubuntu 17.10, and with kernels 4.15-rc3 and 4.15-rc4, with the same results. The only way I was able to capture system output that seemed relevant was to blacklist amdgpu and then modprobe it once in my desktop environment. This promptly froze my system, but it seemed to reveal some information about an MCE hardware error. Unfortunately, mcelog does not seem to support Ryzen yet, so I can't retrieve any useful information that way. However, /var/log/syslog did yield a little more, specifically:


Dec 19 04:23:44 darkstar kernel: [ 1139.605187] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:153 vm_id:0 pas_id:0)
Dec 19 04:23:44 darkstar kernel: [ 1139.605191] amdgpu 0000:03:00.0:   at page 0x0000000000000000 from 18
Dec 19 04:23:44 darkstar kernel: [ 1139.605193] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Dec 19 04:23:44 darkstar kernel: [ 1139.605206] [Hardware Error]: Deferred error, no action required.
Dec 19 04:23:44 darkstar kernel: [ 1139.605212] [Hardware Error]: CPU:0 (17:11:0) MC20_STATUS[-|-|MiscV|-|AddrV|Deferred|-|SyndV|-|UECC]: 0x9c2030000001085b
Dec 19 04:23:44 darkstar kernel: [ 1139.605218] [Hardware Error]: Error Addr: 0x00007ffcffffff00
Dec 19 04:23:44 darkstar kernel: [ 1139.605220] [Hardware Error]: IPID: 0x0000002e00000000, Syndrome: 0x000000005b240205
Dec 19 04:23:44 darkstar kernel: [ 1139.605224] [Hardware Error]: Coherent Slave Extended Error Code: 1
Dec 19 04:23:44 darkstar kernel: [ 1139.605225] [Hardware Error]: Coherent Slave Error: Address violation.
Dec 19 04:23:44 darkstar kernel: [ 1139.605228] [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)

These messages at least appear to be related.
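
For anyone who wants to check the MC20_STATUS line by hand, the flags can be reproduced from the raw value. This is only a minimal sketch: it assumes the MCI_STATUS_* bit positions from arch/x86/include/asm/mce.h, and it assumes a 6-bit extended error code field in bits 21:16, which is my understanding for Family 17h but may not hold elsewhere. The function names are made up for illustration.

```python
# Bit positions assumed from the MCI_STATUS_* definitions in
# arch/x86/include/asm/mce.h; names here are shortened for display.
MCI_STATUS_BITS = {
    "Val": 63,       # register contents valid
    "Over": 62,      # previous error overwritten
    "UC": 61,        # uncorrected error
    "En": 60,        # error reporting enabled
    "MiscV": 59,     # MISC register valid
    "AddrV": 58,     # ADDR register valid
    "PCC": 57,       # processor context corrupt
    "TCC": 55,       # task context corrupt
    "SyndV": 53,     # syndrome register valid
    "CECC": 46,      # correctable ECC error
    "UECC": 45,      # uncorrected ECC error
    "Deferred": 44,  # uncorrected error, deferred exception
    "Poison": 43,    # access to poisoned data
}


def decode_mca_status(status):
    """Return the names of the flag bits that are set in an MCA status value."""
    return [name for name, bit in MCI_STATUS_BITS.items() if (status >> bit) & 1]


def extended_error_code(status, mask=0x3F):
    """Extract the extended error code (bits 21:16 by default; mask is an assumption)."""
    return (status >> 16) & mask


if __name__ == "__main__":
    status = 0x9C2030000001085B  # value from the MC20_STATUS line in the log
    print("flags:", "|".join(decode_mca_status(status)))
    print("extended error code:", extended_error_code(status))
```

Run against the value in the log, this reports MiscV, AddrV, SyndV, UECC and Deferred (plus Val and En, which the kernel line does not print) and an extended error code of 1, which agrees with the "Coherent Slave Extended Error Code: 1" line.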

As I have not heard of many other issues with the AMDGPU DC code, I believe this problem is specific to this particular laptop/BIOS/hardware configuration. Using the modprobe method, I have attached everything I was able to capture up to the system hang that I believe is relevant or that is suggested by the bug reporting FAQ; please let me know if more information would be useful.

Attachment: cpuinfo
Attachment: dmesg
Attachment: iomem
Attachment: ioports
Attachment: lspci
Attachment: messages
Attachment: modules
Attachment: scsi
Attachment: syslog
Attachment: ver_linux
