Re: Question about: EDAC amd64 MC0: Failed to translate InputAddr to csrow for address 0x27b028ff0

On Mon, Aug 06, 2012 at 08:57:38PM +0000, Jiang Wang wrote:
> Hi there,
> 
> I am testing amd64 error injection with the EDAC driver from SL6.2 on a machine that has two AMD Opteron(tm) Processor 2378 CPUs. After I injected two-bit errors via the sysfs nodes, I got the following messages:
> 
> [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: be002000ed080813
> [Hardware Error]: TSC 171bba7b052 ADDR 27b028ff0 MISC c008000201000000
> [Hardware Error]: PROCESSOR 2:100f42 TIME 1343090754 SOCKET 0 APIC 0
> [Hardware Error]: MC4_STATUS[-|UE|MiscV|PCC|AddrV|UECC]: 0xbe002000ed080813
> [Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
> EDAC amd64 MC0: Failed to translate InputAddr to csrow for address 0x27b028ff0

Btw, this is an uncorrectable error:

MC4_STATUS[Val|UC|EN|MiscV|AddrV|PCC|UECC|EEC: DRAM ECC (0x08) (synd=0xed00)|ET: BUS(pp:SRC;t:NOTIMOUT;r4:RD;ii:MEM;ll:LG)]: 0xbe002000ed080813

because you're injecting 0x88, i.e. two bits, each in a different
symbol, which could mean that you have x4 DIMMs.
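To make the decoding above a bit more concrete, here is a minimal Python sketch that pulls the flag bits, the extended error code and the ECC syndrome out of a raw MC4_STATUS value, and checks which symbols an injection vector touches. The bit positions are taken from the decoded lines in this thread (flags in bits 63:57, extended error code in bits 20:16, syndrome high byte in bits 31:24 and low byte in bits 54:47, as on family 10h). The helper names are hypothetical, and the symbol check assumes each symbol is an aligned group of 4 (x4) or 8 (x8) bits of the vector.

```python
# Bit positions of the MCi_STATUS flags, as decoded by the kernel in this thread.
STATUS_FLAGS = (
    ("Val", 63),
    ("Over", 62),
    ("UC", 61),
    ("EN", 60),
    ("MiscV", 59),
    ("AddrV", 58),
    ("PCC", 57),
)


def decode_mc4_status(status):
    """Return the set flag names, the extended error code and the ECC syndrome."""
    flags = [name for name, bit in STATUS_FLAGS if (status >> bit) & 1]
    ext_error_code = (status >> 16) & 0x1F  # bits 20:16; 0x08 is "DRAM ECC"
    # Syndrome: high byte from bits 31:24, low byte from bits 54:47.
    syndrome = (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF)
    return flags, ext_error_code, syndrome


def symbols_touched(ecc_vector, symbol_bits=4):
    """Indices of the symbol_bits-wide symbols that contain at least one set bit."""
    return {bit // symbol_bits
            for bit in range(ecc_vector.bit_length())
            if (ecc_vector >> bit) & 1}


print(decode_mc4_status(0xBE002000ED080813))
# (['Val', 'UC', 'EN', 'MiscV', 'AddrV', 'PCC'], 8, 60672)
print(symbols_touched(0x88), symbols_touched(0x88, symbol_bits=8))
# {0, 1} {0}
```

The syndrome 60672 is 0xed00, as in the decoded line. With 4-bit symbols the two bits of 0x88 land in two different symbols, while with 8-bit symbols they share one, which is the distinction behind the x4 remark. The CE from the F10h box below decodes the same way: `decode_mc4_status(0xDC11400088080A13)` gives Val, Over, EN, MiscV and AddrV (no UC) and syndrome 0x8822.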

Also, that physical address 0x27b028ff0 is just under 10 GiB (roughly
10.65 GB); do you have that much memory on the system?
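A quick way to check the size estimate is to do the conversion explicitly. This Python sketch (the function name is hypothetical) only converts units; it does not model the PCI/MMIO hole below 4 GiB, which, when DRAM is remapped above 4 GiB, can push the top of physical memory above the installed DRAM size.

```python
def describe_address(phys_addr: int) -> str:
    """Express a physical address in binary (GiB) and decimal (GB) units."""
    gib = phys_addr / 2**30  # binary gigabytes
    gb = phys_addr / 10**9   # decimal gigabytes
    return f"{phys_addr:#x} = {phys_addr} bytes, ~{gib:.2f} GiB, ~{gb:.2f} GB"


print(describe_address(0x27b028ff0))
# 0x27b028ff0 = 10653700080 bytes, ~9.92 GiB, ~10.65 GB
```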

Also, can you send the full dmesg output, please?

> ---------------------------------------------------------------------
> 
> EDAC amd64 MC0: ERROR_ADDRESS (0x27b028ff0) NOT mapped to CS
> EDAC MC0: UE - no information available: amd64_edac
> [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: SRC (no timeout)
> [Hardware Error]: Machine check: Processor context corrupt
> Kernel panic - not syncing: Fatal machine check on current CPU
> 
> My question is about this line: Failed to translate InputAddr to csrow for address 0x27b028ff0.
> 
> Is this expected behavior, or is it a bug?
> By the way, sometimes the address can be translated to a csrow.
> 
> The commands I used to inject errors:
> 
> cd /sys/devices/system/edac/mc/mc0
> echo 3 > inject_section
> echo 7 > inject_word
> echo 0x88 > inject_ecc_vector
> echo 1 > inject_read
> echo 1 > inject_write
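The same injection sequence can be scripted. Below is a minimal Python sketch that writes the values from the commands above, in the same order, to the same sysfs attributes, appending a newline to each value the way `echo` does. It assumes it runs as root on a kernel whose amd64_edac driver exposes these inject_* files under mc0; the function and constant names are hypothetical.

```python
from pathlib import Path

# (attribute, value) pairs, in the order used by the commands in the email.
INJECTION_SEQUENCE = (
    ("inject_section", "3"),
    ("inject_word", "7"),
    ("inject_ecc_vector", "0x88"),
    ("inject_read", "1"),
    ("inject_write", "1"),
)


def inject(mc_dir="/sys/devices/system/edac/mc/mc0", sequence=INJECTION_SEQUENCE):
    """Write each value to its sysfs attribute, one write per attribute, like `echo`."""
    base = Path(mc_dir)
    for attribute, value in sequence:
        (base / attribute).write_text(value + "\n")
```

Calling `inject()` with no arguments targets mc0 as in the email; passing another directory (e.g. a scratch directory) makes it easy to check what would be written without touching the hardware.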

FWIW, I did the same on a F10h box here with 3.6-rc1 and it looks ok:

[62312.996927] [Hardware Error]: CPU:0  MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc11400088080a13
[62313.006006] [Hardware Error]:        MC4_ADDR: 0x00000004252dfe70
[62313.011832] [Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
[62313.011837] EDAC amd64 MC0: CE ERROR_ADDRESS= 0x4252dfe70
[62313.011843] EDAC DEBUG: f1x_match_to_this_node: (range 0) SystemAddr= 0x4252dfe70 Limit=0x437ffffff
[62313.011846] EDAC DEBUG: f1x_match_to_this_node:    Normalized DCT addr: 0x1f696fe40
[62313.011849] EDAC DEBUG: f1x_lookup_addr_in_dct: input addr: 0x1f696fe40, DCT: 0
[62313.011853] EDAC DEBUG: f1x_lookup_addr_in_dct:     CSROW=0 CSBase=0x0 CSMask=0xffffffe1fff9ffff
[62313.011857] EDAC DEBUG: f1x_lookup_addr_in_dct:     (InputAddr & ~CSMask)=0x60000 (CSBase & ~CSMask)=0x0
[62313.011861] EDAC DEBUG: f1x_lookup_addr_in_dct:     CSROW=1 CSBase=0x20000 CSMask=0xffffffe1fff9ffff
[62313.011864] EDAC DEBUG: f1x_lookup_addr_in_dct:     (InputAddr & ~CSMask)=0x60000 (CSBase & ~CSMask)=0x20000
[62313.011868] EDAC DEBUG: f1x_lookup_addr_in_dct:     CSROW=2 CSBase=0x40000 CSMask=0xffffffe1fff9ffff
[62313.011871] EDAC DEBUG: f1x_lookup_addr_in_dct:     (InputAddr & ~CSMask)=0x60000 (CSBase & ~CSMask)=0x40000
[62313.011875] EDAC DEBUG: f1x_lookup_addr_in_dct:     CSROW=3 CSBase=0x60000 CSMask=0xffffffe1fff9ffff
[62313.011878] EDAC DEBUG: f1x_lookup_addr_in_dct:     (InputAddr & ~CSMask)=0x60000 (CSBase & ~CSMask)=0x60000
[62313.011881] EDAC DEBUG: f1x_lookup_addr_in_dct:  MATCH csrow=3
[62313.011894] EDAC MC0: 1 CE  on unknown memory (csrow:3 channel:0 page:0x4252df offset:0xe70 grain:0 syndrome:0x8822)
[62313.011904] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
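The debug lines above show the chip-select match being made: a csrow matches when (InputAddr & ~CSMask) == (CSBase & ~CSMask). The Python sketch below replays that comparison with the values from this log, plus the page/offset split printed in the CE line (assuming 4 KiB pages). It is a simplified model with hypothetical names, not the driver's code: it ignores disabled chip selects, DCT selection and interleaving, and it starts from the already-normalized DCT address. A None result corresponds loosely to the "ERROR_ADDRESS ... NOT mapped to CS" case in the original report.

```python
MASK64 = (1 << 64) - 1

# (csrow, CSBase, CSMask) as printed by f1x_lookup_addr_in_dct in the log above.
CHIP_SELECTS = [
    (0, 0x00000, 0xFFFFFFE1FFF9FFFF),
    (1, 0x20000, 0xFFFFFFE1FFF9FFFF),
    (2, 0x40000, 0xFFFFFFE1FFF9FFFF),
    (3, 0x60000, 0xFFFFFFE1FFF9FFFF),
]


def lookup_csrow(input_addr, chip_selects):
    """Return the first csrow whose base matches input_addr outside the mask, else None."""
    for csrow, cs_base, cs_mask in chip_selects:
        care = ~cs_mask & MASK64  # bits that must match (the inverted mask)
        if (input_addr & care) == (cs_base & care):
            return csrow
    return None


def page_and_offset(sys_addr, page_shift=12):
    """Split a system address into page frame number and offset within the page."""
    return sys_addr >> page_shift, sys_addr & ((1 << page_shift) - 1)


print(lookup_csrow(0x1F696FE40, CHIP_SELECTS))      # 3
print(lookup_csrow(0x1F696FE40, CHIP_SELECTS[:3]))  # None: no csrow matches
pfn, off = page_and_offset(0x4252DFE70)
print(hex(pfn), hex(off))                           # 0x4252df 0xe70
```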

Of course, the address is different.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

