Re: More fixes for kmem on slabs

----- Original Message -----
> More testing revealed a machine in our stable that either failed to
> initialize kmem:
> 
> please wait... (gathering kmem slab cache data)
> crash-6.0.3: page excluded: kernel virtual address: ffff8801263d6000
>  type: "kmem_cache buffer"
> 
> crash-6.0.3: unable to initialize kmem slab cache subsystem
> 
> Or succeeded on initialization and then failed on a kmem -s command:
> 
> crash-6.0.3> kmem -s
> CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
> Segmentation fault
> 
> 
> The problem is that the array_cache pointer array at the end of kmem_cache
> remains declared with 32 elements, but in all dynamically allocated copies
> it is actually trimmed down to nr_cpu_ids elements.
> 
> crash-6.0.3.best> struct kmem_cache
> struct kmem_cache {
>     unsigned int batchcount;
> ...
> 
>     struct list_head next;
>     struct kmem_list3 **nodelists;
>     struct array_cache *array[32];
> }
> SIZE: 368
> 
> 
> On my normal play machine, nr_cpu_ids = 32 and actual cpus = 16.
> 
> On the failing machine, nr_cpu_ids and actual cpus are both 2.
> 
> Two problems occur:
> 
> 1)  max_cpudata_limit traverses the array until it finds a 0x0 or
> reaches the real size.  On the 2-cpu system, the "third" element in the
> array belonged elsewhere, was non-zero, and pointed to data that caused
> the apparent limit to be 0xffffffffffff8801, which didn't work well as
> a length in a memcopy.

But your patch does this:

@@ -8117,8 +8135,9 @@ kmem_cache_s_array_nodes:
             "array cache array", RETURN_ON_ERROR))
                goto bail_out;

-       for (i = max_limit = 0; (i < ARRAY_LENGTH(kmem_cache_s_array)) &&
-            cpudata[i]; i++) {
+       for (i = max_limit = 0; (i < kmem_cache_nr_cpu)
+                       && (i < ARRAY_LENGTH(kmem_cache_s_array))
+                       && cpudata[i]; i++) {
                 if (!readmem(cpudata[i]+OFFSET(array_cache_limit),
                     KVADDR, &limit, sizeof(int),
                     "array cache limit", RETURN_ON_ERROR))

On "old" slab systems, your new "kmem_cache_nr_cpu" variable remains at
its initialized value of zero, and the loop never gets entered.  So I don't 
think you wanted to keep the (i < kmem_cache_nr_cpu) there, right?
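
To make the concern concrete, here is a minimal sketch of a bound that still works when the cpu count was never set. It is not the crash code: cpudata_bound() and count_cpudata() are hypothetical names, and treating a zero nr_cpu as "unset, use the declared array length" is an assumption about the intended behavior on old slab systems.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of crash): choose the upper bound for
 * walking the per-cpu array_cache pointer array.
 *
 * declared_len: element count from the debuginfo declaration (e.g. 32)
 * nr_cpu:       trimmed length if known, or 0 if it was never set
 */
static size_t cpudata_bound(size_t declared_len, size_t nr_cpu)
{
    /* Assumption: 0 means "unset", so fall back to the declared length. */
    if (nr_cpu == 0 || nr_cpu > declared_len)
        return declared_len;
    return nr_cpu;
}

/*
 * Count usable entries: stop at the bound or at the first NULL pointer,
 * whichever comes first.
 */
static size_t count_cpudata(const unsigned long *cpudata,
                            size_t declared_len, size_t nr_cpu)
{
    size_t bound = cpudata_bound(declared_len, nr_cpu);
    size_t i;

    for (i = 0; i < bound && cpudata[i]; i++)
        ;
    return i;
}
```

With a bound chosen this way, a 2-cpu system stops after two entries even if the next word in memory is non-zero, while a system that never set the count still walks the array until the first NULL.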

> 2) kmem_cache structs can be allocated near enough to the edge of a page
> that the old incorrect length crosses the page boundary, even though the
> real smaller structure fits in the page.  That caused a readmem of the
> structure to cross into a coincidentally missing page in the dump.

Right -- that was the genesis of the kmem_cache_downsize() function.
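
For illustration only, the arithmetic behind the downsizing can be sketched as below. This is not the kmem_cache_downsize() implementation; downsized_len() and crosses_page() are hypothetical names, and the sketch assumes 4K pages, 8-byte pointers, and that array[] is the last member with no trailing padding (consistent with SIZE: 368 and array[32] in the output quoted above, which puts the array at offset 112).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4K pages */

/*
 * Hypothetical helper: length of a kmem_cache whose trailing array is
 * trimmed from declared_elems to nr_cpu_ids elements. Assumes the array
 * is the last member, with no padding after it, and that nr_cpu_ids does
 * not exceed declared_elems.
 */
static size_t downsized_len(size_t declared_size, size_t declared_elems,
                            size_t elem_size, size_t nr_cpu_ids)
{
    size_t array_offset = declared_size - declared_elems * elem_size;

    return array_offset + nr_cpu_ids * elem_size;
}

/* Nonzero if a read of len bytes starting at addr spans two pages. */
static int crosses_page(uint64_t addr, size_t len)
{
    if (len == 0)
        return 0;
    return (addr / SKETCH_PAGE_SIZE) != ((addr + len - 1) / SKETCH_PAGE_SIZE);
}
```

Under those assumptions a 2-cpu system needs only 112 + 2 * 8 = 128 bytes per structure, so a structure starting 256 bytes before the end of a page fits when read at the trimmed length but spills into the next page when read at the declared 368 bytes.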
 
> This patch fixes both of those (after wrestling ARRAY_LENGTH to the
> ground), but *does not* fix the similar page crossing problem when I try
> to use a "struct kmem_cache" command on the particular structure at the
> end of the page.

Yeah, damn, I don't know what can be done for that, aside from some
horrific kludge to gdb_readmem_callback() to return successfully even 
if the readmem() failed.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility

