Re: pbuf_alloc failures with LwIP
Hi Elad,

Hmm, I've had a quick look at the pbuf management in eCos 3.0. It's quite different from the CVS version, so I'm not that familiar with it.

Nonetheless, I'm surprised by the PBUF statistics:

PBUF - "each pbuf is 1024 bytes" avail: 30 used: 1 max: 11 err: 2 alloc_locked: 0 refresh_locked: 0

There's something wrong here. Considering that "alloc_locked" is 0, the only way for "err" to be incremented is if you run out of pbufs. However, the sign that you have run out of pbufs is that "max" equals "avail". Yet in your case max = 11, while avail = 30. So you didn't run out of pbufs; you only used 11 out of 30.

Digging a bit more, it appears that "err" is incremented when pbuf_pool_alloc() returns NULL. This happens when the linked-list of available pbufs is empty.
So, how come the linked-list of available pbufs is empty when max = 11? In my opinion, the linked-list of available pbufs is corrupt or truncated.
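The reasoning above can be written down as a small check. This is only a sketch: struct pool_stats and pool_stats_inconsistent() are hypothetical names that mirror the counters printed by stats_display(); they are not part of lwIP.

```c
#include <stdbool.h>

/* Hypothetical mirror of the counters printed for one pool (not an lwIP type). */
struct pool_stats {
    unsigned avail; /* total number of elements in the pool */
    unsigned used;  /* elements currently allocated */
    unsigned max;   /* high-water mark of "used" */
    unsigned err;   /* number of failed allocations */
};

/*
 * Returns true when allocation failures were recorded although the
 * high-water mark never reached the pool size and no failure was caused
 * by a locked allocator. In that situation the free list was empty while
 * elements should still have been available, which points at a corrupt
 * or truncated list rather than genuine exhaustion.
 */
static bool pool_stats_inconsistent(const struct pool_stats *s,
                                    unsigned alloc_locked)
{
    return s->err > 0 && alloc_locked == 0 && s->max < s->avail;
}
```

With the numbers from your dump (avail 30, max 11, err 2, alloc_locked 0) the check reports an inconsistency; a pool that really ran dry would show max equal to avail and pass.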
Are you sure that you're respecting the thread-safe requirements of lwIP? Are you using multiple threads? If so, make sure that the SYS_ARCH_PROTECT macro (in lwip/sys.h) is defined to do something useful, rather than being an empty definition.
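To illustrate why an empty SYS_ARCH_PROTECT matters, here is a minimal sketch of a free-list pool whose list updates are wrapped in the protect macros. The macro and function names follow lwip/sys.h, but the bodies are stand-ins: a depth counter models the critical section so the example builds on a host. On the target, sys_arch_protect() would have to lock the scheduler or disable interrupts. The pool itself (struct node, pool_alloc(), pool_free()) is hypothetical and much simpler than the real pbuf pool.

```c
#include <stddef.h>

/* Stand-ins for the port's sys_arch code (assumption: host-only model). */
typedef int sys_prot_t;
static int protect_depth; /* > 0 means "inside the critical section" */

static sys_prot_t sys_arch_protect(void)       { return protect_depth++; }
static void sys_arch_unprotect(sys_prot_t lev) { protect_depth = lev; }

/* If these expand to nothing, two threads can unlink the same node. */
#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_PROTECT(lev)      lev = sys_arch_protect()
#define SYS_ARCH_UNPROTECT(lev)    sys_arch_unprotect(lev)

/* Hypothetical fixed-size pool kept as a singly linked free list. */
struct node { struct node *next; };
#define POOL_SIZE 4
static struct node pool_storage[POOL_SIZE];
static struct node *free_list;
static unsigned pool_err; /* counts allocations that found the list empty */

static void pool_init(void)
{
    size_t i;
    free_list = NULL;
    pool_err = 0;
    for (i = 0; i < POOL_SIZE; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

static struct node *pool_alloc(void)
{
    SYS_ARCH_DECL_PROTECT(lev);
    struct node *n;

    SYS_ARCH_PROTECT(lev);      /* read and unlink must not be interleaved */
    n = free_list;
    if (n != NULL)
        free_list = n->next;
    else
        pool_err++;
    SYS_ARCH_UNPROTECT(lev);
    return n;
}

static void pool_free(struct node *n)
{
    SYS_ARCH_DECL_PROTECT(lev);

    SYS_ARCH_PROTECT(lev);
    n->next = free_list;
    free_list = n;
    SYS_ARCH_UNPROTECT(lev);
}
```

Without real protection, a thread preempted between reading free_list and writing the new head can leave the list pointing at a node that another thread has already taken, which is one way to end up with a truncated list.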
Regards,
Michael.

On 14/06/2012 06:43, Elad Yosef wrote:
Hi Michael,

Thanks for the detailed reply. I think I have exactly the same problem that you have: the networking stops working. I got the LwIP stats after the networking stopped working, see:

LINK xmit: 0 rexmit: 0 recv: 0 fw: 0 drop: 0 chkerr: 0 lenerr: 0 memerr: 0 rterr: 0 proterr: 0 opterr: 0 err: 0 cachehit: 0
IP_FRAG xmit: 0 rexmit: 0 recv: 0 fw: 0 drop: 0 chkerr: 0 lenerr: 0 memerr: 0 rterr: 0 proterr: 0 opterr: 0 err: 0 cachehit: 0
IP xmit: 17643 rexmit: 0 recv: 63100 fw: 0 drop: 0 chkerr: 0 lenerr: 0 memerr: 0 rterr: 0 proterr: 0 opterr: 0 err: 0 cachehit: 0
ICMP xmit: 2775 rexmit: 0 recv: 2950 fw: 0 drop: 175 chkerr: 0 lenerr: 0 memerr: 0 rterr: 0 proterr: 175 opterr: 0 err: 0 cachehit: 0
UDP xmit: 4714 rexmit: 0 recv: 53209 fw: 0 drop: 0 chkerr: 0 lenerr: 0 memerr: 0 rterr: 0 proterr: 0 opterr: 0 err: 0 cachehit: 0
TCP xmit: 6715 rexmit: 0 recv: 6941 fw: 0 drop: 0 chkerr: 0 lenerr: 0 memerr: 2705 rterr: 0 proterr: 0 opterr: 0 err: 0 cachehit: 0
PBUF - "each pbuf is 1024 bytes" avail: 30 used: 1 max: 11 err: 2 alloc_locked: 0 refresh_locked: 0
MEM HEAP avail: 1024 used: 0 max: 720 err: 0
MEM PBUF avail: 8 used: 0 max: 2 err: 0
MEM RAW_PCB avail: 4 used: 0 max: 0 err: 0
MEM UDP_PCB avail: 3 used: 3 max: 3 err: 0
MEM TCP_PCB avail: 16 used: 0 max: 8 err: 0
MEM TCP_PCB_LISTEN avail: 1 used: 1 max: 1 err: 0
MEM TCP_SEG avail: 6 used: 0 max: 4 err: 0
MEM NETBUF avail: 10 used: 0 max: 6 err: 0
MEM NETCONN avail: 12 used: 4 max: 7 err: 0
MEM API_MSG avail: 6 used: 0 max: 2 err: 0
MEM TCP_MSG avail: 12 used: 0 max: 7 err: 0
MEM TIMEOUT avail: 4 used: 2 max: 3 err: 0

I would appreciate it if you could take a look.

Elad

On Wed, Jun 13, 2012 at 6:47 PM, Michael O'Dowd <michael.odowd@xxxxxxxxxxx> wrote:

Hi Elad,

I ran into a similar problem recently. I'm using a recent CVS checkout rather than 3.0. Also, I'm probably not using the same ethernet HW, so I don't know how well my reply corresponds to your case.
The eth_drv.c file is the glue between lwIP and the underlying ethernet driver, so the issue that you are encountering may be specific to the driver. In my case, when under stress, eth_drv.c generates the error message: "cannot allocate pbuf to receive packet". Soon after that, the ethernet driver stops receiving traffic permanently, but does not crash. In your case, if I understand correctly, your system crashes.

The issue is that when eth_drv_recv() fails to allocate a pbuf, it returns without calling the ethernet driver recv() function: (sc->funs->recv)(). In my case, the driver requires that its recv() function be called in order to complete the processing of the packet reception and to free up the receive buffer(s). Failing to call it apparently causes the receive path to cease functioning (I'm still investigating the details). In your case, I gather that it crashes the system.

Note: I'm running on an NXP 1788 (Cortex-M3), using the "devs/arm/lpc2xxx/current/src/if_lpc2xxx.c" ethernet driver.

There are two aspects to this problem:

1) In my opinion, there is a bug in eth_drv_recv(). If there are no pbufs available, then it should at least cause the received packet to be discarded. Otherwise, the system may fail whenever there is a minor burst of traffic on the network. It doesn't take much: there are only 16 pbufs available by default. Whether or not the system fails depends on how the ethernet driver reacts to the failure to call its recv() function. I hope to fix this on my platform in the near future.

2) You should also keep an eye on your pbuf usage, just to make sure that you don't have a pbuf memory leak. You could also try to allocate more pbufs, if you have the available memory. If you are using the default lwip configuration, the pbuf memory allocation is handled by memp.[hc]. It has a fixed number of pbufs available.
The default is 16 pbufs, and can be changed in the configtool under: [lwIP networking stack/Memory options/Number of memp struct pbufs]. Alternatively, if you have lots of memory, you could enable the checkbox: [lwIP networking stack/Memory options/Use malloc for pool allocations]. This bypasses the memp pools and their static limitations, though it will make it harder to spot a pbuf memory leak. I haven't tried this personally.

Finally, (when using memp) the pbuf usage can be monitored with lwip/stats.h. If you have access to a serial port, try calling stats_display(). Here is a snippet of the pbuf related output:

MEM PBUF_POOL avail: 16 used: 0 max: 3 err: 0

The "err" counter increases when pbuf_alloc() fails.

Hope that helps,
Regards,
Michael O'Dowd
Kuantic SAS

On 12/06/2012 22:40, Elad Yosef wrote:

Hi all,

I'm using the LwIP stack on my target and experiencing crashes under stress. The function eth_drv_recv() from ../io/eth/v3_0/ser/lwip/eth_drv.c calls pbuf_alloc() and this allocation fails. Is this the result of some bad configuration?

Thanks
Elad
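The discard behaviour suggested in point 1) of the quoted reply could look roughly like the sketch below. It is not the eth_drv.c code: all types and names here (struct sg_entry, struct eth_sc, recv_or_discard(), the mock driver and allocators) are simplified, hypothetical stand-ins so that the example builds on a host. It also assumes that the driver's recv() treats a NULL buffer pointer as "drop this frame and release the hardware buffer", which needs to be checked against the actual driver.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver interface types. */
struct sg_entry { unsigned char *buf; size_t len; };
struct eth_sc;
struct eth_funs {
    void (*recv)(struct eth_sc *sc, struct sg_entry *sg, int sg_len);
};
struct eth_sc {
    const struct eth_funs *funs;
    unsigned dropped; /* frames discarded for lack of a buffer */
};

/* Hypothetical allocator hook; returns NULL when the pool is exhausted. */
typedef unsigned char *(*alloc_fn)(size_t len);

/*
 * Always call the driver's recv(), even when no buffer could be
 * allocated, so the driver can finish the reception and free its
 * receive buffer. Returns 1 if the frame was stored, 0 if discarded.
 */
static int recv_or_discard(struct eth_sc *sc, size_t total_len, alloc_fn alloc)
{
    struct sg_entry sg;

    sg.buf = alloc(total_len);
    sg.len = total_len;
    if (sg.buf == NULL)
        sc->dropped++; /* assumption: NULL buf asks the driver to discard */

    (sc->funs->recv)(sc, &sg, 1);
    return sg.buf != NULL;
}

/* Test doubles: a mock driver and two allocators (illustration only). */
static unsigned mock_recv_calls;
static unsigned mock_discards;

static void mock_recv(struct eth_sc *sc, struct sg_entry *sg, int sg_len)
{
    (void)sc;
    (void)sg_len;
    mock_recv_calls++;
    if (sg[0].buf == NULL)
        mock_discards++; /* a real driver would release its RX buffer here */
}

static unsigned char rx_storage[2048];

static unsigned char *alloc_ok(size_t len)
{
    return len <= sizeof rx_storage ? rx_storage : NULL;
}

static unsigned char *alloc_fail(size_t len)
{
    (void)len;
    return NULL;
}
```

Calling recv_or_discard() once with alloc_ok and once with alloc_fail reaches the mock driver both times; only the second call is counted as a discard, so the receive path keeps running when the pool is empty.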
-- Before posting, please read the FAQ: http://ecos.sourceware.org/fom/ecos and search the list archive: http://ecos.sourceware.org/ml/ecos-discuss