USB instabilities with Atheros AR9344

Hello,

I am currently working on an embedded project based on the Atheros
AR9344 SoC. As a prototype device, we are using the TP-Link TL-WDR4300
router (http://wiki.openwrt.org/toh/tp-link/tl-wdr4300) and the latest
OpenWRT trunk. The kernel is 3.10.18.

Over the last couple of weeks, we have experienced a USB problem that
we have not been able to solve. The USB hub works fine most of the
time, but when event X happens, USB becomes unusable for extended
periods of time. To recover, we have to toggle the power on the USB
port (using a GPIO) and then wait until a timeout expires or the queue
is flushed.
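For reference, the power toggle can be scripted through the legacy sysfs GPIO interface. This is a minimal sketch, assuming the GPIO has already been exported and that writing "1" powers the port (active-high); the GPIO number 22 and the 2-second off time are hypothetical placeholders, not values taken from the TL-WDR4300.

```python
import time
from pathlib import Path

# Hypothetical GPIO number: the GPIO that switches USB port power is
# board-specific and must be looked up for the actual hardware.
USB_POWER_GPIO = 22


def gpio_value_path(gpio: int, sysfs_root: str = "/sys/class/gpio") -> Path:
    """Return the legacy sysfs 'value' file of an already-exported GPIO."""
    return Path(sysfs_root) / f"gpio{gpio}" / "value"


def power_cycle_usb(value_path: Path, off_seconds: float = 2.0, sleep=time.sleep) -> None:
    """Cut USB port power, wait, then restore it.

    Assumes the GPIO is active-high ("1" = port powered); invert the
    written values for an active-low power switch.
    """
    value_path.write_text("0")  # port power off
    sleep(off_seconds)          # give the modem time to drop off the bus
    value_path.write_text("1")  # port power on again
```

On the device this would be called as power_cycle_usb(gpio_value_path(USB_POWER_GPIO)). The sleep function is a parameter only so the sequence can be checked without real hardware.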

The devices we have been able to trigger event X with are different
3G/LTE modems. We have not been able to figure out exactly what
triggers the event, but it happens when we move into areas with poor
or no coverage and then move back into coverage. We see the error with
QMI modems (qmi_wwan driver), AT modems (option_serial driver) and
WebUI modems (cdc_ether driver). After this event has happened, dmesg
contains the following messages, depending on the modem type:

QMI:
Thu Nov 21 09:44:53 2013 kern.err kernel: [  490.600000] qmi_wwan 1-1.1.2:1.4: nonzero urb status received: -71
Thu Nov 21 09:44:53 2013 kern.err kernel: [  490.600000] qmi_wwan 1-1.1.2:1.4: wdm_int_callback - 0 bytes

Serial:
[62979.280000] option1 ttyUSB7: option_instat_callback: error -71

WebUI:
[ 1192.680000] hub 1-1:1.0: cannot reset port 1 (err = -71)
[ 1192.690000] hub 1-1:1.0: Cannot enable port 1.  Maybe the USB cable is bad?

The common denominator seems to be the -71 error code (-EPROTO), which
is a generic protocol error if I have understood correctly. When I
search for this error code, it seems that most reported problems have
been caused by insufficient power. However, this does not seem to be
the issue here. The modems are connected to an active hub and event X
happens with only a single modem connected, so power seems an unlikely
cause.
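To confirm that the three driver-specific messages report the same code, the status can be pulled out of each line and named with the errno table. A minimal sketch, assuming the three message formats quoted above (the regexes are written for exactly those lines) and a Linux host, where errno 71 is EPROTO:

```python
import errno
import os
import re

# Regexes for the three message shapes quoted above; other drivers and
# kernel versions may word their errors differently.
STATUS_PATTERNS = (
    re.compile(r"nonzero urb status received: (-\d+)"),  # qmi_wwan / cdc-wdm
    re.compile(r"callback: error (-\d+)"),               # option serial
    re.compile(r"\(err = (-\d+)\)"),                     # hub port reset
)


def extract_status(line: str):
    """Return the negative status code found in a dmesg line, or None."""
    for pattern in STATUS_PATTERNS:
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


def describe_status(status: int) -> str:
    """Name a negative kernel status using the errno table of this host."""
    code = -status
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{status} = -{name} ({os.strerror(code)})"
```

For example, describe_status(-71) gives "-71 = -EPROTO (Protocol error)". Note that the names come from the machine running the script; some errno numbers differ between architectures, so other codes should be checked against the target's headers.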

In order to rule out the TP-Link router, we have also tested with
another router based on the same SoC (Netgear WNDR4300), and the same
issue is seen. We also ran some tests on a device with a different
SoC (Raspberry Pi, BCM2835) and did not see the issue there.

We have mostly focused on the QMI modems. With dynamic debugging
enabled, dmesg also contains these errors (repeated many times):
[ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/1514 retry 26
[ 1911.200000] ehci-platform ehci-platform: detected XactErr len 0/64 retry 14
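With many of these lines in the log, it helps to tally them per requested transfer length. A minimal sketch, assuming the "len transferred/requested retry N" wording shown above; the 1514-byte length presumably corresponds to the 1514-byte bulk-in URBs in the usbmon trace below, and 64 to the 64-byte interrupt URBs, but that pairing is an inference, not something the log states.

```python
import re
from collections import Counter

XACTERR_RE = re.compile(r"detected XactErr len (\d+)/(\d+) retry (\d+)")


def parse_xacterr(line: str):
    """Return (bytes_transferred, bytes_requested, retry) or None."""
    match = XACTERR_RE.search(line)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())


def summarize_xacterr(lines):
    """Count XactErr reports and the highest retry seen, per requested length."""
    reports = Counter()
    max_retry = {}
    for line in lines:
        parsed = parse_xacterr(line)
        if parsed is None:
            continue
        _, requested, retry = parsed
        reports[requested] += 1
        max_retry[requested] = max(retry, max_retry.get(requested, 0))
    return reports, max_retry
```

Feeding it the full dmesg output shows at a glance which endpoints are affected and whether the retry counter reaches its limit.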

Each packet is, as expected, retried 32 times. The data we sent when
these messages appeared was normal TCP traffic, which explains the
packet sizes. If we leave the router alone long enough, it is able to
restart the modems (they disconnect and then connect). However, this
can take many minutes (I guess the packet queue has to be flushed?),
and while this happens the USB hub is blocked (no traffic can pass
through it).
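The retry behaviour and the suspicion that the queue must drain can be illustrated with a deliberately simplified model. It is not ehci-hcd code: it only keeps a per-transfer XactErr counter against QH_XACTERR_MAX (32 in drivers/usb/host/ehci-q.c) and ignores the controller's own hardware error counter, timing, and exactly when ehci-hcd resets its per-queue counter. The function names are hypothetical.

```python
# Value of QH_XACTERR_MAX in drivers/usb/host/ehci-q.c.
QH_XACTERR_MAX = 32
EPROTO_STATUS = -71


def run_transfer(attempt_ok, limit: int = QH_XACTERR_MAX):
    """Simplified model of one transfer hitting transaction errors.

    attempt_ok(n) tells whether attempt number n (1-based) succeeds.
    Returns (status, attempts): status 0 on success, or -71 once the
    XactErr retry budget is used up.
    """
    xacterrs = 0
    attempts = 0
    while True:
        attempts += 1
        if attempt_ok(attempts):
            return 0, attempts
        xacterrs += 1
        if xacterrs >= limit:  # budget exhausted: URB completes with -EPROTO
            return EPROTO_STATUS, attempts
        # otherwise the transfer is re-armed and tried again


def failed_attempts_to_drain(queued_urbs: int, limit: int = QH_XACTERR_MAX) -> int:
    """Attempts spent before a queue of always-failing URBs is drained."""
    total = 0
    for _ in range(queued_urbs):
        _, attempts = run_transfer(lambda _n: False, limit)
        total += attempts
    return total
```

In this model the time the endpoint is tied up grows linearly with both the number of queued URBs and the retry limit: four queued URBs cost 128 failed attempts at the default limit and 32 with a limit of 8. Whether the real outage is dominated by this effect is exactly the open question.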

When running usbmon, we see the following around the time of the crash
(with QMI modem):

86abea80 1428742032 S Bi:1:115:7 -150 1514 <
86abeb00 1428801536 C Bi:1:115:7 0 226 = 024b322c fd930250 f3000000 08004500 00d4bba7 4000fd06 08728027 245d2e0f
86abeb00 1428801554 S Bi:1:115:7 -150 1514 <
84895c00 1428802518 S Bo:1:115:5 -150 66 = 0250f300 0000024b 322cfd93 08004500 00349c42 40003f06 e6772e0f e6768027
84895c00 1428802660 C Bo:1:115:5 0 66 >
86abeb80 1428982112 C Bi:1:115:7 0 1354 = 024b322c fd930250 f3000000 08004500 053cbbaa 4000fd06 04078027 245d2e0f
86abeb80 1428982141 S Bi:1:115:7 -150 1514 <
86abec00 1429021624 C Bi:1:115:7 0 226 = 024b322c fd930250 f3000000 08004500 00d4bbab 4000fd06 086e8027 245d2e0f
86abec00 1429021653 S Bi:1:115:7 -150 1514 <
84895480 1429022660 S Bo:1:115:5 -150 66 = 0250f300 0000024b 322cfd93 08004500 00349c43 40003f06 e6762e0f e6768027
84895480 1429022746 C Bo:1:115:5 0 66 >
86b1dc00 1430690752 C Ii:1:115:6 0:16 8 = a1010000 04000000
86b03d80 1430690765 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
86b1dc00 1430690787 S Ii:1:115:6 -150:16 64 <
86b03d80 1430691369 C Ci:1:115:0 0 39 = 01260080 03010400 0024001a 001e0400 9f0c0000 1d0200db 0e110200 01050106
86abec80 1430896349 C Bi:1:115:7 -71 0
84895800 1431014639 S Bi:1:115:7 -150 1514 <
86abed00 1431066817 C Bi:1:115:7 -71 0
84895480 1431184603 S Bi:1:115:7 -150 1514 <
86abed80 1431307124 C Bi:1:115:7 -71 0
86b03c00 1431330567 S Co:1:115:0 s 21 00 0000 0004 0012 18 = 01110000 03010000 01200005 00100200 ff00
86b03c00 1431331498 C Co:1:115:0 0 18 >
86b1dc00 1431332988 C Ii:1:115:6 0:16 8 = a1010000 04000000
86b03d80 1431332996 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
86b1dc00 1431333012 S Ii:1:115:6 -150:16 64 <
86b03d80 1431333484 C Ci:1:115:0 0 58 = 01390080 03010200 0120002d 00020400 00000000 01020092 05110400 01006e05
86b03c00 1431346879 S Co:1:115:0 s 21 00 0000 0004 000d 13 = 010c0000 03010000 004d0000 00
86b03c00 1431347879 C Co:1:115:0 0 13 >
86b1dc00 1431348994 C Ii:1:115:6 0:16 8 = a1010000 04000000
86b03d80 1431349002 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
86b1dc00 1431349021 S Ii:1:115:6 -150:16 64 <
86b03d80 1431349490 C Ci:1:115:0 0 98 = 01610080 03010200 004d0055 00020400 00000000 12030000 00001303 00020200
86b03c00 1431363692 S Co:1:115:0 s 21 00 0000 0004 000d 13 = 010c0000 03010000 00250000 00
86b03c00 1431367129 C Co:1:115:0 0 13 >
86b1dc00 1431369000 C Ii:1:115:6 0:16 8 = a1010000 04000000
86b03d80 1431369009 S Ci:1:115:0 s a1 01 0000 0004 1000 4096 <
86b1dc00 1431369029 S Ii:1:115:6 -150:16 64 <
86b03d80 1431369622 C Ci:1:115:0 0 34 = 01210080 03010200 00250015 00020400 00000000 010b00f2 00020006 4e657443
84895380 1431424638 S Bi:1:115:7 -150 1514 <
86abee00 1431533084 C Bi:1:115:7 -71 0
84895f80 1431644606 S Bi:1:115:7 -150 1514 <
86abee80 1431773424 C Bi:1:115:7 -71 0
86abef00 1431859709 C Bi:1:115:7 -71 0
84895e80 1431884647 S Bi:1:115:7 -150 1514 <
84895d80 1431884669 S Bi:1:115:7 -150 1514 <
86abef80 1431891856 C Bi:1:115:7 -71 0
86b93e00 1431923867 C Bi:1:115:7 -71 0
86b1de00 1431955895 C Bi:1:115:7 -71 0
86b1d800 1431986895 C Bi:1:115:7 -71 0
84895000 1432004649 S Bi:1:115:7 -150 1514 <
84895f00 1432004672 S Bi:1:115:7 -150 1514 <
84895100 1432004690 S Bi:1:115:7 -150 1514 <
84895980 1432004699 S Bi:1:115:7 -150 1514 <
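A small parser for the usbmon text format makes it easier to see which endpoints fail. In the excerpt, every bulk-in completion on endpoint 7 after 1430896349 returns -71, while the control transfers on endpoint 0 and the interrupt completions on endpoint 6 still complete with status 0. The -150 on submissions is, we believe, -EINPROGRESS as numbered on MIPS, i.e. "still pending" rather than an error. This sketch only reads the leading fields described in the kernel's usbmon documentation and skips anything else (such as wrapped data lines); the names are our own.

```python
from collections import Counter
from typing import NamedTuple, Optional


class UsbmonEvent(NamedTuple):
    tag: str
    timestamp_us: int
    event: str             # 'S' submission, 'C' callback (completion), 'E' error
    xfer: str              # 'C' control, 'I' interrupt, 'B' bulk, 'Z' isochronous
    direction: str         # 'i' = IN, 'o' = OUT
    bus: int
    device: int
    endpoint: int
    status: Optional[int]  # None when a setup packet ('s') follows instead


def parse_usbmon_line(line: str) -> Optional[UsbmonEvent]:
    """Parse the leading fields of one usbmon text line; None if not an event."""
    fields = line.split()
    if len(fields) < 5 or fields[2] not in ("S", "C", "E"):
        return None  # e.g. a wrapped continuation line of data words
    tag, timestamp, event, address, status_field = fields[:5]
    xfer_dir, bus, device, endpoint = address.split(":")
    if status_field == "s":
        status = None
    else:
        # Interrupt/isochronous entries look like "0:16" (status:interval).
        status = int(status_field.split(":")[0])
    return UsbmonEvent(tag, int(timestamp), event, xfer_dir[0], xfer_dir[1],
                       int(bus), int(device), int(endpoint), status)


def count_completions(lines):
    """Count completions per (type+direction, endpoint, status)."""
    counts = Counter()
    for line in lines:
        ev = parse_usbmon_line(line)
        if ev is not None and ev.event == "C":
            counts[(ev.xfer + ev.direction, ev.endpoint, ev.status)] += 1
    return counts
```

Running count_completions over the full log shows the ratio of successful to -71 completions per pipe before and after the event.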

My knowledge about USB is very limited, so I am not able to make much
sense of these messages. I have put the full log here:
https://gist.github.com/kristrev/7705450.
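Part of the trace can be decoded by hand. The "s" fields are a USB setup packet (bmRequestType, bRequest, wValue, wIndex, wLength), and the 8-byte interrupt data looks like a CDC notification; the request and notification codes below are those of the USB CDC specification, which cdc-wdm uses to carry QMI messages for qmi_wwan. wIndex 4 matches interface 4 in "1-1.1.2:1.4" from dmesg. A minimal decoding sketch (the function names are hypothetical):

```python
import struct
from typing import NamedTuple

# CDC class request and notification codes relevant to the trace above.
CDC_REQUESTS = {0x00: "SEND_ENCAPSULATED_COMMAND", 0x01: "GET_ENCAPSULATED_RESPONSE"}
CDC_NOTIFICATIONS = {0x00: "NETWORK_CONNECTION", 0x01: "RESPONSE_AVAILABLE"}


class SetupPacket(NamedTuple):
    bm_request_type: int
    b_request: int
    w_value: int
    w_index: int
    w_length: int

    def describe(self) -> str:
        direction = "IN" if self.bm_request_type & 0x80 else "OUT"
        kind = ("standard", "class", "vendor", "reserved")[(self.bm_request_type >> 5) & 0x3]
        recipient = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}.get(
            self.bm_request_type & 0x1F, "reserved")
        if kind == "class":
            name = CDC_REQUESTS.get(self.b_request, hex(self.b_request))
        else:
            name = hex(self.b_request)
        return (f"{direction} {kind}/{recipient} {name} "
                f"wIndex={self.w_index} wLength={self.w_length}")


def parse_setup(words):
    """Parse the five hex fields printed after the 's' tag of a usbmon line."""
    return SetupPacket(*(int(w, 16) for w in words))


def parse_notification(data_words: str):
    """Decode an 8-byte CDC notification from usbmon data words.

    usbmon prints the raw bytes in transfer order, so the 16-bit fields
    are little-endian here (unlike the already-decoded setup fields).
    """
    raw = bytes.fromhex(data_words.replace(" ", ""))
    _bm, code, _value, index, length = struct.unpack("<BBHHH", raw[:8])
    return CDC_NOTIFICATIONS.get(code, hex(code)), index, length
```

Decoded this way, "s a1 01 0000 0004 1000" reads as "IN class/interface GET_ENCAPSULATED_RESPONSE wIndex=4 wLength=4096", and "a1010000 04000000" is a RESPONSE_AVAILABLE notification for interface 4. In other words, the QMI control exchange keeps running while the bulk-in data pipe fails.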

My question is: has anyone experienced something similar and found a
solution, or does anyone have ideas on how to proceed? Since the error
seems to be independent of the driver, I guess it points to the
problem being hardware related. Would, for example, reducing
QH_XACTERR_MAX be a possible (temporary) solution, or is there a way
to flush this queue once we see the error? The most critical part for
us is that USB is blocked for such extended periods of time.
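One way to act on the error as early as possible is to watch for a burst of -71 completions and trigger recovery (for example the port power cycle) immediately instead of waiting for the queue to drain. This is a minimal sketch of a sliding-window detector; the threshold and window are hypothetical tuning values, the timestamps would come from dmesg or usbmon, and whether early recovery actually shortens the outage has not been verified.

```python
from collections import deque

EPROTO_STATUS = -71


class EprotoBurstDetector:
    """Report when too many -71 (EPROTO) errors arrive within a time window.

    threshold and window_s are hypothetical tuning values, not measured ones.
    """

    def __init__(self, threshold: int = 20, window_s: float = 10.0):
        self.threshold = threshold
        self.window_s = window_s
        self._times = deque()

    def observe(self, timestamp_s: float, status: int) -> bool:
        """Record one completion; return True when recovery should be triggered."""
        if status != EPROTO_STATUS:
            return False
        self._times.append(timestamp_s)
        # Drop errors that have fallen out of the sliding window.
        while timestamp_s - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) >= self.threshold:
            self._times.clear()  # start counting afresh after triggering
            return True
        return False
```

Isolated -71 errors, which may also occur during normal coverage dips, do not trigger anything. Only a dense burst does, and the window is cleared after each trigger so one burst causes a single recovery.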

Thanks in advance for any help,
Kristian
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html