RevPi failed to get modem status

Topics about the Hardware of Revolution Pi
RMeissnerCC
Posts: 58
Joined: 03 Dec 2019, 10:29

RevPi failed to get modem status

Post by RMeissnerCC »

Dear all,
we have a RevPi Core 3 with two DIO and an AIO module, please see in the attached image. Yesterday after running continuously for about four months the IO modules flipped to red and the RevPi just stopped.

In the journalctl I can find the following lines from yesterday, the first line belongs to normal behaviour, just to give you an impression of time:

Jun 02 12:17:01 RevPi100257 CRON[21413]: pam_unix(cron:session): session closed for user root
Jun 02 12:18:33 RevPi100257 kernel: ftdi_sio ttyUSB5: failed to get modem status: -110
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-udevd.service: Killing process 180 (systemd-udevd) with signal SIGABRT.
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-logind.service: Killing process 1038 (systemd-logind) with signal SIGABRT.
Jun 02 12:20:34 RevPi100257 kernel: Unable to handle kernel paging request at virtual address dfff0001
Jun 02 12:20:34 RevPi100257 kernel: pgd = b5dec000
Jun 02 12:20:34 RevPi100257 kernel: [dfff0001] *pgd=00000000
Jun 02 12:20:34 RevPi100257 kernel: Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Jun 02 12:20:34 RevPi100257 kernel: Modules linked in: overlay cfg80211 rfkill binfmt_misc snd_bcm2835 snd_pcm snd_timer snd bcm2835_gpiomem ftdi_sio usbserial uio_pdrv_genirq uio sg piControl(O) ti_dac082s085 mcp320x iio_mux mux_gpio mux_core fixed gpio_74x164 spi_bcm2835aux spi_bcm2835 ks8851 eeprom_93cx6 gpio_max3191x industrialio crc8 i2c_dev ip_tables x_tables ipv6
Jun 02 12:20:34 RevPi100257 kernel: CPU: 3 PID: 95 Comm: irq/62-dwc_otg_ Tainted: G O 4.9.76-rt60-v7+ #1
Jun 02 12:20:34 RevPi100257 kernel: Hardware name: BCM2835
Jun 02 12:20:34 RevPi100257 kernel: task: b95c8000 task.stack: b95c6000
Jun 02 12:20:34 RevPi100257 kernel: PC is at __kmalloc+0xf0/0x32c
Jun 02 12:20:34 RevPi100257 kernel: LR is at __kmalloc+0x30/0x32c
Jun 02 12:20:34 RevPi100257 kernel: pc : [<80273638>] lr : [<80273578>] psr: a0000013

It goes on

Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-logind.service: Main process exited, code=killed, status=6/ABRT
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-logind.service: Unit entered failed state.
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Jun 02 12:20:34 RevPi100257 systemd[1]: systemd-logind.service: Service has no hold-off time, scheduling restart.
Jun 02 12:20:34 RevPi100257 systemd[1]: Stopped Login Service.
Jun 02 12:20:34 RevPi100257 systemd[1]: Starting Login Service...
Jun 02 12:20:34 RevPi100257 systemd[1]: Started Login Service.
Jun 02 12:20:34 RevPi100257 systemd-logind[21427]: New seat seat0.
Jun 02 12:22:04 RevPi100257 systemd[1]: systemd-udevd.service: State 'stop-sigabrt' timed out. Terminating.
Jun 02 12:23:34 RevPi100257 systemd[1]: systemd-udevd.service: State 'stop-sigterm' timed out. Killing.
Jun 02 12:23:34 RevPi100257 systemd[1]: systemd-udevd.service: Killing process 180 (systemd-udevd) with signal SIGKILL.
Jun 02 12:25:04 RevPi100257 systemd[1]: systemd-udevd.service: Processes still around after SIGKILL. Ignoring.
Jun 02 12:26:23 RevPi100257 autossh[1026]: timeout polling to accept read connection
Jun 02 12:26:23 RevPi100257 autossh[1026]: port down, restarting ssh
Jun 02 12:26:35 RevPi100257 systemd[1]: systemd-udevd.service: State 'stop-final-sigterm' timed out. Killing.
Jun 02 12:26:35 RevPi100257 systemd[1]: systemd-udevd.service: Killing process 180 (systemd-udevd) with signal SIGKILL.
Jun 02 12:28:05 RevPi100257 systemd[1]: systemd-udevd.service: Processes still around after final SIGKILL. Entering failed mode.
Jun 02 12:28:05 RevPi100257 systemd[1]: systemd-udevd.service: Unit entered failed state.
Jun 02 12:28:05 RevPi100257 systemd[1]: systemd-udevd.service: Failed with result 'watchdog'.
Jun 02 12:28:05 RevPi100257 systemd[1]: systemd-udevd.service: Service has no hold-off time, scheduling restart.
Jun 02 12:28:05 RevPi100257 systemd[1]: Stopped udev Kernel Device Manager.
Jun 02 12:28:05 RevPi100257 systemd[1]: Starting udev Kernel Device Manager...
Jun 02 12:29:35 RevPi100257 systemd[1]: systemd-udevd.service: Start operation timed out. Terminating.
Jun 02 12:31:05 RevPi100257 systemd[1]: systemd-udevd.service: State 'stop-final-sigterm' timed out. Killing.
Jun 02 12:31:05 RevPi100257 systemd[1]: systemd-udevd.service: Killing process 21525 (systemd) with signal SIGKILL.
Jun 02 12:31:05 RevPi100257 systemd[1]: systemd-udevd.service: Killing process 180 (systemd-udevd) with signal SIGKILL.

And

Jun 02 13:06:26 RevPi100257 systemd[1]: Starting Network Time Synchronization...
Jun 02 13:07:56 RevPi100257 systemd[1]: systemd-timesyncd.service: Start operation timed out. Terminating.
Jun 02 13:09:01 RevPi100257 CRON[21606]: pam_unix(cron:session): session opened for user root by (uid=0)
Jun 02 13:09:01 RevPi100257 CRON[21610]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Jun 02 13:09:01 RevPi100257 CRON[21606]: pam_unix(cron:session): session closed for user root
Jun 02 13:09:26 RevPi100257 systemd[1]: systemd-timesyncd.service: State 'stop-final-sigterm' timed out. Killing.
Jun 02 13:09:26 RevPi100257 systemd[1]: systemd-timesyncd.service: Killing process 948 (systemd-timesyn) with signal SIGKILL.
Jun 02 13:10:57 RevPi100257 systemd[1]: systemd-timesyncd.service: Processes still around after final SIGKILL. Entering failed mode.
Jun 02 13:10:57 RevPi100257 systemd[1]: Failed to start Network Time Synchronization.
Jun 02 13:10:57 RevPi100257 systemd[1]: systemd-timesyncd.service: Unit entered failed state.
Jun 02 13:10:57 RevPi100257 systemd[1]: systemd-timesyncd.service: Failed with result 'timeout'.
Jun 02 13:10:57 RevPi100257 systemd[1]: systemd-timesyncd.service: Service has no hold-off time, scheduling restart.
Jun 02 13:10:57 RevPi100257 systemd[1]: Stopped Network Time Synchronization.
Jun 02 13:10:57 RevPi100257 systemd[1]: Starting Network Time Synchronization...

Does anybody have ideas what could cause this?
Best, Robert
Attachments
RevPi_Failure_20200603.png
RevPi_Failure_20200603.png (192.35 KiB) Viewed 12680 times
User avatar
dirk
KUNBUS
Posts: 2174
Joined: 15 Dec 2016, 13:19

Re: RevPi failed to get modem status

Post by dirk »

Hi Robert, thank you for the good information.
This line "RevPi100257 kernel: ftdi_sio ttyUSB5: failed to get modem status: -110" indicates that
the device ran into a timeout (ETIMEDOUT) .
Furthermore the device is the ttyUSB5 so it seems to be an external USB device.

Can you post the following information:
uname -a
dmesg
/var/log/kern.log

Thanks
RMeissnerCC
Posts: 58
Joined: 03 Dec 2019, 10:29

Re: RevPi failed to get modem status

Post by RMeissnerCC »

Here you go. The file is a text file.

Apparently the problem reoccured.
Attachments
Kunbus_requirement.zip
(15.87 KiB) Downloaded 797 times
RMeissnerCC
Posts: 58
Joined: 03 Dec 2019, 10:29

Re: RevPi failed to get modem status

Post by RMeissnerCC »

Dear all,
were you able to conclude anything from the data?
The rev pi crashed twice more now. I have seen modem status 110 and 71, but they happen at times, when the RevPi is not crashing as well.
RMeissnerCC
Posts: 58
Joined: 03 Dec 2019, 10:29

Re: RevPi failed to get modem status

Post by RMeissnerCC »

Dear all,
I read through the logs. As our system is logging data multiple times each second to an external drive, we can point to the presumable timepoint of the crash. There are no logs in journalctl and kern.log from that time, all logs there are significantly older. Right now, it seems like the RevPi crashes silently.
User avatar
dirk
KUNBUS
Posts: 2174
Joined: 15 Dec 2016, 13:19

Re: RevPi failed to get modem status

Post by dirk »

Hi Robert, thank you for your reply. Your incident is important to us. We are working on it. Thank you for your patience.
User avatar
lukas
Expert
Posts: 186
Joined: 13 Feb 2017, 10:29

Re: RevPi failed to get modem status

Post by lukas »

If you add "coherent_pool=4M" to /boot/cmdline.txt, does the issue go away?
We've made that the default in our upcoming 4.19 kernel.

I gather from the log output that USB communication is failing. That's a known problem if many USB devices are attached and the DMA coherent pool is too small.

If the issue persists even if you increase the DMA coherent pool, then we can furnish you with a pre-release version of the 4.19 kernel to verify if the problem still occurs there. At this point we're days away from the 4.19 release, so debugging issues on the older 4.9 version which no longer occur on 4.19 doesn't make much sense.

The log output you've posted above shows an "Unable to handle kernel paging request" oops on June 2. Unfortunately the log output is incomplete, it's missing portions of the oops and we can't debug it in this incomplete form. The log output you've attached in Kunbus_requirement.zip does not include the oops on June 2.
RMeissnerCC
Posts: 58
Joined: 03 Dec 2019, 10:29

Re: RevPi failed to get modem status

Post by RMeissnerCC »

Hy Dirk and Lukas,
currently, the error appears after abotu 4-5 days. I need to wait until tomorrow prior to restarting our RevPi, after that the coherent_pool should be set. I will let our machine run for the time being.
We are currently wondering whether the problem could be caused by insufficient RAM as well. In some tests we saw a memory "leak" (RAM usage increasing constantly over time). We have to test and assess this though.

Is there a way for me to supply you with additional logs/informations regarding the oops? I cropped the log with respect to time, but did not delete messages there.

Best, Robert
RMeissnerCC
Posts: 58
Joined: 03 Dec 2019, 10:29

Re: RevPi failed to get modem status

Post by RMeissnerCC »

Dear Lukas,
I was finally able to change the coherent_pool to 4M and restart our system yesterday. For the first time in months and after about 900s after reboot we get about every second such a message:
"Jun 22 15:58:36 RevPi100257 kernel: [ 960.619943] ftdi_sio ttyUSB1: urb failed to clear flow control"

I am researching on this, but cannot find a solution yet.
We do have a harddisk connected to the usb, which has a specific powerline to an external powerhub (also connected to the revpi). I read something about the tx and rx lines connected only could cause problems: https://bugzilla.kernel.org/show_bug.cgi?id=46201

Do you have further suggestions what could solve this and the other problems?

Best, Robert
User avatar
lukas
Expert
Posts: 186
Joined: 13 Feb 2017, 10:29

Re: RevPi failed to get modem status

Post by lukas »

I've just sent you a download link for a kernel 4.19 pre-release deb-package. Please test if that helps. I have also checked all commits between 4.19.95 and the tip of Linus Torvalds' master branch and there have been no commits to address such an issue. So if it's not fixed by 4.19, the issue is likely still present in the upstream kernel. If we're able to reproduce the issue in our lab, we can look into coming up with a patch for the ftdi_sio driver. To reproduce it we'll probably need the exact same hardware config that you're using, so please provide output of "lsusb" and "lsusb -tv".
Post Reply