RevPi Core losing connection and slow web interface
- Posts: 14
- Joined: 14 Apr 2022, 15:03
RevPi Core losing connection and slow web interface
Hi,
I have a rack with a RevPi Core and 2 MIO modules for monitoring a remote installation.
The RevPi Core is connected to a router which provides a secure internet connection to push data to a server using MQTT.
I'm using RevPiPyLoad to forward the signals via MQTT.
Everything worked while testing at my desk with a couple of dummy signals, but now that it is installed in the field and sending multiple signals, it seems to be losing the connection every couple of hours.
As clarification, I've added a screenshot from my monitoring dashboard with the number of messages/minute. I can see in the server logs that the RevPi MQTT client disconnects with a socket error, but this happens a couple of minutes after I stop receiving messages. So I think the MQTT server just closes the RevPi's connection because it is unresponsive.
I've also noticed that the web interface of the RevPi is very slow: do something, wait a minute, do something, wait a minute, ...
In a couple of days I will drive to the installation to try to fix this. Any ideas what this might be? Anything I should check?
I hope to find something in the daemon log...
I know this is kind of a broad question, but any experiences/ideas are welcome.
Thanks
Re: RevPi Core losing connection and slow web interface
Hi,
I'm sorry to hear that you have connection issues. In order to make some suggestions, please provide more details about your setup:
- Which device do you use? Core / Core 3(+) / Core S?
- Which image version (cat /etc/revpi/image-release)?
- Which kernel version (uname -a)?
- Kernel log file (/var/log/kern.log)
- Are there any devices attached to the USB ports?
- What about the system / IO load?
Re: RevPi Core losing connection and slow web interface
Hi Nicolai,
Thank you for your reply.
I'll need to get back to you on some of the details, because I have to go on-site to access the RevPi.
- Which device do you use? Core / Core 3(+) / Core S? -> Core (+ 2 x MIO module)
- Which image version (cat /etc/revpi/image-release)? -> As far as I can remember it's 2021-07-01-revpi-buster. Will check on-site.
- Which kernel version (uname -a)? -> To check
- Kernel log file (/var/log/kern.log) -> To check
- Are there any devices attached to the USB ports? -> No, no devices are connected to the USB ports.
- What about the system / IO load? -> How can I check the system load (like CPU load)? The IOs are limited: only 14 signals are connected.
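On the question of how to check the system load: besides an interactive tool such as htop, a short script can collect the main figures in one go. The following is only a minimal sketch using the Python standard library; the function names and the choice of /var/log as the path to inspect are illustrative assumptions and nothing RevPi-specific.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_used_percent(info):
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = info["MemTotal"]
    # Fall back to MemFree on kernels that do not report MemAvailable.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return 100.0 * (total - available) / total


def disk_used_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def report(log_dir="/var/log"):
    """Build a short text report; log_dir is an assumed location."""
    load1, load5, load15 = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem = memory_used_percent(parse_meminfo(f.read()))
    return (
        f"load average (1/5/15 min): {load1:.2f} {load5:.2f} {load15:.2f}\n"
        f"memory used: {mem:.0f}%\n"
        f"disk used on {log_dir}: {disk_used_percent(log_dir):.0f}%"
    )
```

Running `print(report())` on the device prints the three figures. On a single-core system, a 1-minute load average that stays above 1.0 means processes are regularly waiting for CPU time.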
Re: RevPi Core losing connection and slow web interface
Hi Nicolai,
I had the chance to get to the RevPi today. Some additional information:
- Which device do you use? Core / Core 3(+) / Core S? -> Core (+ 2 x MIO module)
- Which image version (cat /etc/revpi/image-release)? -> 2021-07-01-revpi-buster.img
- Which kernel version (uname -a)? -> Linux RevPi35693 4.19.95-rt38 #1 PREEMPT RT Tue, 22 Jun 2021 14:13:31 +0000 armv6l GNU/Linux
- Kernel log file (/var/log/kern.log) -> See attachment
- Are there any devices attached to the USB ports? -> No, no devices are connected to the USB ports.
- What about the system / IO load? -> The CPU was/is at 100%, and the memory was also full... (see attachment for a screenshot of htop)
- I could SSH into the RevPi and could confirm that there was no issue with the internet connection.
I found services which took up a lot of CPU and weren't needed, such as everything concerning Node-RED (I only use RevPiPyLoad). After cleaning this up I saw that systemd-journald and rsyslogd were still hogging a lot of CPU. I assume this was because they couldn't write any more logs because the memory was full; after removing kern.log.1 (which was 400 MB!!) their CPU percentage dropped (but RevPiPyLoad just took whatever came free).
After restarting RevPiPyLoad the data was coming in again.
I fear that this is only temporary though, because using journalctl -f I could see new errors coming in (at a lower rate, but they are still errors), which will fill up the memory again. These errors seem to point to bad communication with the PiBridge; this keeps on repeating:
Code: Select all
Sep 23 09:37:01 RevPi35693 kernel: [344453.896462] piControl: recv len from pibridge err(got:0, exp:20)
Sep 23 09:37:01 RevPi35693 kernel: [344453.896484] piControl: talk with mio for dio data error(addr:30, ret:-70)
Sep 23 09:37:01 RevPi35693 kernel: [344453.901222] piControl: crc for dio data err(got:0, expect:143)
Sep 23 09:37:01 RevPi35693 kernel: [344454.396678] piControl: recv len from pibridge err(got:0, exp:20)
Sep 23 09:37:01 RevPi35693 kernel: [344454.396698] piControl: talk with mio for aio data error(addr:31, ret:-70)
Sep 23 09:37:01 RevPi35693 kernel: [344454.400514] piControl: crc for dio data err(got:0, expect:143)
In my dashboard I do see all the data coming in EXCEPT the DI/DO of the MIO modules (but that might be a different error?).
Any ideas how to fix or approach this?
- Attachments
- logs.zip
- (500.63 KiB) Downloaded 366 times
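Since a single rotated kernel log (kern.log.1, about 400 MB) had used up the available space, it can help to list the largest files under the log directory before deleting anything. The following is a minimal sketch using only the standard library; the helper names are illustrative, and the directory to scan (for example /var/log) is passed in by the caller.

```python
import os


def largest_files(root, top=5):
    """Return (size_in_bytes, path) tuples for the biggest files below root,
    largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished (log rotation) or is not readable: skip it.
                continue
    return sorted(sizes, reverse=True)[:top]


def format_size(num_bytes):
    """Format a byte count as MiB with one decimal place."""
    return f"{num_bytes / (1024 * 1024):.1f} MiB"
```

For example, `for size, path in largest_files("/var/log"): print(format_size(size), path)` prints the five biggest log files. Reading some of the files may require root privileges.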
Re: RevPi Core losing connection and slow web interface
Thanks for the detailed report. From your log files I can see lots of errors on the backplane bus (piControl). Could you please check the following steps?
- Check that all PiBridge connectors are connected tightly
- Disable all unnecessary services / applications, as the Core is a very resource-restricted system (only one CPU core, only 700 MHz), and check if the error still occurs
- Disable revpipyload and check if the error still occurs
- Try to reduce revpipyload cycletime
-Nicolai
Re: RevPi Core losing connection and slow web interface
Hi Nicolai,
Great suggestions, I'll try them out today if I'm able to get there.
Some other ideas:
- RevPiPyLoad consumes quite a lot of CPU. I initially chose it over Node-RED on the assumption that it would be less CPU-intensive. Is this the case? (Maybe you've done some benchmarking?)
- If I can't get it working on my next visit, I'm tempted to buy a new RevPi Core 3 with more compute power. This won't solve the backplane/PiBridge error though.
Thanks again for the fast replies!
Re: RevPi Core losing connection and slow web interface
Hi,
Yes, revpipyload uses fewer resources than the Node-RED approach, so it is probably one of the better options.
@RevPi Core 3: The Core 3 is not available due to the ongoing chip shortage, but you can use the even better Core S instead. As your errors are probably related to insufficient CPU power (the logs suggest that), which can result in losses on the backplane bus, I'm quite optimistic that this will help.
Nicolai
Re: RevPi Core losing connection and slow web interface
So I just got back from the site:
- Check that all PiBridge connectors are connected tightly -> Confirmed, nice and tight
- Disable all unnecessary services / applications, as the Core is a very resource-restricted system (only one CPU core, only 700 MHz), and check if the error still occurs -> Killed all unnecessary services / applications, error still there
- Disable revpipyload and check if the error still occurs -> No error with revpipyload disabled
- Try to reduce revpipyload cycletime -> Reactivated revpipyload with no event-triggered IO, only reading and sending 15 inputs (marked with Export in PiCtory) every 15 s. The error still occurs, but only when the data is requested (and not each time the data is read). CPU load at this point was about 67%.
I tried to reproduce the error with revpipyload disabled and using piTest, but I didn't manage to trigger it...
The error is coming from both cards (addr 30 & 31, which is the same as in Pictory):
Code: Select all
Sep 26 12:33:29 RevPi35693 kernel: piControl: talk with mio for dio data error(addr:31, ret:-70)
Sep 26 12:33:31 RevPi35693 kernel: piControl: recv len from pibridge err(got:0, exp:20)
Sep 26 12:33:31 RevPi35693 kernel: piControl: talk with mio for aio data error(addr:30, ret:-70)
Sep 26 12:33:31 RevPi35693 kernel: piControl: recv len from pibridge err(got:0, exp:20)
Sep 26 12:33:31 RevPi35693 kernel: piControl: talk with mio for aio data error(addr:30, ret:-70)
Some additional ideas:
- The RevPiPyLoad log displayed 2 things:
Code: Select all
/usr/lib/python3/dist-packages/revpimodio2/modio.py:324: Warning: equal device name in pictory configuration. can not build device to access by name. you can access all devices by position number .device[nn] only!
2022-09-19 09:47:35 [ERROR ] plc file does not exists /var/lib/revpipyload/program.py
- The first warning is resolved now (I renamed one of the MIO modules in PiCtory), but that doesn't solve the issue.
- The error about not finding the plc file is correct, as there is no program.py in that directory. What is its function? It seems to work OK without this program.
- I still don't receive any status from the DI/DO of the MIO modules. Could the error be pointing to this? I am able to read them out with piTest though (but then there is no error).
- Can updating the MIO modules help? (I have an export from PiCtory, should that be helpful.)
Anyway, I'm ordering a new Core S in the meanwhile.
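To see how the piControl errors are spread over the two modules, the kernel log lines can be counted per address. The following is a minimal sketch using only the standard library; the regular expression is derived solely from the excerpts quoted in this thread, so other piControl message variants are not covered, and the names MIO_ERROR and count_mio_errors are illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: piControl: talk with mio for dio data error(addr:31, ret:-70)
MIO_ERROR = re.compile(
    r"piControl: talk with mio for (?P<kind>\w+) data error"
    r"\(addr:(?P<addr>\d+), ret:(?P<ret>-?\d+)\)"
)


def count_mio_errors(lines):
    """Count 'talk with mio' errors per (module address, data kind)."""
    counts = Counter()
    for line in lines:
        match = MIO_ERROR.search(line)
        if match:
            counts[(int(match["addr"]), match["kind"])] += 1
    return counts


# The excerpt quoted in the post above.
sample = [
    "Sep 26 12:33:29 RevPi35693 kernel: piControl: talk with mio for dio data error(addr:31, ret:-70)",
    "Sep 26 12:33:31 RevPi35693 kernel: piControl: recv len from pibridge err(got:0, exp:20)",
    "Sep 26 12:33:31 RevPi35693 kernel: piControl: talk with mio for aio data error(addr:30, ret:-70)",
    "Sep 26 12:33:31 RevPi35693 kernel: piControl: recv len from pibridge err(got:0, exp:20)",
    "Sep 26 12:33:31 RevPi35693 kernel: piControl: talk with mio for aio data error(addr:30, ret:-70)",
]

for (addr, kind), n in sorted(count_mio_errors(sample).items()):
    print(f"addr {addr} {kind}: {n}")
# prints:
#   addr 30 aio: 2
#   addr 31 dio: 1
```

The same function also accepts an open file object, so a whole kern.log can be passed in line by line.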
Re: RevPi Core losing connection and slow web interface
Hi,
So I swapped the old RevPi Core with a new RevPi Core S 8GB and this one isn't even sweating with 20k messages/minute.
I guess the old single-core RevPi wasn't made for the job.
Thanks for the support!
- Attachments
- htop.JPG (152.73 KiB) Viewed 5216 times