-
Notifications
You must be signed in to change notification settings - Fork 7.6k
WifiClientSecure hangs on write - socket error #3460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Are you checking to make sure the connection is active before you send? |
Sorry for the late reply.
So I only send if I have wifi and mqtt connection. After that I have:
I've check logs on the mqtt broker as well, and it also records "Socket Error on the client", so I guess it's a network error that happens during the mqtt write. |
[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions. |
I have a similar problem here. I am also using PubSubClient library with WiFiClientSecure to publish in a local Mosquitto server. Everything works great, but sometimes when the connection with the server goes down the while in the send_ssl_data function never finishes. After some debugging and reading the mbedtls_ssl_write documentation, I realized that this function was returning MBEDTLS_ERR_SSL_WANT_WRITE. According to mbedtls_ssl_write documentation, it is an error code and call mbedtls_ssl_write again and again in the while will not solve this apparently. The loop does not finish even when the connection is up again. So, the only alternative is restart the ESP-32. I am not sure, but the while and the line len = ret seem to be not necessary. Here, I solved the problem by reimplementing the function like that:
|
Thanks!
So I think removing it might be ok most of the times, but cause some problems in some cases. But in any case, I will follow your suggestion and try it out! Hopefully, it solved the problem for me as well. |
You are probably right. I think that keep the while and add an attempt counter could be a better solution. From the mbedtls_ssl_write documentation: "When this function returns MBEDTLS_ERR_SSL_WANT_WRITE/READ, it must be called later with the same arguments, until it returns a value greater that or equal to 0. When the function returns MBEDTLS_ERR_SSL_WANT_WRITE there may be some partial data in the output buffer, however this is not yet sent.". However, I get MBEDTLS_ERR_SSL_WANT_WRITE over and over what keeps the while in a loop. It could be an issue from mbedtls library and not from WifiClientSecure library. |
yeah, I think you're right. It probably doesn't handles well writing when the connection had problems. |
This is an issue that's causing a lot of problems on our application, and it was pretty complex to track. We are publishing multiple stuff using PubSubClient (we also tried some alternatives, but they are all using WiFiClientSecure ). Hardware: Description During operation (with verbose log) and standard publishing, the message "Writing HTTP request" is visible, but sometimes the system just hangs and triggers WDT and ESP restarts. After decoding Backtrace the following line looked suspicious:
After more digging and logging, we had several conclusions:
Fix by @felipeapp looks promising, but I don't know if this is WiFiClientSecure issue or mbedtls . I don't know the logic behind:
Maybe somebody can check if it is safe to remove it and implement @felipeapp solution since it looks like it solves this problem from our side also? |
@me-no-dev - can you please reopen this issue? |
@igolubic Thanks. What you wrote matches more or less what I remembered. It is definitely caused by some occasional network issue (expected). Then, MBEDTLS_ERR_SSL_WANT_WRITE gets thrown and not handled properly. As you said, it's not clear if this is WifiClientSecure or mbedtls fault. I have mostly resolved it, on my system, by using a rasperrypi on the local network. This has an mqtt broker, and node-red flows which receive the messages, and then re-transmit them to my original broker (not on the local network). As the ESP32-Mqtt communication happens on local network, there are no observed network errors. For sure it's not ideal, as it basically necessitate a Rasperrypi "relay" to introduce QoS 2. The commit you pushed: Does it seem to work for you? Has it improved things? |
@Onigiri-Kun - just created the pull request #4424. It's been working in production for at least 1 month without any issues on the public network and not a single byte was dropped. This solution is currently used in quite a few projects by custom reference in the platformio.ini file, hopefully, it will pass to production to avoid using custom referencing. Custom referencing: |
Fixes: #3460 This code has been run in production for 1 month and it looks stable, no data dropped and it definitely fixes the issue described. I think that this can be merged to avoid using custom package referencing in PlatformIO that has been used in quite a few projects for now. Co-authored-by: Ivan Golubic <ivan@mvt-solutions.com>
//This solution will try multiple times before failing. Have tested this thoroughly. Alter the retry variable to suit your need.
|
Hardware:
Board: ESP32 Dev Module
Core Installation version: ESP32 Arduino 1.0.3 (IDF 3.2) Release
IDE name: Visual Studio with Visual Micro add on
Flash Frequency: 40Mhz - tried a few and problem persists.
PSRAM enabled: no
Upload Speed: 115200
Computer OS: Windows 10
Description:
I have an ESP32 publishing code to a Mosquitto MQTT server, using the PubSubClient library.
This works fine, and publishes many short messages (around 50 float values, to one per topic) at a 10 seconds interval (50 mqtt write every 10 seconds).
I had problems with the board hanging occasionally, and managed to isolate the problem to the WebClientSecure library.
The log files of the server would give "Socket Error on the client".
The ESP32 debug log would give "Writing HTTP request", before hanging.
The point where the code hangs is the following:
The sketch hangs in the while loop, because mbedtls_ssl_write does not return an error code.
I attribute this to some occasional write failure/socket failure, which is not handled properly.
I've tried adding some timeout code here, which does catch the problem. However, I am not able to reset the code correctly and restart successfully transmission of data without a board reset.
My question is: Can you have code to handle a timeout of the mbedtls_ssl_write?
In such a way that, if some socket error occurs, the client can restart transmission quickly?
Debug Messages:
The text was updated successfully, but these errors were encountered: