Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFi error when processing STA_DISCONNECTED event with reason 0 #7406

Closed
1 task done
Kuenlun opened this issue Oct 27, 2022 · 3 comments · Fixed by #7414
Closed
1 task done

WiFi error when processing STA_DISCONNECTED event with reason 0 #7406

Kuenlun opened this issue Oct 27, 2022 · 3 comments · Fixed by #7414

Comments

@Kuenlun
Copy link

Kuenlun commented Oct 27, 2022

Board

esp32-s3-devkitc-1

Device Description

Bare esp32-s3-devkitc-1 board connected through micro-USB cable (on the UART port) to see serial monitor.

Hardware Configuration

No other hardware is attached to the development board.

Version

v2.0.4

IDE Name

PlatformIO

Operating System

Microsoft Windows 10 Home (10.0.19045 compilation 19045)

Flash frequency

40MHz

PSRAM enabled

no

Upload speed

460800

Description

I've been developing a WiFi keep alive FreeRTOS task implemented with events for maximum eficiency (as much as I can get with the Arduino framework).
Each time I get "ARDUINO_EVENT_WIFI_STA_DISCONNECTED" event, I try to reconnect in two different ways depending on the reason:

  1. If the reason of the disconnection is "ASSOC_FAIL" probably my router is the problem and it won't let me connect for 60 seconds, so I wait.
    [Extra info]
    As my router won't let me connect for 60 seconds and the purpose of the task im developing is to keep the WiFi alive, I have disabled WiFi reconnection: "WiFi.setAutoReconnect(false);"
    If I don't do this I get lots of warnings and errors each time WiFi tries to reconnect (each 100 ms). With WiFi reconnection disabled I only get it twice (because even with WiFi reconnection disabled, the ESP tries to reconnect once).
[   181][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 0 - WIFI_READY
[   219][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 2 - STA_START
E (383) wifi:Set status to INIT
[   270][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[   270][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 203 - ASSOC_FAIL
[   277][D][WiFiGeneric.cpp:966] _eventCallback(): WiFi Reconnect Running
[   293][D][TaskWiFi.cpp:101] WiFiKeepAliveTask(): Connecting to CETEM_nave
E (484) wifi:Set status to INIT
[   371][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[   371][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 203 - ASSOC_FAIL
[   378][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi AutoReconnect Running
[   390][W][TaskWiFi.cpp:40] WiFiCallBack(): ARDUINO_EVENT_WIFI_STA_DISCONNECTED
[   394][D][TaskWiFi.cpp:101] WiFiKeepAliveTask(): Connecting to CETEM_nave
E (586) wifi:Set status to INIT
  1. Whatever other reason the WiFi got disconnected, I try to reconnect it immediately.

The problem arises when very rarely (sometimes I have to reboot the board 50 times) Arduino gets the "ARDUINO_EVENT_WIFI_STA_DISCONNECTED" event with reason "0" (it isn't even listed in "arduino-esp32/blob/master/tools/sdk/esp32s3/include/esp_wifi/include/esp_wifi_types.h"):
       typedef enum {
        WIFI_REASON_UNSPECIFIED = 1,
        WIFI_REASON_AUTH_EXPIRE = 2,
       ...
        WIFI_REASON_ASSOC_FAIL = 203,
        WIFI_REASON_HANDSHAKE_TIMEOUT = 204,
        WIFI_REASON_CONNECTION_FAIL = 205,
        WIFI_REASON_AP_TSF_RESET = 206,
        WIFI_REASON_ROAMING = 207,
       } wifi_err_reason_t;

At that moment I get the following:

[   311][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[   319][D][WiFiGeneric.cpp:991] _eventCallback(): STA IP: xxx.xxx.xxx.xxx, MASK: 255.255.255.0, GW: xxx.xxx.xxx.xxx
[   330][I][TaskWiFi.cpp:60] WiFiCallBack(): ARDUINO_EVENT_WIFI_STA_GOT_IP. WiFi connected
[   389][V][WiFiGeneric.cpp:360] _arduino_event_cb(): STA Disconnected: SSID: blablabla, BSSID: xx:xx:xx:xx:xx:xx, Reason: 0
[   390][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x400556b4  PS      : 0x00060c30  A0      : 0x8205ea4c  A1      : 0x3fce6b10
A2      : 0x00000001  A3      : 0xfffffffd  A4      : 0x000000ff  A5      : 0x0000ff00
A6      : 0x00ff0000  A7      : 0xff000000  A8      : 0x00000000  A9      : 0x00000000
A10     : 0x00000000  A11     : 0x3fce6d13  A12     : 0x00000000  A13     : 0x00000000
A14     : 0x00060e20  A15     : 0x00000001  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000001  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffc


Backtrace:0x400556b1:0x3fce6b10 |<-CORRUPTED

I have traced the error and found that inside "WiFiGeneric.cpp" file line 950, there is a call to the "reason2str" macro.

928: if(event->event_id < ARDUINO_EVENT_MAX) {
929: log_d("Arduino Event: %d - %s", event->event_id, arduino_event_names[event->event_id]);
930: }
...
948: } else if(event->event_id == ARDUINO_EVENT_WIFI_STA_DISCONNECTED) {
949: uint8_t reason = event->event_info.wifi_sta_disconnected.reason;
950: log_w("Reason: %u - %s", reason, reason2str(reason));
...

As you can see, when the reason is 0, the program tries to read memory from "system_event_reasons[-1]".
#define reason2str(r) ((r>176)?system_event_reasons[r-176]:system_event_reasons[r-1])

This problem can be easily fixed adding the reason 0 to the "system_event_reasons" array, also to the "esp_wifi_types.h" "wifi_err_reason_t" enum, and changing the "reason2str" macro to.

As the reason 0 is suposed not to exists maybe the problem is from ESP-IDF. What do you think?

Thanks in advanced.

Sketch

#include <WiFi.h>

#include "secrets.h"

#define WIFI_CONNECTION_TIMEOUT_MS 5 * 1000

static TimerHandle_t timerWiFiReconnectWhenASSOC_FAIL = NULL;
static SemaphoreHandle_t binarySemphrWiFiDisconnected = NULL;

static void timerWiFiReconnectWhenASSOC_FAILcb(TimerHandle_t xTimer)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    xSemaphoreGiveFromISR(binarySemphrWiFiDisconnected, &xHigherPriorityTaskWoken);
    // Did this action unblock a higher priority task?
    if (xHigherPriorityTaskWoken)
        portYIELD_FROM_ISR();
}

static void WiFiCallBack(WiFiEvent_t event, WiFiEventInfo_t info)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    switch (event)
    {
    case ARDUINO_EVENT_WIFI_READY:
        isr_log_i("ARDUINO_EVENT_WIFI_READY");
        break;
    case ARDUINO_EVENT_WIFI_STA_START:
        isr_log_i("ARDUINO_EVENT_WIFI_STA_START");
        break;
    case ARDUINO_EVENT_WIFI_STA_STOP:
        isr_log_w("ARDUINO_EVENT_WIFI_STA_STOP");
        break;
    case ARDUINO_EVENT_WIFI_STA_CONNECTED:
        isr_log_i("ARDUINO_EVENT_WIFI_STA_CONNECTED");
        break;
    case ARDUINO_EVENT_WIFI_STA_DISCONNECTED:
    {
        const uint8_t reason = info.wifi_sta_disconnected.reason;
        isr_log_w("ARDUINO_EVENT_WIFI_STA_DISCONNECTED (Reason: %u)", reason);
        switch (reason)
        {
        case 0:
            // Causes Guru Meditation Error on core  1
            isr_log_e("WiFi unknown error");
            break;
        case 203:
            isr_log_w("WiFi ASSOC_FAIL: try to reconnect in %d seconds", WIFI_CONNECTION_TIMEOUT_MS / 1000);
            xTimerStartFromISR(timerWiFiReconnectWhenASSOC_FAIL, &xHigherPriorityTaskWoken);
            break;
        default:
            xSemaphoreGiveFromISR(binarySemphrWiFiDisconnected, &xHigherPriorityTaskWoken);
            break;
        }
        break;
    }
    case ARDUINO_EVENT_WIFI_STA_GOT_IP:
        isr_log_i("ARDUINO_EVENT_WIFI_STA_GOT_IP. WiFi connected");
        break;
    case ARDUINO_EVENT_WIFI_STA_LOST_IP:
        isr_log_w("ARDUINO_EVENT_WIFI_STA_LOST_IP");
        break;
    default:
        isr_log_w("Got other WiFi event: %u", event);
        break;
    }

    // Did this action unblock a higher priority task?
    if (xHigherPriorityTaskWoken)
        portYIELD_FROM_ISR();
}

static void WiFiKeepAliveTask(void *parameters)
{
    // Create binary semaphore to know if WiFi has disconnected
    binarySemphrWiFiDisconnected = xSemaphoreCreateBinary();
    if (!binarySemphrWiFiDisconnected)
        log_e("Could not create binary semaphore");
    xSemaphoreGive(binarySemphrWiFiDisconnected);

    // Create the WiFi connection timeout timer
    timerWiFiReconnectWhenASSOC_FAIL = xTimerCreate(
        "timerWiFiReconnectWhenASSOC_FAIL",
        pdMS_TO_TICKS(WIFI_CONNECTION_TIMEOUT_MS),
        pdFALSE,
        (void *)0,
        timerWiFiReconnectWhenASSOC_FAILcb);
    if (!timerWiFiReconnectWhenASSOC_FAIL)
        log_e("Could not create timer");

    // Set the WiFi callback
    WiFi.onEvent(WiFiCallBack);
    // Disable auto reconnect, the current task will take care of that
    WiFi.setAutoReconnect(false);
    // Configure WiFi as STA
    WiFi.mode(WIFI_STA);

    // Reconect WiFi when disconnected event
    while (true)
    {
        if (xSemaphoreTake(binarySemphrWiFiDisconnected, portMAX_DELAY))
        {
            // Connect to WiFi
            log_d("Connecting to %s", WIFI_SSID);
            WiFi.begin(WIFI_SSID, WIFI_PASS);
        }
    }
}

/* * * * * *
 *  SETUP  *
 * * * * * */
void setup()
{
    // Create the WiFi keep alive task
    createWiFiKeepAliveTask();

    // Delete setup and loop task
    vTaskDelete(NULL);
}

/* * * * * *
 *  LOOP   *
 * * * * * */
void loop()
{
    // This task has been deleted
    // Code here won't run
}

Debug Message

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x400556b4  PS      : 0x00060c30  A0      : 0x8205ea4c  A1      : 0x3fce6b10  
A2      : 0x00000001  A3      : 0xfffffffd  A4      : 0x000000ff  A5      : 0x0000ff00
A6      : 0x00ff0000  A7      : 0xff000000  A8      : 0x00000000  A9      : 0x00000000  
A10     : 0x00000000  A11     : 0x3fce6d13  A12     : 0x00000000  A13     : 0x00000000
A14     : 0x00060e20  A15     : 0x00000001  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000001  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xfffffffc


Backtrace:0x400556b1:0x3fce6b10 |<-CORRUPTED

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@Kuenlun Kuenlun added the Status: Awaiting triage Issue is waiting for triage label Oct 27, 2022
@lbernstone
Copy link
Contributor

Just to clarify, you are getting this STA_DISCONNECTED at boot, before the device has ever connected to an AP? I don't think this is coming from IDF, AFAICT that event code is an arduino thing. Since the station has not actually disconnected, I think this is probably an erroneous pathway (hence the reason 0), and tightening the logic there should help out a lot of connection issues.

@mrengineer7777
Copy link
Collaborator

mrengineer7777 commented Oct 28, 2022

@Kuenlun Excellent analysis. Accessing array element system_event_reasons[-1] would indeed cause a crash.
Since the crash happens in reason2str(), I suggest a one line patch would be safer and easier than modifying

[ "system_event_reasons" array, also to the "esp_wifi_types.h" "wifi_err_reason_t" enum, and changing the "reason2str" macro to.

I would change

   uint8_t reason = event->event_info.wifi_sta_disconnected.reason;
   log_w("Reason: %u - %s", reason, reason2str(reason));

To

   uint8_t reason = event->event_info.wifi_sta_disconnected.reason;
   if(!reason) reason = 1;  //Reason 0 causes crash.  Use reason 1 "UNSPECIFIED".
   log_w("Reason: %u - %s", reason, reason2str(reason));

Why IDF returns 0 is an issue for someone to look into. As you noted that is not a valid reason in "esp_wifi_types.h".
Event chain:
[WiFiGenericClass line 580] esp_event_handler_instance_register(WIFI_EVENT, ESP_EVENT_ANY_ID, &_arduino_event_cb, NULL, NULL))
-> [WiFiGenericClass line 330] _arduino_event_cb()
->-> WiFiGenericClass::_eventCallback()

Be careful with xTimerCreate(). That is an ESP-IDF native call. The Arduino core uses 1 of the system timers. It's possible to break that timer and cause odd behaviors. See the "Ticker" class for a safe way to fire timers in Arduino.

Possibly related: #7344

@Kuenlun
Copy link
Author

Kuenlun commented Oct 31, 2022

@mrengineer7777 Thanks for the feedback. I understand that the proper way of fixing this involves modifiying:

Adding the reason 0 to the "system_event_reasons" array, also to the "esp_wifi_types.h" "wifi_err_reason_t" enum, and changing the "reason2str" macro.

And also understanding why this reason is returned in the first place. But all of this is very unsafe as the fix could unchain many unexpected errors.

I think the changes you propose are ok.
Fix 1 (from @mrengineer7777):

   uint8_t reason = event->event_info.wifi_sta_disconnected.reason;
   if(!reason) reason = 1;  //Reason 0 causes crash.  Use reason 1 "UNSPECIFIED".
   log_w("Reason: %u - %s", reason, reason2str(reason));

I would like to add a minor change, replacing reason = 1 to reason = WIFI_REASON_UNSPECIFIED.
Fix 2:

   uint8_t reason = event->event_info.wifi_sta_disconnected.reason;
   // Reason 0 causes crash, use reason 1 (UNSPECIFIED) instead
   if(!reason) reason = WIFI_REASON_UNSPECIFIED;
   log_w("Reason: %u - %s", reason, reason2str(reason));

The WIFI_REASON_UNSPECIFIED enum value will be defined as it is already included:
esp_wifi_types.h -> WiFiType.h -> WiFiGeneric.h -> WiFiGeneric.cpp

From my code style point of view, I would do the following.
Fix 3:

   const uint8_t &_reason = event->event_info.wifi_sta_disconnected.reason;
   // Reason 0 causes crash, use reason 1 (UNSPECIFIED) instead
   const uint8_t reason = _reason ? _reason : WIFI_REASON_UNSPECIFIED;
   log_w("Reason: %u - %s", reason, reason2str(reason));

But I understand it will break the code style of the file and the project itself.

I will propose a pull request adding the second fix.

With respect to "xTimerCreate" function I'll have a look to what you say. I have implemented timers in this way because of Shawn Hymel (Digi-Key) FreeRTOS tutorials from YouTube, but I don't have much experience with it.
Thanks!

me-no-dev pushed a commit that referenced this issue Nov 2, 2022
Fixed #7406 . The "reason2str" macro in WiFiGeneric.cpp tries to read memory from index "-1"  in "system_event_reasons" array when handling STA_DISCONNECTED event with reason 0. Dealing with reason 0 as a reason 1 (WIFI_REASON_UNSPECIFIED) will solve the problem (the reason for this event to arrive with reason 0 is unknown). #7406
@VojtechBartoska VojtechBartoska added Status: Solved and removed Status: Awaiting triage Issue is waiting for triage labels Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants