Bug 291040 - security/wazuh-manager: agent-manager connection doesn't work over TCP
Summary: security/wazuh-manager: agent-manager connection doesn't work over TCP
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Jose Alonso Cardenas Marquez
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-11-16 11:43 UTC by Paweł Krawczyk
Modified: 2025-11-17 17:25 UTC (History)
0 users

See Also:
bugzilla: maintainer-feedback? (acm)


Description Paweł Krawczyk 2025-11-16 11:43:27 UTC
With wazuh-manager 4.12.0 and wazuh-agent 4.12.0 installed on the same network, the agent cannot establish a working connection to the manager over TCP and stays in the HC_STARTUP state indefinitely.

Changing the client-server connection type to <protocol>udp</protocol> makes the problem go away.

This is happening on FreeBSD 14.3, but exactly the same issue was previously reported on FreeBSD 13.1 with Wazuh 4.3.7:

https://groups.google.com/g/wazuh/c/t0iSFb5ad9Q

I've spent a significant amount of time debugging the TCP issues on both machines and ruled out any network or firewall problems, so it looks like a subtle OS-specific bug in Wazuh prevents TCP connections from working on FreeBSD.
Comment 1 Paweł Krawczyk 2025-11-16 14:43:16 UTC
The root cause seems to be that the os_delwait() function does not work on FreeBSD for either TCP or UDP connections:

https://github.com/wazuh/wazuh/blob/d14109a8dad7da9bb73231c1be656c7675cccfd7/src/shared/wait_op.c#L36

The function's operation is trivial - it just deletes the lock file /var/ossec/queue/sockets/.wait after the agent is connected, but until that happens wazuh-agentd doesn't "see" the agent as connected and it remains in the pending state forever. After I manually delete the lock file (rm /var/ossec/queue/sockets/.wait), everything suddenly starts working.
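
To make the mechanism concrete, here is a minimal sketch of a file-based wait flag of the kind described above. The function names (wait_flag_set, wait_flag_is_set, wait_flag_clear) are hypothetical and this is not the code from wait_op.c; it only illustrates why a failed, unchecked unlink() would leave everything waiting indefinitely.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helpers illustrating a file-based wait flag.
 * This is NOT Wazuh's code, only the general pattern. */

/* Create the flag file. Returns 0 on success, -1 on failure. */
int wait_flag_set(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}

/* Returns 1 while the flag file exists, 0 otherwise. */
int wait_flag_is_set(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}

/* Remove the flag file. Returns 0 on success, -1 (with errno set) on failure.
 * If the caller ignores a -1 here, wait_flag_is_set() keeps returning 1. */
int wait_flag_clear(const char *path)
{
    return unlink(path);
}
```

In this sketch, any code polling wait_flag_is_set() keeps waiting for as long as the file exists, which would match the symptom that removing the file by hand unblocks the connection.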
Comment 2 Paweł Krawczyk 2025-11-16 17:10:35 UTC
Also reported upstream: https://github.com/wazuh/wazuh/issues/33176
Comment 3 Jose Alonso Cardenas Marquez freebsd_committer freebsd_triage 2025-11-17 15:15:51 UTC
(In reply to Paweł Krawczyk from comment #1)

Hello Pawel, did you delete the .wait file on the manager or on the agent? I think os_delwait() is an agent function. I have been trying to solve this issue for some weeks because I have a wazuh-4.14.1 update ready (I'm waiting for 15-RELEASE), and it looks like a pthread/mutex issue in the manager. The manager is not answering the agent's HC_STARTUP, which could be because a thread is locked in the manager. Try connecting a FreeBSD-based wazuh-agent to a manager running on another OS, Rocky Linux for example.
Comment 4 Paweł Krawczyk 2025-11-17 17:23:29 UTC
(In reply to Jose Alonso Cardenas Marquez from comment #3)

Surprisingly, I had to delete the .wait lock on the manager side - the agent was sending the hello messages but the manager simply ignored them. After I removed the .wait lock, it suddenly "saw" the agent and communication continued as usual.

Maybe it would be worth adding some debugging messages around all the unlink() syscalls in wait_op.c to see what exactly is causing them to fail? Right now the return value is completely ignored.
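
A minimal sketch of what such instrumentation could look like, assuming a hypothetical wrapper named unlink_logged (it does not exist in Wazuh, and a real patch would presumably use Wazuh's own logging functions instead of fprintf):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: unlink() that reports failures instead of hiding them.
 * Returns 0 on success, or the errno value on failure. */
int unlink_logged(const char *path)
{
    if (unlink(path) == 0)
        return 0;

    int err = errno; /* save errno before fprintf() can overwrite it */
    fprintf(stderr, "unlink('%s') failed: %s (errno %d)\n",
            path, strerror(err), err);
    return err;
}
```

With a wrapper like this, a path that does not resolve inside the process's current root would show up as ENOENT in the log, while a permissions problem would show up as EACCES or EPERM.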

I suspect some esoteric difference in filesystem unlink() semantics between OSes, maybe related to the use of chroot()? I have also noticed one more issue in ossec.log on the manager side:

2025/11/17 00:02:19 wazuh-modulesd: CRITICAL: At pthread_mutex_destroy(): Invalid argument

But this one doesn't seem to have any visible impact on Wazuh operations.
Comment 5 Paweł Krawczyk 2025-11-17 17:25:41 UTC
The usage pattern of pthread_mutex_destroy() is also interesting - there's an error-checking wrapper, but there are also many call sites that ignore any errors from that call:

https://github.com/search?q=repo%3Awazuh%2Fwazuh%20pthread_mutex_destroy&type=code
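
For comparison, a checked call could look like the sketch below. The name mutex_destroy_logged is hypothetical and this is not Wazuh's own wrapper; it only shows the pattern. Note that pthread functions return the error number directly rather than setting errno.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper, not Wazuh's own macro.
 * Logs the failure and returns the pthread error code (0 on success). */
int mutex_destroy_logged(pthread_mutex_t *m, const char *what)
{
    int rc = pthread_mutex_destroy(m);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_destroy(%s): %s\n", what, strerror(rc));
    return rc;
}
```

Passing a label for each mutex (the what argument, an assumption of this sketch) would make it possible to tell which destroy call produces the "Invalid argument" message seen in ossec.log.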