Archived Forum Post


Question:

SMTPQ Gets Stuck and Needs Restart?

Aug 04 '15 at 09:27

Your web site (http://www.cknotes.com/?cat=138) describes the reasons for WSAEWOULDBLOCK quite clearly.

Some of my customers using smtpQ are experiencing this failure, but it is not clear why the only remedy at that point is to restart the smtpQ service.
Immediately after the restart, emailing resumes without a problem.

Questions:
1. Why would this error block all future emails until the service is restarted? Isn't the service designed to try again if there are emails queued in the Outgoing folder?

2. Do you have sample code designed to detect this situation and automatically restart the service?

Answer

It may be that each of the threads gets stuck for a while because:

1) An attempt to send the email fails after a timeout (i.e. WSAEWOULDBLOCK)
and
2) The thread goes into a retry sequence, where MaxRetries = 10.

MaxRetries can be set to anything from 0 to 10 retries (for sending mail). The delay before each successive retry follows this schedule:

tryCount            delayTime
--------            ---------
1                   5 sec
2                   10 sec
3                   15 sec
4                   1 minute
5                   1.5 minutes
6                   2 minutes
7                   5 minutes
8                   10 minutes
9                   15 minutes
10                  20 minutes
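
To put the schedule above in perspective, here is a small Python sketch that adds up the delays for a given MaxRetries value. It assumes the delays are waited one after another and counts only the waiting time between attempts, not the time each attempt spends before timing out. The names RETRY_DELAYS_SEC and total_retry_wait are made up for this illustration and are not part of SMTPQ.

```python
# Delay before each retry, in seconds, copied from the schedule above.
RETRY_DELAYS_SEC = [5, 10, 15, 60, 90, 120, 300, 600, 900, 1200]


def total_retry_wait(max_retries: int) -> int:
    """Sum of the scheduled delays for the first max_retries retries."""
    if not 0 <= max_retries <= len(RETRY_DELAYS_SEC):
        raise ValueError("max_retries must be between 0 and 10")
    return sum(RETRY_DELAYS_SEC[:max_retries])


for n in (0, 3, 5, 10):
    print(f"MaxRetries={n}: {total_retry_wait(n)} sec of waiting")
```

With MaxRetries = 10 the delays alone add up to 3300 seconds (55 minutes) per email, so a thread working on an undeliverable email can be tied up for about an hour under these assumptions.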

When all MaxThreads worker threads are busy retrying their emails, the queue becomes blocked. The solution is to:

1) Determine the cause of the WSAEWOULDBLOCK (a firewall, anti-virus software, nothing listening at the remote host:port, etc.) and fix it.

and

2) Perhaps set SMTPQ's MaxRetries property to a lower value (anything from 0 to 10 is OK).

The only REAL solution is to determine the cause of the blocked connection.
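
One way to start narrowing down the cause is a plain TCP connection test from the machine that runs the service. The Python sketch below uses only the standard socket module; the function name can_connect is made up for this illustration. It only shows whether something accepts a connection at host:port within the timeout, and a result obtained under your own login may differ from what the service account sees.

```python
import socket


def can_connect(host: str, port: int, timeout_sec: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_sec):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, can_connect("smtp.example.com", 25) (a placeholder host name) returning False points to the network path or the remote server rather than to the queue itself.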


Answer

Once it starts, the problem persists for a couple of days, and a restart of the smtpQ service fixes it immediately.
The problem begins when smtpQ suddenly starts using the local machine instead of the specified server as the connection target. The email message specifies the correct server name, but the smtpQ process tries to connect to the local machine instead:


Need new SMTP connection
checkForExistingConnection: Elapsed time: 0 millisec
SMTP_Connect: Connecting to SMTP server 127.0.0.1:25


Is there a known scenario that causes smtpQ to suddenly start using the local machine address instead of the specified server?


Answer

Thanks for the quick responses! I'll ask the user to send me the eml file. Perhaps the problem was between the chair and the keyboard...


Answer

Today, another user reported a blocked smtpQ scenario. One "bad" eml caused ALL following eml files to be stuck in the Outgoing folder. This is NOT an issue of retries -- the blocking persisted for much longer. Sending resumed immediately when the bad eml file was removed from the Outgoing folder.

Question: Is there a known mechanism whereby the smtpQ service would get blocked because of one bad eml file? For example, does smtpQ use the smtp server and port specified in EACH eml file, or does it reuse the info from the first eml file to "save time"? Is there a way to force the smtpQ service to NOT reuse connection information from the first eml file?


Answer

I received further clarification from the user:

  1. The bad email did not remain in the Outgoing folder; it was moved to the Undeliverable folder. But the rest of the eml files remained in the Outgoing folder (MUCH longer than the retry cycles).

  2. The bad eml file was targeting a different smtp server (compared to all the other eml files). This leads me to believe that smtpQ hangs when one email fails and the remaining emails target a different smtp server.

  3. Emailing resumed when the eml files in the outgoing folder were copied to a different folder, and then moved back to the Outgoing folder.


Answer

This old problem still persists. It boils down to the following scenario: after a WSAEWOULDBLOCK failure, the smtpQ hangs indefinitely. This is not due to retries because the service remains hung until it is restarted. A restart immediately solves the issue. Is there a known explanation/fix?
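
Until the root cause is found, the restart workaround described in this thread could be automated with a small watchdog. The Python sketch below is only an illustration, not something provided with smtpQ: it assumes a stuck queue shows up as .eml files sitting in the Outgoing folder past some age threshold, and that the service can be restarted with the Windows sc.exe tool. The function names, the folder path, and the service name are hypothetical and must be replaced with the values from your installation.

```python
import subprocess
import time
from pathlib import Path


def oldest_eml_age_sec(outgoing_dir, now=None):
    """Age in seconds of the oldest .eml file in outgoing_dir, or None if there are none."""
    now = time.time() if now is None else now
    mtimes = []
    for path in Path(outgoing_dir).glob("*.eml"):
        try:
            mtimes.append(path.stat().st_mtime)
        except FileNotFoundError:
            pass  # the service sent or moved the file in the meantime
    return now - min(mtimes) if mtimes else None


def queue_looks_stuck(outgoing_dir, max_age_sec, now=None):
    """True if some .eml file has been waiting longer than max_age_sec."""
    age = oldest_eml_age_sec(outgoing_dir, now)
    return age is not None and age > max_age_sec


def restart_service(service_name):
    """Restart a Windows service with sc.exe (requires administrator rights)."""
    subprocess.run(["sc.exe", "stop", service_name], check=False)
    time.sleep(10)  # crude pause to let the service stop
    subprocess.run(["sc.exe", "start", service_name], check=True)
```

A scheduled task could call queue_looks_stuck on the Outgoing folder every few minutes and call restart_service only when it returns True. The threshold should be comfortably longer than the full retry schedule (about 55 minutes of delays at MaxRetries = 10), for example two hours, so that normal retries are not mistaken for a hang.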