Groups | Blog | Home
all groups > iis smtp nntp > august 2007 >

iis smtp nntp : 5xxsink problem


Lance Robaldo
8/20/2007 12:04:03 PM
I've installed 5xxsink on our IIS server and it works great....except....

Every so often (sometimes minutes, sometimes hours) IIS just shuts down.

I have both the prescan.txt and the rcptlist.txt in the appropriate
directories. The prescan.txt contains 1 entry with our domain name and the
rcptlist.txt contains about 40 entries with our user's e-mail addresses.

When I unbind it from IIS, I have no problems except thousands of NDR's. But
when I bind it to IIS, it randomly shuts down IIS.

Any ideas?

Thanks,
Lance Robaldo
Lance Robaldo
8/20/2007 6:44:04 PM
As per the text file, there is only 1 extra CR at the end of each file.
I'll check the memory on InetInfo.

Not sure how to isolate the last rcpt.

We have the SMTP acting as a relay to our Merak mail server which is
handling all the companies mail. This gives antivirus a chance to check them
out, and it allows our Merak server to sit on the safe(r) side of the
firewall.

The goal is to only forward mail for OUR domain to the Merak server, and
just kill any other mail trying to relay through us.

Thanks,
Lance Robaldo

[quoted text, click to view]
Sanford Whiteman
8/20/2007 8:50:31 PM
[quoted text, click to view]

Minutes? That certainly sounds unrelated to load (not that I have any
outstanding bugs related to load).

[quoted text, click to view]

Can you make sure you don't have extra CRs at the end of these files
(thus making them much longer than they seem)?

How does memory look for INETINFO? Do you see constant creep?

Can you isolate the last processed RCPTs before each crash?

Lance Robaldo
8/20/2007 9:02:01 PM
[quoted text, click to view]

No, not another event sink. We're running Norton Antivirus Enterprise on
the IIS machine (and all our machines for that matter). It cleans most mail
messages when they arrive and before they get passed along to the other
server.

I may not be able to take the time 2morrow to bind the sink, but as soon as
I can, I'll post the info you requested. I'd really like to get it working
without bringing down IIS constantly.



Sanford Whiteman
8/20/2007 9:57:31 PM
[quoted text, click to view]

Pls do.

Also report any event log errors.

[quoted text, click to view]

Your IIS logs.

[quoted text, click to view]

What antivirus, out of curiousity? Another event sink?

[quoted text, click to view]

Yep, that's a good plan.

Sanford Whiteman
8/21/2007 12:16:48 AM
[quoted text, click to view]

Yikes: do you mean you have a general-purpose on-access scanner (not,
for counterexample, in-line Symantec Mail Security for SMTP) scanning
the mail spool? If that's what you're talking about, that's really a
worst practice.

(And is it also scanning the PRESCAN.TXT and RCPTLIST.TXT on every
access?)

Lance Robaldo
8/21/2007 10:06:03 AM

[quoted text, click to view]

Yes, it's slowly creeping up. It started at 17,536K, currently 3 hours
later, it's at 21,432K

[quoted text, click to view]

Just in case that is causing the problem, I've turned off virus scanning for
all files in question.

[quoted text, click to view]

No, it never was scanning text files on access, only on update. But as
stated above, I've turned it off for ALL of the files in question.

It hasn't crashed "so far" today, but if/when it does I'll get the last
e-mail received info from the log and post it here. I'm more worried about
the memory creeping right now.

Thanks,
Lance.



Lance Robaldo
8/21/2007 2:10:01 PM
OK, it crashed again after about 6 hours running this time.

The error log contains only these entries:

The IIS Admin Service service terminated unexpectedly. It has done this 1
time(s). The following corrective action will be taken in 60000
milliseconds: No action.

The FTP Publishing Service service terminated unexpectedly. It has done
this 1 time(s). The following corrective action will be taken in 60000
milliseconds: No action.

The Network News Transport Protocol (NNTP) service terminated unexpectedly.
It has done this 1 time(s). The following corrective action will be taken in
0 milliseconds: No action.

The Simple Mail Transport Protocol (SMTP) service terminated unexpectedly.
It has done this 1 time(s). The following corrective action will be taken in
60000 milliseconds: No action.

The World Wide Web Publishing Service service terminated unexpectedly. It
has done this 1 time(s). The following corrective action will be taken in
60000 milliseconds: No action.

The last connection in the IIS log before the crash is as follows:

2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I EHLO -
+bd0489c2.sts.virtua.com.br 204 31 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I EHLO -
+bd0489c2.sts.virtua.com.br 204 31 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I MAIL -
+FROM:<consultk2081@buffalo.edu> 49 36 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I MAIL -
+FROM:<undercoating@buffalo.edu> 49 36 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I RCPT -
+TO:+<oates@wltsoftware.com> 29 32 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I RCPT -
+TO:+<ledbetter@wltsoftware.com> 29 36 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I RCPT -
+TO:+<lucas@wltsoftware.com> 34 32 0 SMTP -
2007-08-21 20:46:11 189.4.137.194 bd0489c2.sts.virtua.com.br WLT-I RCPT -
+TO:+<lanier@wltsoftware.com> 29 33 0 SMTP -


I'd appreciate ANY assistance you could give here. Otherwise I can't use
5xxsink. And I would REALLY like to filter out all the attempted relays
before forwarding them to my Merak mail server.

Lance.


Lance Robaldo
8/21/2007 9:20:00 PM
[quoted text, click to view]


[quoted text, click to view]

No, inetinfo pretty much doesn't grow at all when 5xxsink is unbound. We
generally run for months between reboots and those aren't generally due to
mail issues.

[quoted text, click to view]

The SMTP logs are huge, but a quick count looks like looks like we get
somewhere between 200 and 600 attempted RCPT commands every 5 minutes. Of
those, generally under 20 are successful (per 5 minute block).

From time to time (daily), it seems to peak at well over 5000 attempts
during a single 5 minute period. Of those, virtually none are successful.
Fortunately, that usually only continues for 5-10 minutes, then traffic dies
down to 600 and under per 5 minute block.

[quoted text, click to view]

I'm sure it should be able to handle the load, but SOMETHING is going wrong.

[quoted text, click to view]

I agree. Not my choice. Boss makes the rules. To make it worse, we own a
copy of Symantec for SMTP. One of our other IT guys had trouble installing
it so the boss isn't keen on spending any time getting it to work now....

Sanford Whiteman
8/21/2007 11:40:20 PM
[quoted text, click to view]

That rate of increase seems extremely high for only 3 hours. You don't
see that rate of increase with 5xxSink unbound?

Pls let me know how many attempted RCPT commands you get, on average,
per hour, how many successful RCPT commands there are out of those
attempts, and how many inbound connections are responsible for those
commands.

FYI, 5xxSink can handle hundreds of concurrent connections and many
thousands of attempted RCPTs per connection, even on an oldish Athlon
workstation, without even blinking. Naturally, the more extant RCPTs
there are on the same server, the more spool I/O traffic it will have
to compete with; similarly, the more extant RCPTs there are, the more
disk and CPU resources will be expended in performing content
scanning, plus more resources for remote delivery of legit messages.
So it is hard to give a firm metric for its capacity. But I have a
feeling that its capacity is way higher than your current load, as
requested above.

[quoted text, click to view]

Well, technically, it probably is getting notified whenever the file
is read, even if it doesn't scan it every time.

I still -- perhaps off to the side here -- want to reiterate that
using a general-purpose filesystem scanner for a mail spool isn't
advisable.

Lance Robaldo
8/22/2007 9:50:02 AM
Sandy,

I just did a physical count of all incomming RCPT commands. Between 6:00am
and 7:00am we had 9622 incomming RCPT commands.

There were 3790 inbound connections made.

There were only approx. 100 successful RCPT's.

Again, any help is tremendously appreciated. We're being deluged by relay
attempts and I'd love to kill some of this traffic.

Thanks,
Lance Robaldo






[quoted text, click to view]
Dave Onex
10/2/2007 5:28:08 PM
I'm seeing the same errors on my Exchange 2000 server. I'm still looking
into it but it sounds like we're having the same issue although in my case
the traffic level is almost negligible. I'll post back to this thread when I
see the issue again.


[quoted text, click to view]

Lance Robaldo
10/3/2007 7:44:05 AM
Thanks, I never did resolve the problem. If you find a solution please keep
me informed.

Lance

[quoted text, click to view]
Dave Onex
10/3/2007 10:24:04 AM
Understood - it's definately an odd issue.

On a separate dedicated Windows 2000 AdvSrvr/IIS 5.0 relay machine what I
was seeing was every few days the IIS service would lock up and mail would
not be transferred. On that machine I tried un-installing IIS and
re-installing it but I had the same issue. This seems more in keeping with
Lance's original post.

Rather then try and isolate the issue on that particular machine I just
changed my email structure around and went back to using another server,
this one a dedicated Windows 2000 AdvSrvr machine that only runs Exchange
2000 EE.

What I'm seeing is something that I suspect is a variant of the same issue.
With 5xxsink bound to a second VS (strictly for sending mail) I get errors
every few days as follows;

Event ID:7031
The IIS Admin Service service terminated unexpectedly. It has done this 1
time(s). The following corrective action will be taken in 1 milliseconds:
Run the configured recovery program.

Event ID: 7031
The Microsoft Exchange IMAP4 service terminated unexpectedly. It has done
this 1 time(s). The following corrective action will be taken in 0
milliseconds: No action.

Event ID: 7031
The Network News Transport Protocol (NNTP) service terminated unexpectedly.
It has done this 1 time(s). The following corrective action will be taken in
0 milliseconds: No action.

Event ID: 7031
The Microsoft Exchange POP3 service terminated unexpectedly. It has done
this 1 time(s). The following corrective action will be taken in 0
milliseconds: No action.

Event ID: 7031
The Microsoft Exchange Routing Engine service terminated unexpectedly. It
has done this 1 time(s). The following corrective action will be taken in 0
milliseconds: No action.

Event ID: 7031
The Simple Mail Transport Protocol (SMTP) service terminated unexpectedly.
It has done this 1 time(s). The following corrective action will be taken in
0 milliseconds: No action.

Event ID: 7031
The World Wide Web Publishing Service service terminated unexpectedly. It
has done this 1 time(s). The following corrective action will be taken in 0
milliseconds: No action.

The good news is that unlike the dedicated IIS 5 machine the Exchange server
quickly re-starts those services (it looks like within about 10 seconds
altogether) and mail works again for several days.

Unbinding 5xxsink causes the issue to stop appearing. What I've also found
is that occassionally spam email will actually make it through 5xxsink -
even though the destination email address does not exist and is not in
either the email address list or the domain list.

However, 99.99% of every other attempt to send mail to the same non-existant
email address is 550'd and the connection is dropped. So for some reason,
and I can't see why (yet), the odd time an email makes it through even
though that address get's 550'd every other time.

The time period that this occurs does not co-incide with when those services
are re-starting. In fact, I can't see any pattern at all when that odd piece
of mail gets through. In my case the mail server is radically
under-utilized.

In my travels I did come across this
(http://www.eventid.net/display.asp?eventid=7031&eventno=465&source=Service%
20Control%20Manager&phase=1);

Yiannis Papadopoulos (Last update 6/30/2003):
- Service: IISAdmin, IMAP4, NNTP, POP3, MSE Routing Engine, SMTP, WWW.
Appeared after installing an SMTP event sink for MS Exchange 2000.
Resolution: unregister the sink. If this fails unregister the DLL of the
sink and restart Windows. See M313404 on how to register an SMTP event sink.

and it got me thinking that the only constant variable (that I can think of)
is 5xxsink was installed on both servers that exhibited this issue.

In my case, I think I'm probably going to leave it alone on the Exchange
machine due to the fact that it quickly re-starts the services - and 5xxsink
is highly valued - even if there's a glitch somewhere the benefits far
outweigh the issue.

In hindsight though, I probably could have configured those services to
automatically re-start on the original relay machine and kept it in service
but I thought there might be something up with it.

I'll keep looking into it to see if there's something else I can come up
with but I suspect the issue lies deeper then my abilities to resolve it at
this point.


[quoted text, click to view]

Sanford Whiteman
10/3/2007 10:55:05 AM
[quoted text, click to view]

Yeah, guys, I wish I could go further with you, but I don't know what
combo of settings could be causing this without talking a deep (and
destructive) look at your servers. I can't replicate this far-reaching
crash. Sorry.

--Sandy


------------------------------------
Sanford Whiteman, Chief Technologist
Broadleaf Systems, a division of
Cypress Integrated Systems, Inc.
aasma
11/20/2007 6:32:55 PM
asdfgafhg
[quoted text, click to view]


begin 666 hi guys.txt
)87-D9F=A9FAG
`
end
AddThis Social Bookmark Button