Monday, October 30, 2006 - a decoy mail server

I've written a decoy mail server in Perl. It is a fully RFC 2821-compliant mail server. Except it doesn't do anything.

Well, it does something. Basically, it's a decoy to catch the spammers who deliberately talk to the wrong MX server (the lowest-priority one) first.

It works like this. The decoy runs out of inetd.
  1. The spammer connects on port 25. The decoy says "220 Hows it goin?"
  2. The spammer says something. The decoy says "250 yeah sure whatever".
  3. Repeat until the spammer says DATA (the command right before they send the message). The decoy says "451 try again later".
  4. The spammer does some more stuff. The decoy says "250 yeah sure whatever".
  5. The spammer gives up and says QUIT. The decoy says "221 don't be a stranger" and then hangs up.

There: completely RFC 2821 compliant, yet it doesn't do anything. Responding 451 (a transient error) to the DATA command means that this mail server won't accept the message and the sender should try again later. When sendmail tries to send a message to a decoy mail server, it will time out in typical sendmail style (after 5 days).
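For illustration, here is the same conversation as a minimal Python sketch. It is not the author's Perl script: it assumes inetd hands the accepted connection to the program as stdin/stdout, and the reply strings are simply the ones quoted above. Like the description, it answers 250 to everything except DATA and QUIT.

```python
import sys

# Reply texts taken from the description above. A stricter server would put
# its domain name right after the 220 code in the greeting.
GREETING = "220 Hows it goin?"
OK = "250 yeah sure whatever"
TRY_LATER = "451 try again later"
BYE = "221 don't be a stranger"


def reply_for(command_line):
    """Return (reply, hang_up) for one line received from the client."""
    verb = command_line.strip().split(" ", 1)[0].upper()
    if verb == "DATA":
        # 4xx is a transient failure: a well-behaved MTA queues and retries later.
        return TRY_LATER, False
    if verb == "QUIT":
        return BYE, True
    # Everything else (HELO, EHLO, MAIL, RCPT, RSET, NOOP, ...) gets a cheerful 250.
    return OK, False


def serve(infile, outfile):
    """Talk SMTP on an already-connected stream, e.g. stdin/stdout under inetd."""
    outfile.write(GREETING + "\r\n")  # SMTP lines end in CRLF
    outfile.flush()
    for line in infile:
        reply, hang_up = reply_for(line)
        outfile.write(reply + "\r\n")
        outfile.flush()
        if hang_up:
            break


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```

Because the decoy never reaches the point of receiving a message body, there is nothing to store and nothing to relay.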

To use the decoy, run it from inetd. Then advertise it in your DNS MX records with the highest preference number (that is, the lowest priority). For example, on "" it would look like:
> dig mx

;; ANSWER SECTION:
86400 IN MX 20
86400 IN MX 20
86400 IN MX 20
86400 IN MX 40
86400 IN MX 99


So, unless something has gone really badly wrong with the other MX servers, you'll never talk to "". And even when things have gone badly wrong, a proper mail server will just get the 451 and try again later.
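Why the decoy only sees misbehaving clients: a conforming sender tries MX hosts in ascending preference order (RFC 2821, section 5), picking randomly among hosts of equal preference, and only moves on to a higher number when the lower ones fail. The Python sketch below uses the preference values from the dig output above; the host names are hypothetical, since the real ones aren't shown.

```python
import random


def delivery_order(mx_records, rng=random):
    """Order MX hosts lowest preference value first, shuffling ties."""
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = by_pref[pref][:]
        rng.shuffle(hosts)  # equal-preference hosts are tried in random order
        order.extend(hosts)
    return order


def first_successful(order, accepts):
    """Try hosts in order; return the first that accepts, or None (queue and retry)."""
    for host in order:
        if accepts(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical names; preferences match the dig output.
    records = [
        (20, "mx1.example.com"),
        (20, "mx2.example.com"),
        (20, "mx3.example.com"),
        (40, "backup.example.com"),
        (99, "decoy.example.com"),
    ]
    print(delivery_order(records))
```

A legitimate sender only reaches the 99 entry after every other host has failed, whereas a spammer that skips straight to the highest-numbered MX lands on the decoy first.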

On domains where we've installed this, we've seen lots of spammers talk to our decoy. And I'm finding that after spammers chat with the decoy, they tend not to chat with my other mail servers.


VMware vs. Debian Testing

So I updated Debian testing yesterday, and after that, starting vmware just exited without actually doing anything. Since vmware is actually run via a wrapper program, the wrapper was not only eating the output (grr) but also making the problem harder to test.

Ultimately I picked up a solution from the Ubuntu forums that resolved the problem. Basically, libhal and libdbus got upgraded and VMware Workstation 5.5.2 build-29772 couldn't deal with it.

Yay... now back to vmware goodness.

At approximately 5:15 PM last night the cluster node running our main mail server crashed and rebooted. No big deal, I thought; if it doesn't want to play nice, I'll use the other cluster nodes.

Then the /mail partition won't mount because it needs fsck. No big deal, since it's logging... wait... it needs fsck because it's really confused about the logs... fine. This is a huge filesystem with 260ish gigs of stuff on it. In maildir format. Lots of tiny little files.

Anyway, I sigh and think: well it'll be about 30 mins to fsck that baddie and all will be right.

Negative. The first fsck took over an hour and a half.

And it failed.

I go to dinner and watch the output on my Treo (which crashed several times, but thankfully screen was there to save the day).

Each time I ran fsck it got a bit faster (down to about 30 minutes), but it would always exit saying:

# fsck -y -v /dev/rdsk/emcpower4a
** /dev/rdsk/emcpower4a
** Last Mounted on /san/mail/mail
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
CG 1164: BAD CG MAGIC NUMBER (0x0 should be 0x90255)
WRONG CG NUMBER (0 should be 1164)
IMPOSSIBLE BLOCK ALLOCATION ROTOR POSITION (-2147483648 should be at least 0 and less than 49152)
IMPOSSIBLE FRAGMENT ALLOCATION ROTOR POSITION (-17039106 should be at least 0 and less than 49152)
IMPOSSIBLE INODE ALLOCATION ROTOR POSITION (16711420 should be at least 0 and less than 11648)
INCORRECT FREE FRAGMENT MAP OFFSET (16530432 should be 1704)
END OF HEADER POSITION INCORRECT (255819520 should be 7848)

Irreparable cylinder group header problem. Program terminated.

Fine. Ask Google.

Only four responses. [note: this is where panic starts to set in] And to make things better, two of those are the OpenSolaris source code. The other two are the same post from someone asking about this. [note: panic in full swing now]

This is not a good sign. This means I'm out in left field here.

So, after a few more hours of banging my head on the thirty-minute fscks, I start the process to ufsdump and ufsrestore the partition. One problem becomes immediately apparent... It's gonna take 11 hours just to ufsdump the thing. And one could figure about that long to ufsrestore it. TWENTY-TWO HOURS estimated. Whoa... time to look more into the problem.

Then I remembered something Matt had told me earlier in the evening (like around 10ish)... something about the new version of fsck adding a -v option... heeey... new version of fsck, eh? Poking around in SunSolve, I see that patch 120986-06 was released on August 18... right before I patched the machines. I see other indications in SunSolve of someone having something like my problem (but not nearly to the degree I do...), but at any rate SunSolve has no real solutions.

So, I took the only step I could think of. I took it out:

# patchrm 120986-06

And, voilà!

# fsck /dev/rdsk/emcpower4a

** /dev/rdsk/emcpower4a
** Last Mounted on /san/mail/mail
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups

6123615 files, 232133573 used, 457986504 free (2902704 frags, 56885475 blocks, 0.4% fragmentation)
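The summary line can be cross-checked with a little arithmetic. UFS fsck reports space in fragments: the free total is the loose free fragments plus the whole free blocks times the fragments per block, and "fragmentation" is the loose free fragments as a percentage of the filesystem's data size. The sketch below assumes 8 fragments per block (for example, 8 KB blocks with 1 KB fragments) and uses used + free as an approximation of the data size.

```python
def check_fsck_summary(used, free, free_frags, free_blocks, frags_per_block=8):
    """Cross-check a UFS fsck summary line.

    used, free and free_frags are counted in fragments; free_blocks counts whole
    blocks. frags_per_block=8 is an assumption, not read from the superblock.
    """
    consistent = free == free_frags + free_blocks * frags_per_block
    # Loose free fragments as a percentage of (used + free), a stand-in for data size.
    fragmentation_pct = free_frags * 100 / (used + free)
    return consistent, fragmentation_pct


# Numbers copied from the fsck output above.
ok, pct = check_fsck_summary(232133573, 457986504, 2902704, 56885475)
print(ok, f"{pct:.1f}%")  # True 0.4%
```

With 8 fragments per block the totals match exactly (56885475 × 8 + 2902704 = 457986504), and the fragmentation comes out to the reported 0.4%.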


I've come to the strong conclusion that 120986-06 (mkfs newfs ufs utilities) is [insert not nice thing here].

So, at least without that, the [pejorative] fsck finished. Hopefully it did the right thing with cylinder group 1164, like showed it a whole bunch of zeros. At least the fs is mounted and all the mail is flowing in like normal.

Which, of course, was blowing out our SpamAssassin server (when it rains, it pours). I've added more procmailrc locking (now it only runs once per user) and hopefully life will be happier now. Okay, yeah, until the procmailrcs time out.

As soon as this is all done, I'm off to bed. That's what I said an hour and a half ago. Mental note: if you get
svc:/system/cron:default: Could not interpret group property.

Check that root is uid 0, gid 0. *Sigh*
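A quick way to do that check, as a minimal Python sketch that reads /etc/passwd directly. It assumes the standard name:password:uid:gid:... layout and does not consult NIS, LDAP, or other name services.

```python
def root_ids_ok(passwd_lines):
    """Return True if the first 'root' entry has uid 0 and gid 0."""
    for line in passwd_lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) >= 4 and fields[0] == "root":
            return fields[2] == "0" and fields[3] == "0"
    return False  # no root entry at all is also a problem


if __name__ == "__main__":
    with open("/etc/passwd") as f:
        print("root is uid 0, gid 0" if root_ids_ok(f) else "root is NOT uid 0, gid 0")
```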

Music Goodness

After upgrading to Debian testing on my laptop, I noticed that mpd stopped playing through esd. The fix was easy: add the following to ~/.mpdconf:

ao_driver "esd"

This pithy factoid brought to you by the numbers 4 and 2.

Have you wondered, item #2353.523:

How exactly does one convince a Solaris machine to reread its EMC SAN configs using Emulex and Powerpath without rebooting?

  1. Edit /kernel/drv/sd.conf and add the LUN. We have two switches, so each switch is bound to a different SCSI target ID; you just add another LUN.
  2. Run update_drv -f sd, per Emulex's documentation
  3. Run devfsadm
  4. Run /etc/powercf -q. Note that -q means "quiet" mode, but powercf won't seem to run without it (I guess it's too polite to be loud in the server room)
  5. Run /etc/powermt config
  6. Run /etc/powermt save
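The steps above could be wrapped in a small script. This is a hypothetical Python sketch, not something from Emulex or EMC: it runs only the commands from steps 2 through 6, in order, stops at the first non-zero exit, and defaults to a dry run that just lists them. The sd_conf_entries helper is also hypothetical; it assumes the usual sd.conf entry syntax and that the new LUN needs one entry per target (one per switch). Step 1, actually editing sd.conf, is still left to you.

```python
import subprocess

# Commands from steps 2-6, in order, with the paths given above.
RESCAN_STEPS = [
    ["update_drv", "-f", "sd"],
    ["devfsadm"],
    ["/etc/powercf", "-q"],
    ["/etc/powermt", "config"],
    ["/etc/powermt", "save"],
]


def sd_conf_entries(targets, lun):
    """Hypothetical helper: sd.conf lines for one new LUN on each target."""
    return ['name="sd" class="scsi" target=%d lun=%d;' % (t, lun) for t in targets]


def rescan(dry_run=True):
    """Run (or, by default, just list) the rescan commands in order."""
    ran = []
    for cmd in RESCAN_STEPS:
        ran.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(cmd, check=True)
    return ran


if __name__ == "__main__":
    for line in sd_conf_entries([0, 1], 5):  # hypothetical targets and LUN
        print(line)
    for cmd in rescan(dry_run=True):
        print(cmd)
```

Keeping the dry run as the default means nothing touches the SAN until you explicitly ask for it.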

Now, remember, boys and girls, these are just my musings, and if you're foolish enough to use this as authoritative information, you really are on your own.

Do not taunt happy fun ball.

