Care and feeding of SMTP honeypots

This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.

In parallel with an SSH/telnet honeypot, I’m also running an SMTP honeypot using INetSim. The SMTP honeypot is only one of many functions of INetSim; this article will cover the SMTP component only.

The SMTP part of INetSim has been configured with the following settings in inetsim.conf:

start_service smtp

smtp_bind_port            25
smtp_fqdn_hostname        darkstar.example.org
smtp_banner               "SMTP Mailer ready."
smtp_helo_required        yes
smtp_extended_smtp        yes
smtp_auth_reversibleonly  yes
smtp_auth_required        yes
smtp_service_extension    VRFY
smtp_service_extension    EXPN
smtp_service_extension    HELP
smtp_service_extension    8BITMIME
smtp_service_extension    SIZE 102400000
smtp_service_extension    ENHANCEDSTATUSCODES
smtp_service_extension    AUTH PLAIN LOGIN ANONYMOUS CRAM-MD5 CRAM-SHA1
smtp_service_extension    DSN
smtp_service_extension    ETRN

With this configuration, the honeypot will require SMTP authentication before accepting email for delivery. (Since this is a honeypot, the “delivery” involves /dev/null - but only after everything is properly logged.) The honeypot also requires a proper HELO introduction. Even spammers need to show some manners!

The probes

In this honeypot, I see two types of email. The first type is the probes. In intrusion terminology, this is the reconnaissance phase. Probes will test various sets of credentials against the server, and if (when) a username/password combination is allowed, that info will often be included in a mail sent via the server.

After having found a valid username/password combo on a mail server, the abuser (or rather, their script) is only halfway there. Actually submitting an email and then checking whether it arrived in the recipient mailbox is a critical check for wannabe spammers. After all, they could have stumbled upon a misconfigured (for their purpose) mail server, or - god forbid - a honeypot.

From a security point of view, this part is very valuable. The recipient address for the probes is often a freemailer account, but all the same this is the address that collects information about confirmed exploitable mail servers. Whenever such a probe is detected on my honeypots, I submit the indicators to threat intelligence platforms. Below are some examples from recently registered probes, where my honeypot’s IP has been replaced by a private IP address:

From: 3W1v3y61@amexx.com
To: nervideotel@libero.it:
Subject: 192.168.0.42,root,pulamea

From: me@newhouse.com
To: xderia@outlook.com:
Subject: PASS-3 192.168.0.42,anyuser,anypass

From: 2b8zr1d8@re.imap.outlook.net
To: floricica2011@hotmail.com:
Subject: 192.168.0.42,sales@nextgentel.com,password

From: test@test.com
To: llarry21999@yahoo.com:
Subject: test smtp 192.168.0.42-holly@example.org-holly
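
To turn subjects like the ones above into indicators, a small parser is enough. The following is a minimal Python sketch, assuming the probe subject always contains the server's IPv4 address followed by the username and password, separated by either commas or hyphens as in the four examples. The names ProbeIndicator and parse_probe_subject are hypothetical and not part of the honeypot setup described here, and the split is a heuristic: a username containing the delimiter would be cut short.

```python
import re
from typing import NamedTuple, Optional


class ProbeIndicator(NamedTuple):
    """Credentials a probe claims to have verified (hypothetical helper type)."""
    server_ip: str
    username: str
    password: str


_PROBE_SUBJECT = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"  # IPv4 address of the tested server
    r"(?P<sep>[,-])"                     # delimiter chosen by the probe script
    r"(?P<rest>.+)$"                     # username and password, still joined
)


def parse_probe_subject(subject: str) -> Optional[ProbeIndicator]:
    """Extract (ip, username, password) from a probe subject, or return None."""
    match = _PROBE_SUBJECT.search(subject.strip())
    if match is None:
        return None
    # Split on the first occurrence of the same delimiter that followed the IP.
    username, sep, password = match.group("rest").partition(match.group("sep"))
    if not sep:
        return None
    return ProbeIndicator(match.group("ip"), username, password)


if __name__ == "__main__":
    for subject in (
        "192.168.0.42,root,pulamea",
        "PASS-3 192.168.0.42,anyuser,anypass",
        "test smtp 192.168.0.42-holly@example.org-holly",
    ):
        print(parse_probe_subject(subject))
```

Prefixes such as "PASS-3" or "test smtp" are ignored because the search starts at the first thing that looks like an IPv4 address.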

The spam

Then there’s the matter of the payload. Some spammers won’t try abusing the mail relay when their probes never reach the confirmation address, but others don’t care much about that. The busiest spammer managed to submit spam to almost 300 000 recipients in 24 hours, all of which were successfully submitted to /dev/null.

Using Filebeat, key attributes from every spam mail are submitted to an Elastic stack. This gives a very nice aggregation and overview of the source addresses, subjects, IP addresses, and other useful parameters from current spam/phishing campaigns. While the sender’s email address is probably either fake or abused, we add these confirmed spam sources to blacklists and spam filters on both our own and our customers’ mail servers.

[Figure: Only one active spammer, and two probes.]

INetSim saves a copy of all the mail it accepts for, eh, submission, to an mbox formatted file. My Filebeat configuration for extracting the headers from this file is as follows:

# /etc/filebeat/filebeat.yml
filebeat:
  prospectors:
    - paths:
        - "/var/lib/inetsim/smtp/smtp.mbox"
      type: log
      scan_frequency: 10s
      tail_files: true
      tags: ["inetsim-smtp"]
      include_lines: ['^From [\w.+=:-]+(@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:[\.](?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)? ']
      multiline.pattern: '^$'
      multiline.negate: true
      multiline.match: after

The Filebeat configuration does two things: the multiline configuration joins every run of adjacent non-blank lines into a single event, and the include_lines setting evaluates the content of that event to decide whether or not to send it to ELK. Since the header section of each message in the mbox format always starts with "From " (without a colon) followed by the sender address, the regular expression in include_lines makes sure only the header blocks are shipped, not the message bodies.
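
To make the effect of those two settings easier to see, here is a rough Python emulation of the behaviour described above. It is a sketch, not Filebeat's actual implementation: it groups adjacent non-blank lines into events and keeps only the events matching the same regular expression as include_lines. The function name header_events and the sample mbox content are made up for illustration.

```python
import re
from typing import List

# Same expression as include_lines in filebeat.yml above.
INCLUDE_LINE = re.compile(
    r"^From [\w.+=:-]+"
    r"(@[0-9A-Za-z][0-9A-Za-z-]{0,62}(?:[\.](?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*)? "
)


def header_events(mbox_text: str) -> List[str]:
    """Return the mbox header blocks, each one joined into a single event."""
    events: List[str] = []
    current: List[str] = []
    for line in mbox_text.splitlines():
        if line == "":
            # A blank line closes the current run of non-blank lines.
            if current:
                events.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        events.append("\n".join(current))
    # Keep only the runs that begin with an mbox "From " separator line.
    return [event for event in events if INCLUDE_LINE.match(event)]


SAMPLE_MBOX = (
    "From spammer@example.com  Tue Dec 12 10:15:00 2017\n"
    "Received: from host.example.net\n"
    "Subject: hello\n"
    "\n"
    "Body line one.\n"
    "Body line two.\n"
    "\n"
    "From probe  Tue Dec 12 10:16:00 2017\n"
    "Subject: 192.168.0.42,root,secret\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for event in header_events(SAMPLE_MBOX):
        print(repr(event))
```

Message bodies form events of their own, but since they do not start with the "From " separator they are dropped by the filter.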

On the recipient side, Logstash is configured to parse the headers as follows:

filter {
  grok {
    # Note two spaces in From line
    match => { "message" => [
      "^From %{EMAILADDRESS:smtp.mail_from}\s+%{MBOXTIMESTAMP:mboxtimestamp}\n%{GREEDYDATA:smtpheaders}",
      "^From %{USERNAME:smtp.mail_from}\s+%{MBOXTIMESTAMP:mboxtimestamp}\n%{GREEDYDATA:smtpheaders}",
      "^From %{GREEDYDATA:smtp.mail_from}\s+%{MBOXTIMESTAMP:mboxtimestamp}\n%{GREEDYDATA:smtpheaders}"
      ]
    }
    patterns_dir   => "/etc/logstash/patterns.d"
    patterns_files_glob => "inetsim-smtp"
    remove_field => [ "message" ]
  }
  date {
    match => [ "mboxtimestamp", "EEE MMM dd HH:mm:ss yyyy", "EEE MMM  d HH:mm:ss yyyy" ]
  }

  mutate {
    # Replace indentation in Received: header and possibly others
    gsub => [ "smtpheaders", "\n(\t|\s+)", " " ]
  }
  kv {
    source => "smtpheaders"
    value_split => ':'
    field_split => '\n'
    target => "smtp"
    transform_key => "lowercase"
    trim_value => " "
  }
}

The patterns file /etc/logstash/patterns.d/inetsim-smtp contains only one line:

MBOXTIMESTAMP %{DAY} %{MONTH} \s?%{MONTHDAY} %{TIME} %{YEAR}

Note: This configuration will create fields in the index based on the SMTP headers being read. If the spammer uses non-standard headers, fields for these will be created automatically as well.
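
For readers who want to test the parsing logic without a Logstash instance, the same steps can be approximated in Python. This is only a sketch under a few assumptions: the function name parse_header_event is hypothetical, the result is a flat dictionary rather than fields nested under "smtp", and a repeated header (such as several Received: lines) simply overwrites the previous value, which may differ from how Logstash handles repeats.

```python
import re
from datetime import datetime
from typing import Dict

# "From <sender>  <timestamp>" followed by the remaining header lines.
# The timestamp part mirrors the MBOXTIMESTAMP pattern shown above.
FROM_LINE = re.compile(
    r"^From (?P<mail_from>\S*)\s+"
    r"(?P<timestamp>[A-Z][a-z]{2} [A-Z][a-z]{2} \s?\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4})\n"
    r"(?P<headers>.*)",
    re.DOTALL,
)


def parse_header_event(event: str) -> Dict[str, object]:
    """Split one mbox header block into sender, timestamp and header fields."""
    match = FROM_LINE.match(event)
    if match is None:
        raise ValueError("not an mbox header block")

    fields: Dict[str, object] = {"mail_from": match.group("mail_from")}
    # Equivalent of the date filter; handles both "12" and a padded " 5".
    fields["timestamp"] = datetime.strptime(
        match.group("timestamp"), "%a %b %d %H:%M:%S %Y"
    )

    # Equivalent of the mutate/gsub step: unfold indented continuation lines.
    headers = re.sub(r"\n[ \t]+", " ", match.group("headers"))

    # Equivalent of the kv step: split on the first colon, lowercase the key.
    for line in headers.split("\n"):
        key, sep, value = line.partition(":")
        if sep:
            fields[key.lower()] = value.strip(" ")
    return fields


if __name__ == "__main__":
    sample = (
        "From spammer@example.com  Tue Dec  5 10:15:00 2017\n"
        "Received: from host.example.net\n"
        "\tby darkstar.example.org; Tue, 5 Dec 2017\n"
        "Subject: Re: offer"
    )
    print(parse_header_event(sample))
```

Splitting on the first colon only keeps values such as "Re: offer" intact.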

The greater good

After having extracted all the useful information from the spam mail’s content, one thing remains: submitting it to well-known spam registries. I do this by piping the file to SpamAssassin in report mode (-r), which can read the mbox format when given the --mbox option:

cat /var/lib/inetsim/smtp/smtp.mbox | su - amavis -c 'spamassassin -r --mbox'

This will train the local SpamAssassin installation (in this case as used by Amavis in a mail scanning setup), as well as submit to any configured remote checks (Razor2, Pyzor, Spamcop etc).

Update

2025-09-03: Format

December 12, 2017

Bjørn Ruberg
