My spam filter of choice has been bogofilter for many a year now. For the mail I receive, it became very accurate quickly and has stayed that way ever since. It is a Bayesian spam filter, so it requires “training” to keep classifying email properly.
I use an IMAP server for my mail, so I can’t integrate bogofilter the ideal way, which would be a single keystroke that immediately reclassifies a message. Instead, I’ve set up a couple of training folders that a script, run as a cron job, processes. For misclassified spam (i.e., mail that’s actually good but was classified as spam) I created a spam2mail folder, and for misclassified good mail (i.e., mail that’s actually spam but was classified as good) I use the Junk folder. The script checks those folders over IMAP, retrains bogofilter on the messages in them, and then moves the mail to its final destination: either my spam folder or my INBOX.
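The folder-to-action mapping above can be written down as data. This is a minimal sketch, not part of the original script: the training_folders table and the bogofilter_cmd and maildir_path helpers are hypothetical names. The flags are the ones the script passes to bogofilter, the flag meanings in the comments are taken from bogofilter’s man page, and the path assumes the same Maildir++ layout (~/Maildir/.Folder) the script relies on.

```lua
-- Hypothetical description of the two training folders.
-- bogofilter flag meanings:
--   -S unregister as spam       -n register as non-spam
--   -N unregister as non-spam   -s register as spam
--   -b bulk mode: read file/directory names from stdin
local training_folders = {
  { src = 'spam2mail', flags = '-Snb', dest = 'INBOX' }, -- false positives
  { src = 'Junk',      flags = '-Nsb', dest = 'spam'  }, -- false negatives
}

-- Build the bogofilter command line for one folder, putting any
-- user-supplied options (e.g. '-d /path/to/wordlist-dir') first.
local function bogofilter_cmd(folder, user_opts)
  local parts = { 'bogofilter' }
  if user_opts and user_opts ~= '' then parts[#parts + 1] = user_opts end
  parts[#parts + 1] = folder.flags
  return table.concat(parts, ' ')
end

-- On-disk location of a folder, assuming a Maildir++ layout.
local function maildir_path(user, folder)
  return '/home/' .. user .. '/Maildir/.' .. folder.src
end

-- Show what would happen for a hypothetical user "alice".
for _, f in ipairs(training_folders) do
  print(bogofilter_cmd(f), maildir_path('alice', f), '->', f.dest)
end
```

Keeping the mapping in one table means adding a third training folder is a one-line change rather than another hard-coded call.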
Originally, I wrote the script in Perl using IMAPTalk. Since I’ve written an IMAP library in Lua, it seemed appropriate to rewrite the script in Lua using that library.
Here is the code.
```lua
#!/usr/bin/env lua
-- usage: script host user password [extra bogofilter options]
local imaplib = require("imap4")

local host, user, passwd = arg[1], arg[2], arg[3]
-- optional extra bogofilter options, prefixed with a space if present
local bogo_useropts = arg[4] and (' '..arg[4]) or ''

-- abort (after shutting down the connection) unless the command returned OK
local function chk_result(r, imap)
  if r:getTaggedResult() ~= 'OK' then
    imap:shutdown()
    error("IMAP command failed")
  end
  return r
end

-- retrain bogofilter on every message in src_mb, then move them to dest_mb
local function move_messages(imap, bogo_opts, user, src_mb, dest_mb)
  local r = chk_result(imap:select(src_mb), imap)
  -- untagged EXISTS will have message count; tonumber() makes the
  -- check work whether it comes back as a string or a number
  local msg_cnt = r:getUntaggedContent('EXISTS')[1]
  if tonumber(msg_cnt) == 0 then return end

  -- open bogofilter for writing; in bulk mode (-b) it reads
  -- file/directory names from stdin, so hand it the folder's Maildir
  local fh = io.popen('bogofilter'..bogo_opts, 'w')
  if not fh then error("Could not open bogofilter") end
  local path = '/home/'..user..'/Maildir/.'..src_mb
  fh:write(path, '\n')
  fh:close()

  -- moving messages around in IMAP is a pain:
  -- it consists of copying, then setting flags, then expunging.
  -- The close gets us out of the selected state so a new
  -- folder can be selected.
  chk_result(imap:copy('1:*', dest_mb), imap)
  chk_result(imap:store('1:*', '+FLAGS', [[\Deleted \Seen]]), imap)
  chk_result(imap:expunge(), imap)
  chk_result(imap:close(), imap)
end

local imap = imaplib.IMAP4:new(host)
chk_result(imap:login(user, passwd), imap)
-- spam2mail holds false positives: unregister as spam, register as good
move_messages(imap, bogo_useropts..' -Snb', user, 'spam2mail', 'INBOX')
-- Junk holds false negatives: unregister as good, register as spam
move_messages(imap, bogo_useropts..' -Nsb', user, 'Junk', 'spam')
-- done: close the connection
imap:shutdown()
```
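The empty-folder check relies on the untagged EXISTS response that an IMAP server sends when a mailbox is selected. The library hides the raw protocol, so the following is only a sketch of where that number comes from: exists_count is a hypothetical helper, and the sample lines are an illustrative SELECT response in the style of RFC 3501, not output captured from a real server.

```lua
-- Hypothetical helper: find the message count among the untagged
-- responses to SELECT, which look like "* 12 EXISTS".
local function exists_count(lines)
  for _, line in ipairs(lines) do
    local n = line:match('^%* (%d+) EXISTS')
    if n then return tonumber(n) end
  end
  return nil -- no EXISTS response seen
end

-- Illustrative SELECT response for a folder holding three messages.
local select_response = {
  '* FLAGS (\\Answered \\Flagged \\Deleted \\Seen \\Draft)',
  '* 3 EXISTS',
  '* 0 RECENT',
  '* OK [UIDVALIDITY 1234567] UIDs valid',
  'a001 OK [READ-WRITE] SELECT completed',
}

print(exists_count(select_response)) -- 3
```

Converting the count with tonumber() lets an empty folder compare equal to 0 without depending on whether a library returns the value as a string or a number.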
Pretty simple. Two helper functions do the lion’s share of the work, and move_messages is obviously the big one. It selects src_mb and returns immediately if the folder is empty; otherwise it retrains bogofilter by handing it the folder’s Maildir path in bulk mode, so bogofilter reads every message in the folder. As noted in the comment, moving mail around with IMAP is a bit of a pain: it takes three steps (copy the messages to the new mailbox, flag the originals as deleted, then expunge them), followed by a close to leave the selected state. But those commands work on all the messages in a folder at once (the 1:* sequence set), rather than having to be repeated for each message, so it’s not all bad.
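To make those steps concrete, here is a sketch of the raw commands that a copy/flag/expunge/close sequence corresponds to on the wire. It is illustrative only: move_commands is a hypothetical function, the a001-style tags are arbitrary, and it assumes mailbox names that need no quoting. It also shows the alternative for servers that advertise the MOVE capability (RFC 6851), where a single MOVE command replaces COPY, STORE and EXPUNGE.

```lua
-- Hypothetical sketch: build the tagged IMAP commands for moving every
-- message in the selected mailbox to dest_mb. Assumes dest_mb contains
-- no spaces or special characters (otherwise it must be quoted).
local function move_commands(dest_mb, has_move)
  local cmds
  if has_move then
    -- MOVE extension (RFC 6851): copy and remove in one command
    cmds = { 'MOVE 1:* ' .. dest_mb }
  else
    cmds = {
      'COPY 1:* ' .. dest_mb,
      'STORE 1:* +FLAGS (\\Deleted \\Seen)',
      'EXPUNGE',
    }
  end
  -- leave the selected state so another folder can be selected
  cmds[#cmds + 1] = 'CLOSE'
  -- prefix each command with a unique tag: a001, a002, ...
  for i, c in ipairs(cmds) do
    cmds[i] = string.format('a%03d %s', i, c)
  end
  return cmds
end

for _, line in ipairs(move_commands('spam', false)) do print(line) end
```

Checking the server’s CAPABILITY response for MOVE before choosing a path would let a script use the shorter form where available and fall back to the three-step sequence elsewhere.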