[NBLUG/talk] Procmail experience for spam

E Frank Ball frankb at efball.com
Mon Mar 29 21:17:00 PST 2004


On Mon, Mar 29, 2004 at 06:08:11PM -0800, Robert Hayes wrote:
} Anyone have any procmail experience for spam control?

Yes, plenty.


} I'm getting my mail from two POPservers to Kmail. No internal network traffic.
} 
} Will procmail be the answer I'm looking for? 
} Or is there something better I should go after?

Procmail is part of the solution.  By itself it can screen out a few
really bad domain names, non-English character sets, etc.  But that
really won't make much of a dent in the overall spam volume.  It looks
like my procmail pre-filtering is getting about 13% (26 messages/day) of
the spam on one account.  This filters out a good chunk of my mostly
unreadable Asian-based spam:

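# Drop mail from domains that send me nothing but spam.
# (Dots are escaped so they match a literal "." only.)
:0
* ^From.*hanmail\.net
/dev/null

:0
* ^From.*korea\.com
/dev/null

:0
* ^From.*@yahoo\.com\.cn
/dev/null

:0
* ^From.*@yahoo\.co\.kr
/dev/null

:0
* ^From.*@daum\.net
/dev/null

# Drop mail in character sets I can't read (Korean, Simplified Chinese).
:0
* ^Content-Type.*charset.*ks_c_5601-1987
/dev/null

:0
* ^Content-Type.*charset.*euc-kr
/dev/null

:0
* ^Content-Type.*charset.*GB2312
/dev/null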


Eric, Mike, and I did an NBLUG presentation on fighting spam
a while back.  Our notes are here:
http://www.nblug.org/presentations/spam/
(quick, before the server goes down).

spamassassin is the most popular tool.  It tags spam by looking for
characteristics in the email and assigning a score to each piece of
mail.  procmail then sorts based on the spamassassin scores, and the
mail can either be delivered, deleted, or put in a purgatory file for
later review.
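
To make that concrete, here is a minimal .procmailrc sketch of the
three outcomes, along the lines of the usual spamassassin examples.
The "purgatory" mailbox name, the 256000 byte size limit, and the
15-point delete threshold are assumptions for illustration, not my
exact setup:

# Run each message under 256000 bytes through spamassassin, which
# adds X-Spam-* headers to it.
:0fw: spamassassin.lock
* < 256000
| spamassassin

# X-Spam-Level has one star per point; 15 or more stars: delete.
:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
/dev/null

# Anything else tagged as spam goes to the purgatory mailbox.
:0:
* ^X-Spam-Status: Yes
purgatory

# Everything else falls through to normal delivery.

Until you trust the scores, it is safer to point the delete recipe at
a second mailbox instead of /dev/null.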

Recently I switched from spamassassin to crm114.
http://crm114.sourceforge.net/

I've found it to be noticeably more accurate, and it was fairly easy to
set up (the documentation could be better).  The author claims about a
10x accuracy improvement (after training) and 1/4 the server load
compared to spamassassin.  One big difference is that crm114 starts
life being very, very dumb, and you have to train it whenever it
mis-classifies anything.  It learns fast, and in a few weeks it will be
very accurate.

Either crm114 or spamassassin gets called from procmail, then procmail
sorts based on the results.  "man procmailex" has lots of .procmailrc
examples.
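
For crm114 the recipe has the same shape.  This is a rough sketch
only: the /usr/bin/crm path, the ~/.crm114 directory, and the
"purgatory" mailbox are assumptions, and it assumes the stock
mailfilter.crm, which marks its verdict in an X-CRM114-Status header.
Check the crm114 docs for your version:

# Filter each message through crm114's mail filter (paths assumed).
:0fw: crm114.lock
| /usr/bin/crm -u $HOME/.crm114 mailfilter.crm

# File whatever crm114 marked as spam for later review.
:0:
* ^X-CRM114-Status: SPAM
purgatory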

Sonic customers are familiar with what they call "graymail".  Suspected
spam gets put in "graymail" and we get a daily report with a list of the
from and subject lines for each mail.  If we see something we want we
can go to the graymail mailbox and get it, otherwise it is auto-deleted
after a week.

I wrote a quick and dirty graymail system for my home mail server.  It
uses procmail and cron and a script I wrote.  I put a copy here:
http://www.frankb.us/stuff/crm114/process-spam

It can work with spamassassin, crm114, or another filter.  It's got a
few bugs in it, but it's functional.  Worst case is the reports get a
little mixed up.  Change frankb to your login, and make sure the
directories are set up the same way.  I use a ~/.rm directory and move
things there instead of really deleting them (good for when my scripts
don't work right).
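
If you would rather roll your own, the procmail half of a graymail
setup can be sketched roughly like this.  This is not the
process-spam script itself; the graymail and graymail.log names and
the X-Spam-Status test are assumptions, so substitute whatever header
your filter sets:

# Append the From: and Subject: lines of each suspected spam to a
# log.  The "c" flag works on a copy, so the message carries on to
# the next recipe; the lockfile name is derived from the >> target.
:0c:
* ^X-Spam-Status: Yes
| formail -X From: -X Subject: >> $HOME/graymail.log

# Then file the message itself in the graymail mailbox.
:0:
* ^X-Spam-Status: Yes
graymail

A daily cron job would then mail out graymail.log as the report,
empty it, and expire old messages from the graymail mailbox.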

I also have some scripts I wrote for training crm114.  I save the mail
to a file, then execute the scripts.  I need to set up a mutt macro,
which means I have to learn how to write mutt macros.  Let me know if
anybody wants to see them.  They would have to be customized for each
user, but they are a start.

-- 

   E Frank Ball                frankb at efball.com


