Basic Spam Filtering
By Asim Krishna Prasad
Posted on 26/01/15
Tag: Log
It's 1:46 AM on my second sleepless night of work. After a lot of head-banging and hair-pulling, I have succeeded in implementing a basic spam filter.
First I modified my Python script (DIGM) and downloaded some spam mails and some non-spam mails. Then I coded a C++ program that extracts the words from those mails and builds two dictionaries, one of spam words and one of non-spam words. So far so good.
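To make the dictionary step concrete, here is a minimal sketch. My real extractor is the C++ program; this version is in Python only for brevity. It assumes that a "word" is a run of letters compared in lower case, and that each dictionary is simply the set of distinct words seen in that class of mail. The names `extract_words` and `build_dictionaries` are made up for this illustration.

```python
import re

# Assumption: a "word" is any run of ASCII letters, compared in lower case.
WORD_RE = re.compile(r"[a-z]+")


def extract_words(text):
    """Return the lower-cased words found in a mail body."""
    return WORD_RE.findall(text.lower())


def build_dictionaries(spam_mails, nonspam_mails):
    """Build one set of words per class from lists of mail texts.

    Assumption: a dictionary is just the set of distinct words seen in
    that class. A word that occurs in both classes ends up in both sets.
    """
    spam_words = set()
    for mail in spam_mails:
        spam_words.update(extract_words(mail))

    nonspam_words = set()
    for mail in nonspam_mails:
        nonspam_words.update(extract_words(mail))

    return spam_words, nonspam_words
```

Words that show up in both dictionaries add to both counts later on, so they do not tip the decision either way.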
Then I created a shell script to handle all the operations. I downloaded a bunch of random mails and started coding the shell script, word by word. The script first collects all the .txt files in the folder and iterates over them, passing each one as input to a C++ executable. The executable counts the spam words and the non-spam words in the file and returns the result to the shell script. Depending on the value returned, the shell script then copies the mail into the "SPAM_MAILS" or the "NONSPAM_MAILS" folder...
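The whole loop described above can be sketched in one place. Again this is Python for brevity, not my actual shell script and C++ executable. The decision rule is an assumption: a mail counts as spam when it contains more spam words than non-spam words, and ties go to non-spam. The names `count_hits` and `sort_mails` are hypothetical.

```python
import re
import shutil
from pathlib import Path


def count_hits(text, spam_words, nonspam_words):
    """Count how many words of the mail are in each dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    spam_hits = sum(w in spam_words for w in words)
    nonspam_hits = sum(w in nonspam_words for w in words)
    return spam_hits, nonspam_hits


def sort_mails(folder, spam_words, nonspam_words):
    """Copy every .txt mail in `folder` into SPAM_MAILS or NONSPAM_MAILS."""
    folder = Path(folder)
    spam_dir = folder / "SPAM_MAILS"
    nonspam_dir = folder / "NONSPAM_MAILS"
    spam_dir.mkdir(exist_ok=True)
    nonspam_dir.mkdir(exist_ok=True)

    results = {}
    # glob("*.txt") is not recursive, so already-sorted copies are skipped.
    for mail in sorted(folder.glob("*.txt")):
        text = mail.read_text(errors="ignore")
        spam_hits, nonspam_hits = count_hits(text, spam_words, nonspam_words)
        # Assumed rule: spam only when spam words outnumber non-spam words.
        target = spam_dir if spam_hits > nonspam_hits else nonspam_dir
        shutil.copy(mail, target / mail.name)
        results[mail.name] = target.name
    return results
```

Calling `sort_mails(".", spam_words, nonspam_words)` on the current folder returns a mapping from each file name to the folder it was copied into. The originals stay where they are, since the mails are copied rather than moved.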
First stage completed. Moving on...