How spam filters save us from intrusive ads (featuring @phenom as author)

in #technology, 8 years ago

Attention: This post has been written by @phenom

All of you have received spam in your mailboxes, but most of it was blocked before it ever reached your eyes. Do you know how spam filters work, how they decide that an email is spam, how they protect you from it, and what spammers do to get around all these measures? That is what we will discuss in today's article.


Let's start with what the word "spam" means.

  • Spam is advertising that is sent against the will of the recipient.

Now it is easy to understand what a spam filter is.

  • A spam filter is software that automatically detects spam. It can be run by end users or on mail servers, and it separates normal correspondence from spam mailings.

Almost all spam filters use two main methods of filtering:

Analysis of the content of the letter

This method relies on statistical analysis of the message content. The filter first has to be "trained": emails are sorted by hand so that the statistical characteristics of normal email and of spam can be identified. The method works very well when the advertising is provided in plain text or HTML. After training on a sufficiently large number of emails, it becomes possible to cut up to 95-97% of spam.
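
The training step can be sketched in code. The article does not say which statistical model real filters use, so the example below assumes a naive Bayes classifier, one common choice; the class name, the tokenizer and the tiny training set are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy naive Bayes filter (an assumed model, not a real product's code)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one manually sorted message; label is 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_score(self, text):
        """Return log P(spam|text) - log P(ham|text); above 0 leans spam.

        Assumes at least one message of each label has been trained.
        """
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocabulary:
                    continue  # words never seen in training carry no evidence
                # Add-one (Laplace) smoothing avoids log(0) for rare words.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            scores[label] = score
        return scores["spam"] - scores["ham"]


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting at noon tomorrow", "ham")
spam_filter.train("lunch tomorrow with the team", "ham")
looks_like_spam = spam_filter.spam_score("buy cheap pills") > 0
```

With four training messages the scores mean very little; the point is only to show where the manual sorting enters the calculation.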

However, there are ways to get around these filters. Spammers pad the message with random text and place the actual advertisement in an image. The random text confuses the filter and makes it hard to train. Many email services offer a "Report Spam" button to train the filter: the messages that users mark as spam are used both to filter those messages and to train the filter for the future. Gmail and Facebook use such a system.

Analysis of the sender

There are many blacklists containing the IP addresses of computers that send spam. To find out whether an IP address is blacklisted, the mail server makes a DNS query. That is why these lists are called DNSBLs (DNS Black Lists).
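
Here is a minimal sketch of how such a DNS query is usually formed for an IPv4 address: the octets are reversed and the blacklist's zone is appended, and an answer means the address is listed. The zone `dnsbl.example.org` and both function names are placeholders, not a real service or API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone answers with an address record for this IP.

    Simplification: any lookup failure is treated as "not listed",
    which a production filter would want to handle more carefully.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# "dnsbl.example.org" is a hypothetical zone used only for illustration.
query = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

Here `query` is `99.2.0.192.dnsbl.example.org`. `is_listed` is not called in the example because it needs network access and a real blacklist zone.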

This method is currently not very effective, because spammers find new servers faster than the filters can blacklist them. In addition, a few spam-sending computers can get an entire mail domain or subnet listed, and thousands of law-abiding people would then be unable, for an indefinite time, to send email to servers that use that blacklist. Careless or incorrect use of blacklists by administrators also blocks large numbers of innocent users.

Greylisting

Greylisting is based on analyzing the "behavior" of software designed to send spam and comparing it with the normal behavior of mail servers. Spam-sending programs usually do not re-send a message after receiving a temporary error. The simplest greylisting software works like this: all previously unknown SMTP servers are considered "gray".

Mail from such servers is not accepted, but it is not rejected outright either: the sender gets back a temporary error code. If the sending server repeats its attempt after a certain period, the receiving server moves it to the whitelist. Normal messages are therefore not lost, only delayed. This method can filter out up to 90% of spam with virtually no risk of losing normal emails. However, it is not perfect either.
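
The gray list logic described above can be sketched in a few lines. This is a simplified illustration that tracks servers by IP address only; the class name, the 300-second default delay and the `"tempfail"` / `"accept"` return values are assumptions for the example, with `"tempfail"` standing in for an SMTP temporary (4xx) error reply.

```python
class Greylist:
    """Toy greylist keyed by the sending server's IP address."""

    def __init__(self, min_delay_seconds=300):
        self.min_delay_seconds = min_delay_seconds  # assumed retry window
        self.first_seen = {}    # server IP -> time of its first attempt
        self.whitelist = set()  # servers that retried properly

    def check(self, server_ip, now):
        """Decide what to do with a delivery attempt made at time `now`."""
        if server_ip in self.whitelist:
            return "accept"
        if server_ip not in self.first_seen:
            # Previously unknown server: treat it as "gray" and ask it to retry.
            self.first_seen[server_ip] = now
            return "tempfail"
        if now - self.first_seen[server_ip] >= self.min_delay_seconds:
            # The server came back after the required delay: whitelist it.
            self.whitelist.add(server_ip)
            return "accept"
        return "tempfail"  # retried too quickly, keep it waiting


greylist = Greylist(min_delay_seconds=300)
first = greylist.check("203.0.113.5", now=0)     # "tempfail"
second = greylist.check("203.0.113.5", now=400)  # "accept"
```

Passing `now` in as a parameter, instead of reading the clock inside the class, keeps the sketch easy to test.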

There are also many other methods:

  • refusing to accept messages with an invalid return address (for example, one from a non-existent domain);
  • analysis of message headers;
  • systems that detect the characteristics of mass mailings, etc.
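
To give an idea of what header analysis can look like, here is a small sketch built on Python's standard `email` package. The three rules it checks (a missing Date header, a missing Message-ID header, and different domains in From and Return-Path) are illustrative assumptions, not rules taken from any particular filter, and legitimate mail such as mailing-list traffic can trigger them too.

```python
from email import message_from_string
from email.utils import parseaddr


def address_domain(header_value):
    """Extract the lowercase domain from an address header, or '' if absent."""
    _, address = parseaddr(header_value or "")
    return address.rpartition("@")[2].lower() if "@" in address else ""


def suspicious_header_signs(raw_message):
    """Return a list of warning signs found in the message headers."""
    message = message_from_string(raw_message)
    signs = []
    for required in ("Date", "Message-ID"):
        if message[required] is None:
            signs.append(f"missing {required} header")
    from_domain = address_domain(message["From"])
    return_domain = address_domain(message["Return-Path"])
    if from_domain and return_domain and from_domain != return_domain:
        signs.append("From and Return-Path domains differ")
    return signs


# A made-up message used only to exercise the checks.
sample = (
    "From: Shop <deals@shop.example.com>\n"
    "Return-Path: <bounce@bulk.example.net>\n"
    "Subject: Sale\n"
    "\n"
    "Buy now\n"
)
signs = suspicious_header_signs(sample)
```

A real filter would more likely add each sign to an overall score than reject a message on any single one.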

But these methods are used so rarely that there is no need to say more about them.

To finish the article, I would like to tell you how to defend your blog from spam.

Modern methods of fighting blog spam use different types of captcha.

Captcha methods:

  • Picture captcha - provided by the general-purpose reCAPTCHA service; it asks you to type the words or numbers shown in a picture.

  • Text captcha - asks you to answer a question, retype the numbers or letters shown above it, or solve an equation.

  • Interactive captcha - asks the user to interact with images and objects to prove that they are not a robot.
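
The simplest of these, a text captcha that asks the visitor to solve an equation, can be sketched as follows. The function names and the single-digit addition format are made up for illustration, and a question this simple would only stop the most naive bots.

```python
import random


def make_text_captcha(rng):
    """Return a (question, expected_answer) pair for a simple addition captcha."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", str(a + b)


def check_captcha_answer(expected, submitted):
    """Compare the visitor's answer with the expected one, ignoring spaces."""
    return submitted.strip() == expected


# In a real blog the expected answer would be kept on the server
# (for example in the visitor's session), never sent to the browser.
question, expected = make_text_captcha(random.Random())
passed = check_captcha_answer(expected, f" {expected} ")
```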


As you now know, even corporations such as Google can't create software that filters spam with 100% accuracy and never mistakes a real message for spam, so spam will continue to annoy us, but in small quantities.

There are still a lot of ways to get around spam filters, and spammers use them really well. So if you have a blog, use a captcha, which can help you cut up to 90% of spam. And if spam reaches your inbox, flag it as spam so that the developers can figure out how it got through the filters and teach them to fight spam of this type.

Follow me if you're a GEEK like me or want to learn more about IT/Technological/Math topics

Alex aka @phenom


@knozaki2015 features authors and artists to promote them and a diversity of content. https://steemit.chat/channel/academy (if you want to get in touch)

The author will receive 100% of the STEEM Dollars from this post

Don't just follow me, follow the author as well, if you like their post @phenom

Very nice work on this article. People have no idea what they can receive in their mailbox. Gives better understanding not to open unwanted mails.

I expected to read about Bayesian filtering...
Bayes' nets (filters) revolutionized email spam detection around two decades ago.
But they are useless for messenger spam -- the messages are usually too short.
"Hello!" -- is it a spam message or not?
You start chatting, and by the time the filter has enough information to classify spam / not-spam, you have figured it out yourself.

But the true challenge is human-bot spam: the spammers are people, and globalization makes human spam cheap.
You can buy 1000 anti-captcha solutions made by Indians for $3.

You're absolutely correct. Bayes filters had a great impact on the development of spam detection. I didn't drill down into technical details here and didn't cover Bayesian filters because I want to write a separate post devoted to them (especially to Kalman and particle filters). Thanks for mentioning the filtering of human-bot spam. It's indeed a big challenge, as ads in messengers are sometimes utterly annoying. I think that in this case messengers also use a probabilistic approach, e.g. if the sender is not in your contact list, then the probability that the message is spam is much higher.

Very informational post @phenom. Keep helping others and keep spreading happiness @knozaki2015

thanks funnyman. this article was very helpful to me, as I struggle with different problems with emails...

thanks @funnyman. Stay tuned to learn more about IT/technological areas

was a good article. I learned a bit. But I was wondering why you are promoting an author who has some of the best results on steemit. You did introduce me to a good author, and thanks.

Hello solarguy.
agree, Phenom is a good author. but still he wanted more followers. I am sure he will now or soon be able to walk completely on his own. I am happy I could help him boost a bit.
usually I do max 1-3 features of an author, so i can feature many others !

If you want to get featured, just contact me! will be happy to work with you !

Very helpful information. Just gotta have help with keeping the spam out!

thanks, team101. Follow me to learn more about IT/Technological/Math areas

Interesting, but I wouldn't treat bitly links as spam. It depends, sometimes they lead to good content :)

hey @phenom. do you offer free tutorials about programming? hehe

I don't make programming tutorials, as there are enough really great ones on the internet. Check my post devoted to getting started with programming, where I listed all the courses that I personally like, many of which I have completed. Feel free to ask me at steemit.chat if you face any problems
