Address Verification Technologies

To check whether an e-mail address exists, it is necessary to perform the same
two phases that a mail server performs to deliver a message to a recipient (see
the previous section). First, we need to find out the address of the server
that receives messages for the recipient. Then, we have to connect to that mail
server and ask whether it can accept a message for the user with that
particular address.
Unfortunately, this method detects no more than about 2/3 of invalid addresses.
The problem is that some mail servers accept all messages for their mail
domains and, if a mailbox doesn't exist, notify the sender afterwards via
e-mail that the message is undeliverable.
Current statistics show that about 30% of the detectable 2/3 of dead addresses
are caught in the first phase and 70% in the second phase. On average, the
second phase takes 10 times longer and involves 5 times more network traffic
than the first. In fact, the two-phase check requires as much time and traffic
as sending a small message to the address being checked.
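
The percentages above combine by simple arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the variable names are illustrative and not part of either product.

```python
from fractions import Fraction

# Figures quoted in the text above.
detectable = Fraction(2, 3)        # share of dead addresses the method can detect at all
phase1_share = Fraction(30, 100)   # share of the detectable ones caught in phase one
phase2_share = Fraction(70, 100)   # share of the detectable ones caught in phase two

phase1_of_all = detectable * phase1_share   # share of ALL dead addresses caught in phase one
phase2_of_all = detectable * phase2_share   # share of ALL dead addresses caught in phase two
undetected = 1 - detectable                 # dead addresses that neither phase reveals

print(f"phase one:  {float(phase1_of_all):.0%}")
print(f"phase two:  {float(phase2_of_all):.0%}")
print(f"undetected: {float(undetected):.0%}")
```

In other words, roughly one dead address in five is caught by the first phase alone, a little under half by the second phase, and about a third by neither.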
Consider both phases in more detail. In the first phase, the checking software
analyses the e-mail address syntax, identifies the mail domain and queries a
DNS server for the address of the mail server for that domain. The UDP protocol
is used to interact with the DNS server; it is faster than TCP because it does
not establish a connection between the servers first. Normally, a DNS query
takes no more than 1..2 seconds. During that time, one packet with the query is
sent (about 60 bytes including the packet header) and one packet with the
answer is received (its size doesn't exceed 512 bytes; normally it's no more
than 200..300 bytes). In this phase, all addresses with wrong syntax and all
addresses with non-existent domains are screened out.
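
The first phase can be sketched as follows. This is an illustration under stated assumptions, not the code of either product: the syntax check is deliberately simplistic, the lookup is assumed to be a DNS MX query (the text only says "mail server address"), and the names `mail_domain` and `build_mx_query` are hypothetical. The sketch builds the query payload only; actually sending it over UDP to port 53 of a DNS server is left out so that the example runs offline.

```python
import re
import struct

# Deliberately simple syntax check: one "@", a non-empty local part and a
# dotted domain. The real address grammar is considerably more permissive.
ADDRESS_RE = re.compile(r"^[^@\s]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)$")

def mail_domain(address):
    """Return the mail domain of a syntactically plausible address, else None."""
    match = ADDRESS_RE.match(address.strip())
    return match.group(1).lower() if match else None

def build_mx_query(domain, query_id=0x1234):
    """Build the DNS payload that asks for the MX records of `domain`."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: every label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in domain.split(".")
    ) + b"\x00"
    # Question tail: QTYPE 15 (MX), QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", 15, 1)

domain = mail_domain("noshuchaddress@ibm.com")
query = build_mx_query(domain)
# DNS payload, then the same plus minimum IPv4 (20 bytes) and UDP (8 bytes) headers.
print(domain, len(query), len(query) + 20 + 8)
```

For ibm.com the DNS payload is 25 bytes, or 53 bytes with the minimum IPv4 and UDP headers, which is in line with the "about 60 bytes" quoted above once link-layer framing is counted.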
In the second phase, a connection is established with the mail server using the
SMTP protocol (based on TCP). TCP is connection-oriented, so the servers
involved first exchange service packets to establish the connection. Once the
connection is established, the servers exchange greetings (see the first three
lines in the log below; lines marked < come from the receiving server, lines
marked > from the checking side); then the sender's address is submitted, and
the receiving server confirms its readiness to receive a message from that
address; after that, the message recipient's address is submitted:
< 220 ns.watson.ibm.com ESMTP Sendmail AIX4.3/8.9.3/8.9.0; Thu, 22 Aug 2002 20:44:07 +0500
> HELO cisco.my.net
< 250 ns.watson.ibm.com Hello cisco.my.net [12.44.72.94], pleased to meet you
> MAIL FROM:<verify@testmail.com>
< 250 <verify@testmail.com>... Sender is valid.
> RCPT TO:<noshuchaddress@ibm.com>
< 550 <noshuchaddress@ibm.com>... User unknown
> RSET
< 250 Resetting the state.
> QUIT
In this instance, the receiving server answered that the user with the address
noshuchaddress@ibm.com was unknown to it and refused to receive the message.
After that, the servers exchanged commands to terminate the connection.
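
The same dialogue can be reproduced with Python's standard `smtplib` module. The following is a minimal sketch, not the implementation used by either product: the function names, the three result labels and the 30-second timeout are assumptions made for illustration. The probe takes an already connected session so that it can be exercised without a network.

```python
import smtplib

def probe_recipient(smtp, helo_name, sender, recipient):
    """Run HELO / MAIL FROM / RCPT TO / RSET / QUIT on a connected session.

    Returns "rejected", "accepted" or "undetermined".
    """
    try:
        smtp.helo(helo_name)
        code, _ = smtp.mail(sender)
        if code != 250:
            return "undetermined"   # sender refused: nothing learned about the recipient
        code, _ = smtp.rcpt(recipient)
        smtp.rset()                 # abandon the transaction; no message is sent
    finally:
        smtp.quit()
    if 500 <= code <= 599:
        return "rejected"           # permanent refusal, e.g. 550 User unknown
    if code in (250, 251):
        return "accepted"           # may still be a server that accepts everything
    return "undetermined"           # e.g. a 4xx temporary failure

def check_address(mx_host, helo_name, sender, recipient, timeout=30):
    """Connect to `mx_host` on port 25 and probe a single recipient."""
    smtp = smtplib.SMTP(mx_host, 25, timeout=timeout)
    return probe_recipient(smtp, helo_name, sender, recipient)
```

With the values from the log, a call such as `check_address("ns.watson.ibm.com", "cisco.my.net", "verify@testmail.com", "noshuchaddress@ibm.com")` would issue the same sequence of commands. Note that "accepted" does not prove the mailbox exists, because some servers accept every recipient and report failures later by e-mail.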
While checking the address, the servers sent each other 10 messages with a
total size of about 500 bytes; but to send all those messages, they had to
exchange over 20 packets, so the total traffic was about 2K. Note that most of
the time was spent waiting for replies from the other server.
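
These traffic figures can be cross-checked against the first-phase figures given earlier. The sketch below is a rough back-of-the-envelope calculation that treats every quoted number as approximate and takes "2K" as 2000 bytes, which is an assumption.

```python
# Figures quoted in the text; all of them are approximate.
smtp_payload = 500        # bytes in the 10 SMTP messages
smtp_packets = 20         # "over 20" packets, so this is a lower bound
smtp_total = 2000         # "about 2K" of total traffic (assumed to mean 2000 bytes)

dns_query = 60            # first-phase query, headers included
dns_answer = (200, 300)   # typical first-phase answer size range

# Average bytes per packet that are not SMTP text (headers, handshake, acknowledgements).
overhead_per_packet = (smtp_total - smtp_payload) / smtp_packets
dns_total = [dns_query + answer for answer in dns_answer]
ratios = [smtp_total / total for total in dns_total]

print(f"overhead per packet: at most about {overhead_per_packet:.0f} bytes")
print(f"phase two / phase one traffic: {ratios[1]:.1f}x to {ratios[0]:.1f}x")
```

The resulting ratio of roughly 5.6 to 7.7 is of the same order as the "5 times" average quoted earlier; the average would also include larger DNS answers, which lower the ratio.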
We are pleased to offer you two software products designed to check e-mail
addresses for existence: Advanced Maillist Verify (AMV), which does the
two-phase checking, and High Speed Verifier (HSV), which only performs the
first phase.
AMV is helpful when you need to thoroughly check relatively small mailing lists
(containing no more than 50..100 thousand addresses). Advanced Maillist Verify
can also check addresses in databases and in the address books of popular
applications; it has COM/ActiveX interfaces for integration into various
software systems and CGI/ISAPI modules for simpler integration into web
servers. However, the technical principles underlying the AMV interface
solutions don't allow using it for longer lists.
High Speed Verifier is offered as a solution for quickly removing garbage from
lists with millions of addresses. For purely technical reasons, it runs
10..15 times faster than AMV on relatively small lists, and on lists containing
millions of addresses the difference in speed might be up to thousands of
times. HSV speeds up on longer lists because it stores the results of all
queries to DNS servers in a RAM cache, so the longer the list, the higher the
cache hit rate.
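
The caching idea can be illustrated with a small in-memory cache keyed by mail domain. This is a sketch of the general technique only; how HSV actually implements its cache is not described here, and the class name, the `lookup` callable and the stand-in `fake_lookup` function are all hypothetical.

```python
class CachingResolver:
    """Keep every DNS result in memory so that each domain is queried only once."""

    def __init__(self, lookup):
        self.lookup = lookup   # hypothetical callable: domain -> lookup result
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def resolve(self, domain):
        key = domain.lower()   # domain names are case-insensitive
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[key] = self.lookup(key)
        return self.cache[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# Stand-in for a real DNS query, used only to make the sketch runnable offline.
def fake_lookup(domain):
    return domain != "no-such-domain.example"

resolver = CachingResolver(fake_lookup)
for address in ["a@ibm.com", "b@ibm.com", "c@IBM.com", "d@no-such-domain.example"]:
    resolver.resolve(address.rsplit("@", 1)[1])
print(resolver.misses, resolver.hits, resolver.hit_rate())
```

Because many addresses in a large list share a small number of domains, the share of lookups answered from memory grows with the list, which is the effect described above.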