I run an email service. In fact, I've run VFEmail.net since 2001 - that's before Gmail. There's been a lot of talk about privacy and about rewriting SMTP to provide more encryption and 'mask' metadata. There are long articles about email headers and how correlating To/From/Subject can lead to a 'highly accurate guess' without any actual content. This is definitely a concern, especially because correlation only provides a guess. Let's take a look at what's really happening, from the ESP (Email Service Provider) side to the LEO (Law Enforcement Organization) side, to better understand the problem.
So let's start with an email's structure. Emails are designed to be like letters: there is an envelope, and there is content. Take a closer look at this LaTeX template for the content part:
As you can see - it has what's referred to as 'metadata'. There is:
- To: "Prof. Jones"
- From: "John Smith"
- and various other address info.
Remember, though: this is a letter, and it's enclosed inside an envelope. This is the content of an email, the part you would see. The only way to retrieve the content, or the metadata contained in it, is to open the email itself.
The envelope, on the other hand, has similar data. It has a 'To', a 'From', and a 'Date' in the form of a Postmark. This is also metadata.
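In SMTP terms (RFC 5321), the envelope travels as commands (HELO, MAIL FROM, RCPT TO), while the letter, headers and body alike, is sent after the DATA command. The sketch below is a minimal illustration of that split, not VFEmail code; the `Envelope` class, the `smtp_commands` helper, and all addresses are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage


@dataclass
class Envelope:
    """Hypothetical model of the SMTP envelope: what the server sees as commands."""
    helo: str                  # text identifier the sending server gives for itself
    mail_from: str             # reverse-path; "" produces the null path "<>"
    rcpt_to: list[str] = field(default_factory=list)


def build_letter() -> EmailMessage:
    """The 'letter': headers and body, transmitted after the DATA command."""
    msg = EmailMessage()
    msg["From"] = "John Smith <john@example.org>"
    msg["To"] = "Prof Jones <jones@example.edu>"
    msg["Subject"] = "Hello"
    msg.set_content("Dear Prof. Jones, ...")
    return msg


def smtp_commands(env: Envelope) -> list[str]:
    """Commands of a minimal SMTP transaction, up to DATA (no TLS, no auth)."""
    cmds = [f"HELO {env.helo}", f"MAIL FROM:<{env.mail_from}>"]
    cmds += [f"RCPT TO:<{rcpt}>" for rcpt in env.rcpt_to]
    cmds.append("DATA")  # the letter from build_letter() would be sent after this
    return cmds


letter = build_letter()
env = Envelope(helo="mail.example.org",
               mail_from="bounce-7f3a@example.org",
               rcpt_to=["jones@example.edu"])

for line in smtp_commands(env):
    print(line)

# The envelope sender and the From: header are independent values.
print(env.mail_from, "vs", letter["From"].addresses[0].addr_spec)
```

Nothing forces `mail_from` to match the `From:` header, and an empty reverse-path (`MAIL FROM:<>`) is legal in SMTP; bounce messages use it, much like an envelope with no return address.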
So what is the difference? It's all the same, you say?
IMHO, LEGALLY, there is a world of difference.
In addition, you can leave the From address off the envelope entirely - so keep that in mind. Also keep in mind that LEO are building a case.
An ESP isn't required by law to provide Inbox content just because the target looked at a cop funny. To get that, an investigator must investigate: there must be evidence first, and a judge must sign off on the warrant. That's the point of checks and balances in our system. (Of course, with regard to the NSA vacuuming up data, we're no longer talking about legal activity.)
Here's the confusion: as an email provider, I receive legal requests for metadata. That metadata is gathered from the SMTP logs, not the email contents. Equivalent logs, in my opinion, likely exist at the Post Office too. I find it highly unlikely that all the automation in the USPS (including automated scanning of hand-written envelopes) doesn't get logged somewhere. In any case, if the investigator wants INBOX CONTENT, they MUST request INBOX content. Therefore, if the warrant does not include mailbox contents, I can only provide log data.
This is a VFEmail log entry:
2015-03-24 07:58:43.155818500 CHKUSER accepted rcpt: from [win-lotto@dabspalsy.com::] remote [kit.dabspalsy.com:unknown:antispamip] rcpt [rick@havokmon.com] : found existing recipient
That log data comes from two places: knowledge of the connection itself (remote IP address, local time) and two SMTP commands, MAIL FROM and RCPT TO. Those are pretty obvious. Technically there is also HELO, which is a text identifier the remote server gives for itself.
As you can see, the recipient is there (me), and the sender is win-lotto@dabspalsy.com. Obviously spam. That's the envelope data.
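Pulling the envelope fields out of a log line like the one above takes only a regular expression. This is a sketch fitted to that single sample line, not a complete parser for CHKUSER logs: the field layout (sender, then a `remote` field whose first part appears to be the HELO name, then the recipient) is inferred from the one example, and the pattern accepts either square or angle brackets around each field.

```python
import re

# Sample line from the post; the regex below is fitted to this format only.
LOG_LINE = (
    "2015-03-24 07:58:43.155818500 CHKUSER accepted rcpt: "
    "from [win-lotto@dabspalsy.com::] "
    "remote [kit.dabspalsy.com:unknown:antispamip] "
    "rcpt [rick@havokmon.com] : found existing recipient"
)

CHKUSER_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) CHKUSER accepted rcpt: "
    r"from [<\[](?P<sender>[^:>\]]*)[^>\]]*[>\]] "          # MAIL FROM address
    r"remote [<\[](?P<helo>[^:>\]]*):(?P<remote_info>[^>\]]*)[>\]] "
    r"rcpt [<\[](?P<rcpt>[^>\]]*)[>\]]"                      # RCPT TO address
)


def parse_chkuser(line: str) -> dict[str, str]:
    """Return the envelope-level fields of one accepted-recipient log line."""
    match = CHKUSER_RE.match(line)
    if match is None:
        raise ValueError("not a CHKUSER 'accepted rcpt' line")
    return match.groupdict()


record = parse_chkuser(LOG_LINE)
print(record["sender"], "->", record["rcpt"], "via", record["helo"])
```

Every field the parser can return is envelope or connection data; there is no Subject or body to extract from this line.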
Another point of confusion: unlike USPS mail, emails are processed in their entirety as they're passed through the system. That is, all the 'header' info (on the left, everything above 'deekayen,') is visible to the SMTP server during transmission. This is true everywhere, no matter what Lavabit claims.
What does that mean? It means that we can help stop spam by checking the IPs of all the previous servers in the header for sources of abuse. It means we can find delivery delays by checking the timestamps each server added to the header. It also means that the Subject, or any other header info, may sometimes be added to a log file. BUT, doing so is ENTIRELY up to the ESP.
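As a sketch of the first two uses, the snippet below walks a message's `Received:` headers, collects the bracketed IPv4 address each relay recorded, and computes the delay between hops. The headers are made up (using RFC 5737 documentation addresses), the `blocklist` is a hypothetical stand-in for whatever abuse data an ESP consults, and real `Received:` formats vary far more than this regex allows.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Made-up headers; the bottom Received: line is the oldest hop.
RAW = (
    "Received: from mx.example.net (mx.example.net [203.0.113.7])\n"
    "\tby mail.example.org with ESMTP; Tue, 24 Mar 2015 07:58:50 -0500\n"
    "Received: from client.example.net ([198.51.100.23])\n"
    "\tby mx.example.net with ESMTP; Tue, 24 Mar 2015 07:58:43 -0500\n"
    "From: John Smith <john@example.net>\n"
    "To: Prof Jones <jones@example.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw: str):
    """Return [(ips, timestamp), ...] in the order the relays handled the mail."""
    msg = message_from_string(raw)
    hops = []
    # Each relay prepends its own Received: header, so reverse for oldest-first.
    for header in reversed(msg.get_all("Received", [])):
        ips = IPV4_IN_BRACKETS.findall(header)
        date_part = header.rsplit(";", 1)[1].strip()  # date follows the last ";"
        hops.append((ips, parsedate_to_datetime(date_part)))
    return hops


def hop_delays(hops):
    """Seconds between consecutive relays; a large value points at a delay."""
    return [(later[1] - earlier[1]).total_seconds()
            for earlier, later in zip(hops, hops[1:])]


def flagged_ips(hops, blocklist):
    """IPs from any hop that appear in a (hypothetical) abuse blocklist."""
    return [ip for ips, _ in hops for ip in ips if ip in blocklist]


hops = received_hops(RAW)
print(hop_delays(hops))
print(flagged_ips(hops, blocklist={"198.51.100.23"}))
```

Headers below the first one your own server added can be forged by the sender, so a real filter would weigh those hops with that in mind.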
Technically, during this processing, an ESP could easily write out an entire copy of this email for nefarious or legal purposes. This is also true everywhere, no matter what Lavabit claims - but in this post we're concentrating on the Metadata scare.
Any provider, unless requested to examine full email contents, should only be providing the ENVELOPE data. That's it.
"I just read half this post, I don't want my emails correlated and a profile built of me, and you haven't told me anything yet!", you say. Not true: we've whittled perception down to reality. Unless the government is requesting a wiretap, they'll simply get MAIL FROM, RCPT TO, and REMOTE IP (at least from an ESP who is privacy conscious).
So let's look at the envelope again. Just like with USPS mail, the MAIL FROM can be forged. But if you forge the From on an envelope, returned mail won't be able to reach you. At least not directly.
VFEmail provides what we call the 'Metadata Mitigator'. It rewrites the MAIL FROM so that both the local logs and the recipient server's logs show a unique MAIL FROM address. If the email is returned, this address is parsed at VFEmail, so a bounce doesn't get lost. Most importantly, though, the log data on the recipient's server only shows that an email was delivered to 'Prof Jones' from 'Random account at VFEmail'. This method, known as VERP, has existed since the late 1990s, though historically it's been used for managing mailing list member bounces, not for privacy.
Wait - what? Yes, simply mask the MAIL FROM - and there will be nothing to correlate. The recipient's log metadata, even if they only receive email from YOU - will simply show a single email from 'anyone and everyone' - no duplicates. The recipient would need to be specifically monitored via a wiretap order to see that you had sent them more than one email.
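The rewrite-and-reverse idea can be sketched as follows. This is an illustration under stated assumptions, not VFEmail's Metadata Mitigator: the `MailFromMasker` class, the `bounce.example.org` domain, and the random-token format are all hypothetical. Classic VERP encodes the recipient address into the return path; for privacy, the sketch instead uses an opaque random token and keeps the token-to-sender mapping on the sending side, so the address itself reveals nothing. The last lines simulate a recipient's log to show why masking leaves nothing to count.

```python
import secrets
from collections import Counter
from typing import Optional


class MailFromMasker:
    """Hypothetical VERP-style rewriter: a fresh MAIL FROM for every message."""

    def __init__(self, bounce_domain: str) -> None:
        self.bounce_domain = bounce_domain.lower()
        self._owners: dict[str, str] = {}  # token -> real sender, kept locally

    def mask(self, real_sender: str) -> str:
        token = secrets.token_hex(8)
        while token in self._owners:  # practically never loops; keeps tokens unique
            token = secrets.token_hex(8)
        self._owners[token] = real_sender
        return f"{token}@{self.bounce_domain}"

    def route_bounce(self, bounce_rcpt: str) -> Optional[str]:
        """Map a returned message's recipient back to the real sender, if known."""
        local, _, domain = bounce_rcpt.rpartition("@")
        if domain.lower() != self.bounce_domain:
            return None
        return self._owners.get(local)


def busiest_pair(log_entries) -> int:
    """How often the most frequent (MAIL FROM, RCPT TO) pair occurs in a log."""
    return max(Counter(log_entries).values())


masker = MailFromMasker("bounce.example.org")
sender, recipient = "alice@example.org", "jones@example.edu"

plain_log = [(sender, recipient) for _ in range(5)]
masked_log = [(masker.mask(sender), recipient) for _ in range(5)]

print(busiest_pair(plain_log), busiest_pair(masked_log))
print(masker.route_bounce(masked_log[0][0]))
```

The recipient's log still shows the provider's domain and the connecting IP; what disappears is any repeated sender address to correlate.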
So why would we want to rewrite the entire email ecosystem?
We don't. It's a pointless exercise. It would remove many features we take for granted (Spam blocking, forwarding, queuing), and make things more difficult for users (No debugging info available ANYWHERE).
"But a rewrite fixes other issues too!" So do the hundreds of alternative ways to handle email in its current form:
- By using PGP (see the full email example above), the 'user viewable' content of an email, everything under 'Date', is encrypted at rest and in transit.
- Using TLS will encrypt everything, including the MAIL FROM and RCPT TO, during transit, which prevents snooping if you're not rewriting your MAIL FROM. Sure, the headers are available at rest, while the email is sitting in your INBOX. But no respectable mail system uses headers for log data. And again, in order to read those headers, the ESP should have a subpoena for full mailbox content.
- Worried about the ESP reading the mail headers in your INBOX? Use POP to download your mail and delete it from the server.
- Don't trust that a provider follows its policies? Worried that they claim they can't read your INBOX, but copy its contents to the FBI anyway? Run your own server. With SMTP being an open standard, you can choose from any number of packages and personally remove the last link in the metadata debate.
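Two of the list items above map directly onto Python's standard library. This is a hedged sketch, not a recommended client: the function names, host names, and the `store` callback are hypothetical, and the connection factories are parameters only so the logic can be exercised without a network. `send_with_starttls` upgrades the connection with STARTTLS before `sendmail` issues MAIL FROM and RCPT TO, so those commands travel inside TLS (the first EHLO still goes in the clear). `pop_and_delete` relies on POP3 semantics: DELE only marks a message, and deletions take effect when the session ends with QUIT.

```python
import poplib
import smtplib
import ssl
from typing import Callable


def send_with_starttls(host: str, mail_from: str, rcpt_to: list[str],
                       message: bytes, port: int = 25,
                       smtp_factory=smtplib.SMTP) -> None:
    """Deliver one message, refusing to continue if STARTTLS is unavailable."""
    context = ssl.create_default_context()
    with smtp_factory(host, port) as smtp:
        smtp.ehlo()                      # sent in the clear, before encryption
        smtp.starttls(context=context)   # raises if the server lacks STARTTLS
        smtp.ehlo()                      # re-identify over the encrypted channel
        smtp.sendmail(mail_from, rcpt_to, message)  # MAIL FROM/RCPT TO/DATA in TLS


def pop_and_delete(host: str, user: str, password: str,
                   store: Callable[[bytes], None],
                   pop_factory=poplib.POP3_SSL) -> int:
    """Download every message, hand it to `store`, then delete it from the server."""
    # Pass a verifying context explicitly; POP3_SSL defaults to port 995.
    conn = pop_factory(host, context=ssl.create_default_context())
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        for number in range(1, count + 1):   # POP3 message numbers start at 1
            _resp, lines, _octets = conn.retr(number)
            store(b"\r\n".join(lines))       # save locally first...
            conn.dele(number)                # ...then only mark for deletion
    except BaseException:
        conn.close()   # drop without QUIT, so marked messages are not deleted
        raise
    conn.quit()        # QUIT commits the deletions
    return count
```

Calling `quit()` only after every message has been stored means a failure midway leaves the mailbox untouched rather than losing mail.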
Done. You've just created 'DarkMail'.