2011-03-05 16:45:30 by chort
Recently I decided to write an application for Twitter to report changes in my friends and followers. As part of the process I went looking for a pre-built library of methods that I could use to interact with the Twitter API. I settled on python-twitter as an actively-developed solution that should keep up with changes to the API.
Due to Twitter's rocky past with SSL/TLS (henceforth simply SSL) support on their web interface, I decided it would be prudent to investigate whether their API used SSL. It turns out that it does, and it has a properly signed certificate. Then I looked at python-twitter to see if it had an option to connect over SSL, and was pleased to notice that it does by default. On a hunch I checked out the underlying library that python-twitter is using to make HTTP requests, and I was shocked at what I found.
In the source for python-twitter (twitter.py), we can see that it sets all the URIs to include the https prefix by default. Below is the trace of the code from twitter.py down to the base Python SSL implementation. Note that I'm using Python version 2.6.6, which is the latest currently available through OpenBSD's ports system.
##############
# twitter.py #
##############
...
REQUEST_TOKEN_URL = 'https://api.twitter.com/oauth/request_token'
ACCESS_TOKEN_URL = 'https://api.twitter.com/oauth/access_token'
AUTHORIZATION_URL = 'https://api.twitter.com/oauth/authorize'
SIGNIN_URL = 'https://api.twitter.com/oauth/authenticate'
…
base_url: The base URL to use to contact the Twitter API. Defaults to https://twitter.com. [Optional]
…
if base_url is None:
    self.base_url = 'https://api.twitter.com/1'
else:
    self.base_url = base_url
…
https_handler = self._urllib.HTTPSHandler(debuglevel=_debug)
So python-twitter is implementing sane defaults. The https_handler is set up via urllib2 and only passes a single parameter that has nothing to do with security. In order to see if this SSL session is set up safely, we need to take a look at urllib2.py (included with base Python).
##############
# urllib2.py #
##############
...
if hasattr(httplib, 'HTTPS'):
    class HTTPSHandler(AbstractHTTPHandler):

        def https_open(self, req):
            return self.do_open(httplib.HTTPSConnection, req)

        https_request = AbstractHTTPHandler.do_request_
So that hands the httplib HTTPSConnection class to do_open, without specifying any security parameters. We need to look at what the defaults are in httplib to figure out whether this is safe.
##############
# httplib.py #
##############
...
try:
    import ssl
except ImportError:
    pass
else:
    class HTTPSConnection(HTTPConnection):
        "This class allows communication via SSL."

        default_port = HTTPS_PORT

        def __init__(self, host, port=None, key_file=None, cert_file=None,
                     strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
            HTTPConnection.__init__(self, host, port, strict, timeout)
            self.key_file = key_file
            self.cert_file = cert_file

        def connect(self):
            "Connect to a host on a given (SSL) port."

            sock = socket.create_connection((self.host, self.port),
                                            self.timeout)
            if self._tunnel_host:
                self.sock = sock
                self._tunnel()
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

    __all__.append("HTTPSConnection")
Here it's relying on code from ssl.py to open a secure socket, but the only arguments it passes to the ssl.wrap_socket function are the socket itself (built from the host, port, and timeout), the client key, and the client cert. This means it allows client implementations to authenticate themselves to a server via a certificate (rarely implemented in reality, but nice to have). What is notably absent is any way to toggle whether or not the server's certificate will be authenticated. There also is no way to specify what cipher strength we will accept. This is not looking good. What are the chances Python's SSL library is implemented safely?
##########
# ssl.py #
##########
…
def wrap_socket(sock, keyfile=None, certfile=None,
                server_side=False, cert_reqs=CERT_NONE,
                ssl_version=PROTOCOL_SSLv23, ca_certs=None,
                do_handshake_on_connect=True,
                suppress_ragged_eofs=True):

    return SSLSocket(sock, keyfile=keyfile, certfile=certfile,
                     server_side=server_side, cert_reqs=cert_reqs,
                     ssl_version=ssl_version, ca_certs=ca_certs,
                     do_handshake_on_connect=do_handshake_on_connect,
                     suppress_ragged_eofs=suppress_ragged_eofs)
Well shit. By default (cert_reqs=CERT_NONE), Python's SSL support does not verify the server's certificate! It also accepts SSLv2 by default (ssl_version=PROTOCOL_SSLv23). The problems with SSLv2 are fundamental, as briefly outlined here. This has been known since 1996, and software that allows SSLv2 apparently fails PCI DSS standards, according to many Google search results. Here is a blog post by Adam Young discussing how to eradicate SSLv2 from your systems for an audit.
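For contrast, here is a rough sketch of what a verifying client setup can look like. It assumes a much later Python 3, where the ssl module has an SSLContext object and http.client.HTTPSConnection accepts a context argument; none of that exists in the 2.6.6 code traced above. The helper names make_verifying_context and make_connection are made up for this illustration.

```python
import http.client
import ssl


def make_verifying_context():
    # Assumption: a modern Python 3. create_default_context() enables
    # certificate verification and hostname checking for client use.
    ctx = ssl.create_default_context()
    # Refuse old protocol versions instead of accepting whatever is offered.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def make_connection(host):
    # Building the connection object does not open a socket yet;
    # the TLS handshake (and the verification) happens on the first request.
    return http.client.HTTPSConnection(host, context=make_verifying_context())


if __name__ == "__main__":
    ctx = make_verifying_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The point of the sketch is only that the two knobs missing from the 2.6 HTTP layer (verify the server, restrict the protocol) are set before any data is sent.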
There are two issues here:
1.) Python's built-in SSL module skips server certificate verification and allows an unsafe protocol version & unsafe ciphers by default, and
2.) Python's built-in HTTP library has no way to override those unsafe defaults (the SSL module does, but the HTTP library doesn't allow you to pass those parameters to SSL).
This is complete crap, and if you don't believe me I'll explain why.
When you have a secret that you want to share with someone else, the purpose of keeping the information secret is so that not just anyone can know it. In this case you only tell your secret to a person in a way that lets you verify who you are telling it to. In most cases this is by talking to them face-to-face, or maybe it's by calling a phone number that you have called them at before, and then secondarily confirming the voice matches what you expect, and maybe even asking for their name. All of these steps can be considered "authentication", i.e. you are talking to the authentic party you wish to share the secret with.
Sometimes there's a chance that someone else might be able to listen to your conversation, or monitor it in another way. In that case you want to scramble the communication in a way that only you and the authentic party (let's call her Alice) can unscramble.
There are a couple of ways you could do this. One way would be to use a cipher (a code for scrambling) that only you and Alice know. There are some problems with that approach; probably the most important is that it's difficult to share a different cipher with everyone you wish to communicate with. It doesn't scale.
The second way would be to use a well-known cipher that everyone knows how to descramble, if they have the right key. This is basically a lock that everyone knows how to operate, but which requires a special key that only you and Alice have (simplifying somewhat). In broad strokes, this is what SSL implements.
It should be apparent that it is critically important that your secret is protected by the right lock, i.e. the one that Alice's key fits. If you accidentally protected your secret with a lock that someone else's key could open, that would be really bad! You need a way to tell that you have Alice's lock before you use it. With SSL it's called a certificate. Both you and Alice can use a certificate to prove that the locks you sent to each other are authentic.
In practice on the Internet, usually only the party sending the data (like a credit card) verifies the recipient's certificate (the website you're sending payment information to). This is because the website doesn't really care who is sending credit card information to it, but you really care that your credit card is going to the right place.
Imagine for a moment that you receive a package in the mail. The package has Alice's name on it, and a return address sticker going to a P.O. box. Inside the package there is a note that says "please send me a copy of your plans for defeating illegal government surveillance of citizens; you can trust the plans are safe, because I sent this lock and box for you to use."
It would be really bad if your plans fell into someone else's hands. They might report you to the government, or try to blackmail you, or replace your plans with different instructions and send them to Alice as though they came from you.
Would you put the plans in that box and use the lock you got? Of course not! You'd contact Alice over the phone, or in person, or via some other method that you know will reach her in order to make sure the lock and address are authentic. You'd be insane to blindly trust that using an unknown lock and sending to an unverified P.O. box is a safe thing to do with information that's so sensitive that you're going to the trouble of keeping it under lock & key.
The above scenario is exactly what happens when you talk to a website over SSL without verifying its certificate: you are locking your secrets with an unknown lock and sending them to an unknown destination. This is really, really dangerous. If you're going to the trouble of encrypting the information you're sending, obviously it's important that only the authentic party decrypts the information. That's impossible to guarantee without also authenticating the certificate they send you.
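To make "checking the lock" concrete, here is a minimal sketch of the phone-call approach from the story: you learn the fingerprint of Alice's certificate over some channel you already trust, then compare it with whatever certificate arrives. This is fingerprint pinning, which is a simpler stand-in for the Certificate Authority verification that SSL normally uses, not a replacement for it. The function names and the placeholder certificate bytes are invented for this example.

```python
import hashlib
import hmac


def fingerprint(cert_der):
    # SHA-256 over the raw (DER-encoded) certificate bytes.
    return hashlib.sha256(cert_der).hexdigest()


def is_alices_lock(cert_der, pinned_fingerprint):
    # Compare the presented certificate against the fingerprint that was
    # confirmed out of band (the "phone call" to Alice).
    return hmac.compare_digest(fingerprint(cert_der), pinned_fingerprint)


if __name__ == "__main__":
    # Placeholder bytes standing in for real certificates.
    alices_cert = b"placeholder bytes for Alice's certificate"
    strangers_cert = b"placeholder bytes for someone else's certificate"
    pinned = fingerprint(alices_cert)  # learned through a trusted channel

    print(is_alices_lock(alices_cert, pinned))
    print(is_alices_lock(strangers_cert, pinned))
```

In a real client the bytes would come from the SSL socket itself (getpeercert(binary_form=True) returns the DER form), and the check would have to happen before any secret is written to the connection.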
So if this is so important, surely developers and engineers would take their responsibility to protect your data seriously and they would make it really difficult to accidentally send your data to the wrong place, right? Sadly, that's not true. Developers are humans, and as such often lazy, doing the absolute least amount of work possible. Implementing the encryption in SSL is fairly easy, because there are widely available libraries and reference examples that show how to construct the locks.
Authenticating, or "verifying", the certificates is more difficult. That requires keeping a list of trusted Certificate Authorities that are supposed to confirm the identities of people, corporations, and websites. This also means that every time you want to set up a new website, or new e-mail address that is going to receive encrypted information, you need to pay money and/or go through an inconvenient process to confirm the identity of that site or address. Many engineers simply generate the key, without bothering to do so in a verifiable way.
What they don't realize is that there is absolutely no value in doing this, because without verifying certificates, there's no way to be sure only the intended recipient can decrypt the data. In fact, there are many tools available to intercept and decrypt the data if the destination hasn't been authenticated. Examples of such tools are: SSLSTRIP, Ettercap, and The Middler. Many more can be found at the bottom of the Wikipedia entry on Man-in-the-Middle attacks.
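A toy simulation shows why encryption without authentication buys nothing. The sketch below is written for a current Python 3 and uses a deliberately simplified Diffie-Hellman exchange with an XOR "cipher"; it is not SSL's actual handshake, the parameters are far too weak for real use, and every name in it (including the eavesdropper "Mallory") is invented for illustration.

```python
import hashlib

# Toy parameters, chosen only to keep the arithmetic readable.
P = 2**127 - 1  # a well-known Mersenne prime
G = 3


def public_key(private):
    return pow(G, private, P)


def shared_key(their_public, my_private):
    # Both ends derive the same secret from the other side's public value.
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(str(secret).encode()).digest()


def xor_cipher(key, data):
    # Symmetric toy cipher: the same call scrambles and unscrambles.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def run_intercepted_exchange(message):
    you_priv, alice_priv, mallory_priv = 1234567, 7654321, 1111111

    # Mallory sits on the wire and swaps each public key for her own.
    # Nobody checks whose key they received, so nobody notices.
    key_you = shared_key(public_key(mallory_priv), you_priv)
    key_mallory_you = shared_key(public_key(you_priv), mallory_priv)
    key_mallory_alice = shared_key(public_key(alice_priv), mallory_priv)
    key_alice = shared_key(public_key(mallory_priv), alice_priv)

    ciphertext = xor_cipher(key_you, message)          # you "encrypt for Alice"
    stolen = xor_cipher(key_mallory_you, ciphertext)   # Mallory reads it
    forwarded = xor_cipher(key_mallory_alice, stolen)  # and re-encrypts it
    received = xor_cipher(key_alice, forwarded)        # Alice sees nothing odd
    return stolen, received


if __name__ == "__main__":
    stolen, received = run_intercepted_exchange(b"my secret plans")
    print(stolen, received)
```

Both ends see a working encrypted channel, yet the party in the middle reads every byte. The only thing that breaks this attack is checking that the key you received really belongs to Alice, which is exactly the step that certificate verification provides.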
Some people (as can be seen in these threads) argue that just locking the information is better than nothing, because at least some people won't bother to try to open the lock (even though they could if they tried). This is absolutely a false trail. You don't care about protecting your information from people who wouldn't bother to make a real effort to steal it. You care about protecting your secrets from people who are motivated to steal them. In that case locking your secrets with an untrustworthy lock is no good at all. In fact, it can lead to a false sense of security. You might think that just because it has a lock on it, the data won't be stolen. This might lead you to send information you otherwise wouldn't have, if you had realized that anyone who really wanted to see the data could.
I hope we can put this issue to rest now. Hold your developers to high standards: make them show that your data is safe by implementing both encryption and authentication. If they won't, you know you can't trust them. If they don't think your data is worth keeping secure, they probably don't care much for the reliability of their software either. Use software that the developers take pride in, not some half-assed crap they slapped together in a hurry and can't be bothered to maintain properly.
Finally, this proves once again that you can't trust anyone when they say their software is secure. If I hadn't looked at the source code, I probably would never have realized that Python wasn't checking the server's certificate. You should be conducting thorough audits of any software you intend to rely on for important functions.