Software Development, It's Time to Grow Up

Most people have now heard of the Equifax data breach that "potentially impacts approximately 143 million U.S. consumers". This clearly is a watershed moment for computer security and software development and how we handle it both as an industry and even as a country. I believe three major themes will emerge from this debacle:

  1. How should liability be assigned for this? And to whom? (Which will include identifying new parties to hold liable.)
  2. How can we avoid the centralization of so much sensitive data and yet support a service such as credit reporting?
  3. How should the software development industry change development practices and policies in order to help prevent such an incident?

The first two items are in-depth topics worthy of considerable discussion on their own. However, in this post, I'm going to focus on the third item. What do we as a software development community need to do about this?

Houston, We Have a Problem

Based on the history of credit agencies and of data breaches, it is likely that we are going to find out that there were a lot of obvious things wrong with Equifax's technology and processes. Of course, it's easy to "Monday morning quarterback." However, in our society, we do expect certain critical tasks to be done correctly most of the time and have instituted various practices to achieve that:

  • Lawyers must have a bachelor's degree, complete law school, pass a state bar exam and be licensed.
  • Doctors must have a doctorate-level degree and they must be licensed.
  • Engineers must have a bachelor's degree from an accredited program and they must be licensed.
  • Software developers must - be able to code?

See the problem yet? We have software developers building systems that are critical to society and in many cases we have no externally imposed education, certification or expertise requirements of them.

From the libertarian and laissez faire perspective, this situation is desirable. Why impede such a dynamic and innovative area of our economy with unnecessary bureaucracy and barriers to entry?

On the other hand, history shows that what we have in the legal, medical and engineering fields is what's coming for software development. It's time the industry grew up, faced the music and solved this problem from within before a solution nobody likes gets imposed on us.

Whoa, Back it Up

Of course, the situation is not really that simple. For instance, compare the age of the fields mentioned above to that of software development. We are talking, in some cases, hundreds if not thousands of years, in contrast to a tiny 70 or so for software development. As a profession, a field, and a society, we haven't had time to grapple with these issues yet, both to recognize the problems and to implement reasonable controls in response. The software development field is still a babe in comparison.

Also, there are indeed some areas of software development that are already strongly regulated – military, aviation and medical applications being excellent examples.

But, by and large, many critical areas of our society involve software development that is completely unconstrained in how it is executed. And executing it well in a way that prevents massive failures like what happened to Equifax is extremely hard to do. It requires (at the least) appropriate training and many years of experience.

It's The Culture, Stupid

While there are some positive cultural influences within the software development field regarding this problem, almost all of them are ad hoc and voluntary. Meanwhile, we have certifications that hardly anybody requires and far too many programming languages to choose from, which impedes the development of expertise. We have a code of ethics that probably not one single reader of this article has ever heard of and a hacker mythology created by Hollywood that glorifies unprofessionalism.

But worst of all, we have a negative societal attitude towards software development and the people who practice it. If you asked a random person about software development, two things they would likely tell you are:

  1. Their cousin's ability to "create a website" makes them a developer
  2. Software developers are nerds

So, imagine this: you have a valedictorian getting ready to go to college and they are choosing between becoming a lawyer, doctor, engineer or software developer. What are the odds they are going to choose to be amongst "nerds" who are "creating websites" versus those other fields?

Put another way, what kid dreams of growing up to be a software developer?

Our cultural attitudes are inevitably lowering the average level of talent in the field. Yet, doing good, reliable, and safe work in this field is at least as hard as any of those other fields.

How Do We Fix This?

I personally took a first step towards fixing this problem 4 years ago when I decided to pivot Data Bakery into a custom software development services company. I already knew about the sorry state of my field (as described above) and wanted to do something much better. So I modeled how I operate based on the successful practices of other fields that are much more mature.

I've got a lot of experience and a lot of knowledge. While Malcolm Gladwell's "10,000 hours of practice to master a field" hypothesis is not universally accepted and I don't believe that appropriate training can only come from universities, I do believe that 10,000 hours of relevant experience is an extremely important factor. I also believe a master/apprentice approach in software development would be highly beneficial – if not quite practical in our modern economy.

I also eschew the flavor-of-the-week technology chase, make careful, long-term tech choices, and have become very good at predictably delivering good results with them. I will eventually get whatever certifications emerge as relevant to the field and require them of anybody who wants to work with me.

I want to change how our professionals, our field and our society deal with software development. If what you read here means as much to you as it does to me, please contact me. Let's talk.

 

We must not enable encryption backdoors in consumer products

As the Apple encryption controversy rages on, here's my take on it.

There should never be backdoors put in consumer products to enable the manufacturer or the government to bypass encryption protections. (It is, however, reasonable for the government or an enterprise to enable a feature like this for their own devices that they manage and provide for their own internal uses.)

The obvious question, then, is why? Doesn't this enable bad people to do bad things?

Look at it this way. You understand how your home's door lock works, right? You have a key that opens it. You also probably understand how a locksmith with the proper knowledge can bypass your particular lock.

Now, imagine that your home's lock has a "master key" feature created by the manufacturer. If you possess this master key, you can unlock the door to anybody's home that uses that brand of lock. These master keys are carefully controlled and distributed to licensed locksmiths only. The law says that only licensed locksmiths may use master keys and only under court order.

Seems pretty safe and reasonable, right? Only the good guys have the master keys and they use them for the right purposes.

Now, imagine that just *one person* out of all 7 billion on the planet is able to fool the manufacturer of the lock into believing that they are one of these trusted locksmiths. They end up with a master key in their hands. Doesn't seem too bad yet, right? It's only one person and laws prohibit them from using it.

Next, imagine this person scans the key and produces a 3D printer model of it. Then they put that model out on the Internet for anyone to download. Now, any criminal, anywhere with access to the Internet and a 3D printer can obtain their own master key for that brand of lock with a very small amount of work.

Do you think this is a far-fetched and unlikely scenario? Think again.

The exact same reasoning applies with the situation Apple is discussing. The only thing preventing all backdoor enabled devices from being decryptable is a little bit of extremely sensitive information stored in the hands of a few good (and imperfect) people. Once that information gets out – and with the ever-increasing number of serious cyber security breaches (see Edward Snowden and Office of Personnel Management), it is very likely that it eventually will – every single device in use becomes instantly decryptable.

That is just too dangerous of a situation to allow. It will instantly endanger people everywhere in sometimes life-threatening ways. And I've got good company in taking that position. The present and two former NSA Directors agree:

The US is “better served by stronger encryption, rather than baking in weaker encryption.” 

Yes, we will be protecting criminals and their behavior at times. But so do the 4th and 5th Amendments to the Constitution.

The danger of the alternative is too high. As H. L. Mencken put it:

The trouble with fighting for human freedom is that one spends most of one's time defending scoundrels. For it is against scoundrels that oppressive laws are first aimed, and oppression must be stopped at the beginning if it is to be stopped at all.

 

Demographics and the 2010 U.S. Census

One of the pleasures of developing custom software through my company Data Bakery is the regular opportunity to learn about new and interesting topics. In the case of our client StreetCred, that topic turned out to be demographics and how they were handled in the 2010 U.S. Census. Demographics are an important topic for StreetCred because their software helps law enforcement organizations (LEOs) understand how their officers are interacting with the communities they serve.

The way race and ethnicity are handled when dealing with demographics tends to be fuzzy and haphazard. However, in the 2010 Census, the U.S. Census Bureau decided to be much more explicit: they treated Hispanic as an ethnicity separate from race (White, Black, American Indian and other races). This decision has resulted in enough confusion that the U.S. Census Bureau issued a 23-page document titled Overview of Race and Hispanic Origin: 2010 in March of 2011 that attempts to explain the situation. Briefly summarized: Hispanic is considered an ethnicity and not a race. In other words, the majority of Hispanics are considered racially White.

As a result, the question of whether or not a respondent was of "Hispanic origin" was asked independently and in addition to the respondent's race. This may seem like just an interesting definition of little consequence, but the way this was handled has a direct impact on consumers of Census data who are more interested in answering demographic questions where ethnicity and race need to be considered together.

A Maze of APIs and Variables

When you first begin exploring the U.S. Census data, it comes off as an arcane, maze-like combination of datasets and APIs. That tends to lead people to the US CitySDK, which was created by the U.S. Census Bureau to make it easier to access U.S. Census data. Unfortunately, the CitySDK is designed to run in a web browser, which limits how it can be used. However, it does demonstrate how and which U.S. Census APIs to use for various types of queries. In particular, demographic queries in CitySDK use the American Community Survey 5 (ACS5) dataset.

While the ACS and 2010 U.S. Census are two different surveys/programs, the data surveyed and reported in the ACS is handled the same way as was decided for the 2010 U.S. Census – Hispanic is an ethnicity and not a race.

As you dig deeper into the ACS5 variable set and experiment with it, you begin to get a sense of how the data is structured. Consider the following variables:

Variable Name   Label
B02001_002E     White alone
B02001_003E     Black or African American alone
B02001_004E     American Indian and Alaska Native alone
B02001_005E     Asian alone
B02001_006E     Native Hawaiian and Other Pacific Islander alone
B02001_007E     Some other race alone
B02001_008E     Two or more races
B01003_001E     Total

Sounds pretty good so far, right? If you pull the data for Fort Worth, here's what you get:

White alone: 508,894
Black or African American alone: 147,471
American Indian and Alaska Native alone: 4,621
Asian alone: 28,984
Native Hawaiian and Other Pacific Islander alone: 1,104
Some other race alone: 65,096
Two or more races: 22,403
Total: 778,573

Everything seems good and the math adds up. However, what if you want to know what proportion of your population is Hispanic? You can see some of the confusion on this if you look at the Wikipedia entry for Demographics of Fort Worth.

If you look further into the ACS5 variables, you find:

Variable Name   Label
B03001_003E     Hispanic or Latino

If you add this to the ACS5 query, you get 266,472 for Fort Worth. Nice to know, but what do you do with this number? You can't add it to the counts above because that would be over-counting people. The only way to combine this ethnicity count with the racial counts would be to reduce the counts above to just the people who answered a particular race but also said "no" to Hispanic origin. But it is impossible to determine how to do this from just the numbers above.

The Lesser Traveled Path

Luckily enough, if you dig further into the ACS5 variable set, you find these:

Variable Name   Label
B03002_003E     Not Hispanic or Latino – White alone
B03002_004E     Not Hispanic or Latino – Black or African American alone
B03002_005E     Not Hispanic or Latino – American Indian and Alaska Native alone
B03002_006E     Not Hispanic or Latino – Asian alone
B03002_007E     Not Hispanic or Latino – Native Hawaiian and Other Pacific Islander alone
B03002_008E     Not Hispanic or Latino – Some other race alone
B03002_009E     Not Hispanic or Latino – Two or more races

With these variables, you can now query the following data:

Hispanic or Latino: 266,472
Not Hispanic – White alone: 318,732
Not Hispanic – Black or African American alone: 145,330
Not Hispanic – American Indian and Alaska Native alone: 2,262
Not Hispanic – Asian alone: 28,534
Not Hispanic – Native Hawaiian and Other Pacific Islander alone: 966
Not Hispanic – Some other race alone: 1,228
Not Hispanic – Two or more races: 15,049
Total: 778,573

And there you have it. Demographic data combining ethnicity and race using post-2010 U.S. Census data.
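
To make this concrete, here is a minimal Python sketch of pulling the B03001/B03002 variables directly from the Census API using the third-party requests library. The endpoint path, dataset vintage and the state/place FIPS codes (48 for Texas, 27000 for Fort Worth) are assumptions written from memory, so verify them against the Census API documentation before relying on the numbers.

import requests

# ACS 5-year estimates endpoint; the vintage and path are assumptions to verify.
ACS5_URL = "https://api.census.gov/data/2015/acs/acs5"

VARIABLES = {
    "B03001_003E": "Hispanic or Latino",
    "B03002_003E": "Not Hispanic – White alone",
    "B03002_004E": "Not Hispanic – Black or African American alone",
    "B03002_005E": "Not Hispanic – American Indian and Alaska Native alone",
    "B03002_006E": "Not Hispanic – Asian alone",
    "B03002_007E": "Not Hispanic – Native Hawaiian and Other Pacific Islander alone",
    "B03002_008E": "Not Hispanic – Some other race alone",
    "B03002_009E": "Not Hispanic – Two or more races",
    "B01003_001E": "Total",
}

params = {
    "get": ",".join(VARIABLES),
    "for": "place:27000",   # Fort Worth, TX place FIPS code (assumed)
    "in": "state:48",       # Texas
    # "key": "YOUR_CENSUS_API_KEY",  # optional for low request volumes
}

header, row = requests.get(ACS5_URL, params=params, timeout=30).json()
counts = dict(zip(header, row))
for variable, label in VARIABLES.items():
    print(f"{label}: {int(counts[variable]):,}")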

Part 3 - The DAE Paradigm Shift

In Part 1 we discussed the problem of reversible application encryption, and, in Part 2, we presented the details of Distributed Application Encryption (DAE), a proposed approach for application-level encryption designed for our current security environment. In Part 3, we cover the pros, cons, and other implications of DAE.

Pros

  • Encryption Diversity – each user has their own encryption key for their own sensitive data. In order to obtain all data in plaintext, you need to gain access to each and every user’s unencrypted User Master Key. Put another way, cracking the key for one user doesn’t affect the security of another user’s sensitive data.
  • Narrow Vulnerability Windows – the User Master Key is only exposed during user requests. Likewise, the Unlocked Session Cookie only exists while the user is logged in.
  • Geographic Diversity – Unlocked Session Cookies are stored with users’ web sessions, not on the server. An attempt to collect them on demand would require attacking many different users’ devices simultaneously.
  • Restricted Attack Scenarios – Most attack scenarios involve modifying an application to collect plaintext information. Under DAE, an attack of this type would expose only those user accounts accessed from the point the attack begins until it is finally discovered. Theft of an entire database in plaintext form at any particular point in time would be impossible.
  • Transparency and Accountability – Applications that need to access sensitive data must request that users return to the application, establish a session, and make a request to provide the needed Unlocked Session Cookie. This provides transparency for users as to how their sensitive data is being handled, as well as creating an implied-approval process of that handling for the application provider.

Cons

  • DAE is dependent on a plaintext password sign-in mechanism in order to unlock data.
  • Applications implementing DAE are susceptible to code-modification attacks in which surreptitious collection of User Master Keys is added in the request path.

Regardless of the technical pros and cons, adopting DAE imposes a new paradigm for managing sensitive data – a paradigm that requires fundamentally rethinking how applications handle sensitive data.

Applications Should Only “Borrow” Sensitive Data

The new paradigm can be summed up as follows: a DAE application neither owns nor has free access to sensitive data it collects from its users; it must always ask for and be granted that access. To benefit from DAE, application designers and implementers must embrace this principle and implement it elegantly and thoroughly.

A real-world analog to DAE is a safe-deposit box at your local bank. You store valuables in it, and neither you nor the bank can access its contents without mutual coordination (barring court orders, thievery, etc.). This mutual coordination is guaranteed via an access system requiring two keys – yours and the bank's – to unlock the safe-deposit box.

This parallel also highlights the major drawback: valuables are less accessible for both the user and the bank.

For some applications and some types of sensitive data, converting to this information-management system would be simple. For instance, a doctor’s office rarely needs a patient’s Social Security number. Storing it and requesting access to it via DAE seems reasonable for both doctor and patient.

But imagine a financial application that handles sensitive data like account numbers. Many types of transactions (interest accrual, check clearing, bill pay, direct deposit) would prove cumbersome if each required an access request to the customer. Per-user, per-application, and even per-industry analyses will likely be needed, along with standards for what is treated as “sensitive data” in a DAE architecture.

DAE variants are also a possibility. Users could grant temporary server-side storage of their Unlocked Session Cookies in order to handle batch and other types of autonomous processing. Application designers might unlock sensitive data and temporarily store it in plaintext for batch processing runs. There are bound to be many other variations possible. 

It’s a Process, Not an Event

This series of posts was not intended to define and promote DAE itself. DAE is merely a useful vehicle for defining and promoting the paradigm shift it represents. The real measure of DAE's worth will be apparent in how this paradigm shift is regarded and incorporated into application design in general. While this discussion has assumed a web application environment, the techniques described could apply in many scenarios.

Reasonable technical minds can differ on what exact form the paradigm shift should take. But one fact none can deny is that, as each new data breach makes headlines, it becomes ever more painfully obvious that our current approaches aren’t working.

Part 2 – Distributed Application Encryption (DAE)

In Part 1 I talked about the ready reversibility of application encryption. At the root of that problem is how encryption keys are managed. In most applications, encryption keys must be readily available so the application can decrypt and encrypt any of its data at any moment. When this availability is combined with a successful spear phishing attack on an administrator, it is only a matter of time before any or all application data is decrypted and exfiltrated.

In direct contrast, DAE defends much more effectively against this threat scenario. Black hats can have complete run of your infrastructure and yet still not be able to decrypt sensitive data. DAE achieves this with a trade-off: ready accessibility vs. encryption durability.

While typical application-level encryption stores a handful of encryption keys in config files or HSMs and uses them in different functional areas of the application and database, DAE manages one encryption key per user. This means that, rather than organizing encryption by functional or database area (for instance, by protecting all Social Security Numbers in the app or targeting columns of a particular table), encryption is organized and employed on a per-user basis.

When a user is added to the system, a random encryption key (the User Master Key) is generated for that particular user by the server. This User Master Key is used to reversibly encrypt and decrypt any sensitive data related to that user. When the User Master Key is at rest, it is encrypted with the User Session Key. The User Session Key is derived from the user's plaintext password when it is presented during account creation or later during sign-in.

Operationally, the User Master Key is “unlocked” whenever a user signs in. During sign-in, the user presents their plaintext password to the server; the server derives the User Session Key and decrypts the User Master Key. The User Master Key is then encrypted with an Application Master Key (which itself is stored in a config file) and stored as an Unlocked Session Cookie in the user’s web session. The User Master Key is now “unlocked”, while the plaintext password and the User Session Key are discarded.

Each time a user makes a request to the server, the user’s session provides the Unlocked Session Cookie with the request. If access to that user’s encrypted data is needed, the server can decrypt the Unlocked Session Cookie with the Application Master Key to get the User Master Key. The User Master Key can then be used to encrypt or decrypt the relevant user data. When the request is completed, the User Master Key is discarded. When the user’s sign-in session ends, the Unlocked Session Cookie is removed from the session.
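
To make the key handling concrete, here is a minimal Python sketch of this flow using the third-party cryptography package's Fernet recipe. The function names, KDF parameters and storage details are illustrative assumptions, not part of the DAE definition.

import os
import base64
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_session_key(password: str, salt: bytes) -> bytes:
    # User Session Key derived from the plaintext password (parameters are assumptions).
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
    return base64.urlsafe_b64encode(kdf.derive(password.encode()))

def create_user(password: str):
    # Account creation: generate a random User Master Key and lock it at rest.
    salt = os.urandom(16)
    user_master_key = Fernet.generate_key()
    locked_umk = Fernet(derive_session_key(password, salt)).encrypt(user_master_key)
    return salt, locked_umk                       # stored server-side

def sign_in(password: str, salt: bytes, locked_umk: bytes, app_master_key: bytes) -> bytes:
    # Sign-in: unlock the User Master Key, re-encrypt it under the Application Master Key,
    # and hand the result to the user's session as the Unlocked Session Cookie.
    user_master_key = Fernet(derive_session_key(password, salt)).decrypt(locked_umk)
    return Fernet(app_master_key).encrypt(user_master_key)

def decrypt_user_field(cookie: bytes, app_master_key: bytes, ciphertext: bytes) -> bytes:
    # Per-request: recover the User Master Key from the cookie, use it, then discard it.
    user_master_key = Fernet(app_master_key).decrypt(cookie)
    return Fernet(user_master_key).decrypt(ciphertext)

# Example round trip
app_master_key = Fernet.generate_key()
salt, locked = create_user("correct horse battery staple")
cookie = sign_in("correct horse battery staple", salt, locked, app_master_key)
umk = Fernet(app_master_key).decrypt(cookie)
secret = Fernet(umk).encrypt(b"123-45-6789")
assert decrypt_user_field(cookie, app_master_key, secret) == b"123-45-6789"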

The following diagram goes into these scenarios in greater detail.

Distributed Application Encryption Use Case Scenarios

In Part 3 we’ll discuss the pros, cons and other implications of DAE.

Part 1 – The Security Elephant in the Room

If you keep up with technology, you know major cyber security breaches have become commonplace. From Target and Home Depot to the US Office of Personnel Management, these break-ins are growing larger in scope and happening more frequently. While everyone acknowledges data theft is troubling, even security professionals just shrug it off as business as usual.

However, I believe a subtle but significant shift has occurred in the security landscape. If you build any sort of web/Internet system, the chances that your product's entire database will be stolen sometime during its lifetime are now close to 100%. Once you acknowledge this reality, new, worrying questions arise about how to protect application data – and what kind of liability companies may face if they continue to build applications without taking this reality into account.

The Placebo Effect

If you talk with security experts, they will tell you that the answer to our computer security problems is application-level encryption. While application encryption approaches are appealing, they have one serious flaw that these experts tend to hand-wave away: ready reversibility. Granted, application-level encryption provides general improvements in diversity, opacity and at-rest protection. However, with the right administrative access, it is usually trivial to reverse these protections. This makes the whole approach vulnerable to one of the most challenging security threats today: spear phishing.

Let’s say you develop a web application employing multiple encryption keys to protect data with multiple algorithms at multiple different points in your database. Sounds robust, right? At least until I spear phish one of your dev ops guys and use his administrative access to exfiltrate the source code to the application, its database and the keys that protect it all.

Oh, but you are using a Hardware Security Module (HSM) and are protected against this scenario? Well, all I have to do is be patient and use your dev ops guy’s access to all of the involved systems to run the data through the HSMs and decrypt it before exfiltration. Or even worse, I discover and use maintenance functionality (e.g. an “archive” function) already built into the application to get it to export all of its data unencrypted.

Houston, We Have a Problem

Once someone has administrative access to your systems, most encryption techniques are easily reversible. Thus, since it is likely impossible to keep black hats from penetrating your infrastructure and gaining administrative access, security that relies primarily on application-level encryption is fundamentally flawed.

What’s the answer? In Part 2, I’ll propose a new approach to application encryption which I call Distributed Application Encryption (DAE). In Part 3, I’ll discuss the strengths, weaknesses and implications of DAE.

The Password Manifesto

I'm not an active member of the computer security community, but I have considerable knowledge of and experience with the topic. I've been thinking about and working on the password problem for many years now and have come to some conclusions that I want to share more widely.

(This document is still in draft form. Please suggest enhancements in the comments or on Twitter, or contact me with your thoughts or if you are interested in becoming a signatory when it is complete.)

#1 – Attempting to eliminate passwords as an authentication method will never succeed due to market inertia. The industry should instead focus on evolving password authentication.

Password authentication is the most common authentication technique and is understood by large numbers of users and developers. A very large amount of code implementing password authentication has already been written, and countless systems rely on it today.

#2 – It is time to acknowledge that knowledge factors are no longer viable and should be removed from online authentication systems.

Authentication is normally accomplished by various combinations of factors: something you know (knowledge factor), something you possess (possession factor) and something you are (inherence factor). Knowledge factors are problematic for users who struggle to manage them. This situation has been compounded as password complexity requirements have increased and hacker techniques have evolved.

#3 – All passwords should be completely random, consist of 256 bits of entropy derived from cryptographic random number generators and be assigned by service providers instead of chosen by users. This transforms the password from a knowledge factor into a possession factor (i.e. a key).

Computing power and cracking techniques have made effective passwords almost impossible for humans to represent as knowledge. Therefore, it is time to eliminate knowledge factors as a class and focus on possession and inherence factors.
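
For illustration, generating such a credential takes one call to Python's standard-library secrets module; the encoding choice here is just an example.

import secrets

# 32 random bytes = 256 bits of entropy from the OS CSPRNG,
# URL-safe base64 encoded (43 characters) so it can be stored and sent as text.
assigned_password = secrets.token_urlsafe(32)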

#4 – The W3C should define a "Keychain" API that allows website Javascript to store keys in a user's local keychain during account setup and retrieve keys during authentication.

Web browsers and third party password managers attempt to detect login forms in order to automate password authentication. This is difficult to implement and error-prone.

#5 – All keychain implementations must encrypt and decrypt key content locally. Vendors must not be able to access key content by design. Any key content that leaves the context of a device must be encrypted such that only the user can decrypt it via local authentication methods.

Vendors and service providers should assume that their infrastructure will be penetrated by hackers at least once in the lifetime of any product. Therefore, the only way to protect user content is to make it inaccessible to anyone but the user on their own device using local authentication and decryption methods. This is somewhat obvious today and many password managers are designed this way – but it should still be said.

#6 – All operating system and browser vendors should standardize on keychain formats and define standard APIs and formats for interchanging keys and other related content. Vendors should also synchronize keys between users' keychains on users' devices residing within each vendor's ecosystem.

Password managers are not the answer; password management is. Usernames, passwords and authentication are features of applications and operating systems – they are not a standalone product. They are best implemented and delivered as infrastructure. Vendors should recognize this and take responsibility for making it happen.

 

The Password is Dead, Long Live the Password!

I spent some time over the past couple of years working on technology to replace passwords, along with a lot of other folks (LaunchKey, Clef, Nexiden, Google Authenticator, SQRL, Mozilla Persona, oneID, etc. – the full list is sadly much longer than this). These ideas each have their technical merits and some get quite a bit of fanfare on their introduction to the market. However, I've personally reached the conclusion that none of these approaches are ever going to succeed in replacing the password with some nifty new authentication mechanism. It's not that there's no desire for a good solution nor that one can't be built. It's a much more fundamental problem: market inertia.

Rome Wasn't Built in a Day

The first thing to recognize is that the username and password problem isn't and will never be a product-oriented problem – by its nature, it is an inherent feature of other products: applications, frameworks and operating systems. It's like alternative file browser software for your desktop OS. Sure, you can build it, but it's not going to be easy to accomplish and the operating system vendors can quickly react to whatever your value proposition is. Compounding the situation is the wide adoption and usage of passwords, the number of books and training materials about them and the large volume of training and experience so many developers and users have with them.

The moment you start looking at the problem this way you begin to realize the enormity of the task a new product in this space faces. How do you build a product that can create new standards, prompt people to write new books about the techniques and convert developers to using a new product when it is unlikely to ever successfully be adopted by enough applications, frameworks and operating systems to justify it?

It's a bit like throwing a pebble into a large pond and expecting anything more than tiny ripples to come from it before it quickly disappears below the surface. 

You might say "but wait – isn't this really just the chicken and egg problem?". I.e. if you could only convince a critical mass of users and providers to adopt a new system, the product would surely succeed, right? My contention is that no, this is a different and additional problem. You could have the niftiest product in the world, get a significant amount of users and sites to adopt it and would still face this market inertia problem.

So if the problem is nearly intractable from a product development perspective, what is the answer? Is this as good as it gets?

The Future is Now

I believe we need to stop trying to get rid of the password and instead work with the market inertia to evolve the password in a better direction.

So, does that mean password managers are the answer? I contend that password managers are not the answer but password management is. In other words, password managers as a product market are a dead-end for the same reasons explained above. However, password management as a feature of applications, frameworks and operating systems is the future. Think LastPass versus iCloud Keychain. When your competitor controls the playing field you are competing on you've already lost – you just don't know it yet.

So, what will password management of the future look like? Here's how I think the password will evolve:

  • Passwords of the future will no longer be memorized. They will be very large, completely random and securely generated. A standard will be published on their format and exactly how to properly create them.
  • The user will no longer choose their own password – the service provider will assign it upon account creation.
  • A new web browser Javascript API will be introduced supporting key chain functions. This will allow a website to store an assigned password in your browser's keychain and allow that same website to ask for your password. Access and control will use origin domain protections and user prompts.
  • The web browser will connect to a platform key chain service that will synchronize and keep your passwords available on all of your devices.

You might have had two thoughts while reading that. 1) That sounds a lot like API keys that developers use to access web services, doesn't it? Yes, that's exactly right.

And 2) Doesn't the browser already support client authentication with client SSL certificates? There are some similarities, but the big difference is that the existing client SSL authentication comes with all of the baggage of a PKI infrastructure. The implementation of it is really dragging those problems down into the browser space – and I believe accounts for its limited adoption. The approach here is bubbling up a client authentication solution from passwords.

The User Experience

So, what would this look like to users? Here are two thought experiments for the most common cases:

Signup

A user visits a website and fills out a form to sign up for service. There's no username or password field. They submit the form and once the account is created, Javascript sent back to the browser invokes the key chain API to add a new account (i.e. save a password for the domain). The user is prompted to give a description for the item. The browser saves it to the platform key chain and the platform key chain synchronizes it with the cloud and all of the user's other devices.

Signin

A user visits a website's login page. Javascript on the page detects and invokes the key chain API asking for a password for the domain. The browser presents a window to the user showing the description for each of the stored passwords for the domain. The user selects one which is then returned via the key chain API call to the website's login page Javascript code. The page proceeds to authenticate and sign in the user.

A couple of basic benefits become apparent:

  • A user will never see or be prompted for a password again. Evolving passwords will effectively get rid of them.
  • Every single password created and used this way will be cryptographically unbreakable. The only way to get your password is to steal it from you.
  • The main authentication problem will effectively move from being between service providers and you to being between each of your devices and you. This problem is much easier to solve locally since biometrics become viable. Touch ID anybody?
  • Passwords of this type (large and random) can serve to both authorize and identify a user. There's no need to have a separate username to identify an account. The password (or a hash of it) can identify an account, as sketched below.
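
A minimal sketch of that last point, with the hashing and encoding choices as assumptions rather than a specification:

import hashlib
import secrets

assigned_password = secrets.token_bytes(32)                  # 256-bit, key-like password
account_id = hashlib.sha256(assigned_password).hexdigest()   # doubles as the account identifier

# The service can look an account up by hashing whatever password is presented;
# the password itself lives only in the keychain on the user's devices.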

One of my metrics for when an idea is feeling "right" is when it has a real-world analog that is tried and true. In this case, think of locks and keys. You might think login pages are the locks and passwords are the keys. But realistically, the analog for what we have today is really a combination lock. You have to remember, manage and protect the combination. Anybody who can obtain it can open the lock.

In this evolved password scenario, key chains and passwords are created, managed and protected like real keys. A real key isn't likely to be properly duplicated by just looking at it and you aren't likely to guess it. It requires possession of the key (or a duplicate of it) in order to gain access.

In other words, passwords change from something you know to something you possess.

It's Not a Startup Opportunity

If you've read this far, you probably have come to the same conclusion I have – this is not a startup opportunity. This is a problem of standardization and adoption by major application, framework and operating system vendors. Since this is an evolutionary approach instead of a whole new method, it's easy to imagine practical and cost-effective steps that could be taken to get there.

I believe that whether or not anybody actively plans this outcome, this is where the problem will ultimately end up anyway on its current course. But the description presented here isn't a complete solution, as authentication is a large and thorny problem. What about password resets if my key chain loses a key? How do I access websites when I'm not on one of my devices? Feel free to chime in with your thoughts in the comments and add to the discussion.

It's not a problem a startup will come along and magically solve. It's our problem to solve right now.

Durus: The Best Python Object-Oriented Database You've Never Heard Of

I've been developing software in the Python programming language for over 20 years now. It's my preferred language due to its readability, speed of development and the massive number of modules available for it. This blog post is about one of the hidden gems in the Python world: Durus.

If you've built any reasonably sized application before, you've likely worked with a database. The most common database technology available is the Structured Query Language (SQL) database, which takes a row-and-column approach to storing, querying and retrieving data. This technology has been around since the 1970s and has dominated in terms of deployments.

In recent years, new database technologies have become more common and popular. These include key-value, graph, object and document storage systems. Together, this whole trend has been labeled "NoSQL".

Durus actually predates this trend, having been developed in 2004 and presented at PyCon 2005. It has its origins in ZODB, which was developed in the late 1990s – Durus took the general architecture of ZODB and simplified it. While it was introduced to the world in the context of web applications, Durus is really more widely applicable than that. It's ACID-compliant and can be run standalone or in a client/server architecture for scalability.

If you aren't familiar with object-oriented databases, they are quite a departure from the SQL model. Databases don't consist of tables and rows – they consist of collections and objects. For instance, in Python, the most common mutable object types are lists, dictionaries and sets. These have direct counterparts in Durus that act and behave like their corresponding type – but are Durus-aware.

NoSQL – The Python Way

There's no better way to understand Durus than to see it in action:

$ durus -c --file=test.db
Durus 127.0.0.1:2972
    connection -> the Connection
    root       -> the root instance
>>> root
<PersistentDict 0>
>>> root.items()
[]
>>> root[1] = "a"
>>> root.items()
[(1, 'a')]
>>> connection.commit()
>>> ^D
$ durus -c --file=test.db
Durus 127.0.0.1:2972
    connection -> the Connection
    root       -> the root instance
>>> root.items()
[(1, 'a')]
>>> from durus.persistent_set import PersistentSet
>>> s = PersistentSet()
>>> s.add(1)
>>> root["set"] = s
>>> connection.commit()
>>> root["set"].add(3)
>>> root["set"]
<PersistentSet 32>
>>> list(root["set"])
[1, 3]
>>> connection.abort()
>>> list(root["set"])
[1]
>>> 

Fundamentally, an object-oriented database is based on an object graph. It has a root object that refers to everything else that will be added to the database. The database connection is used to control transaction semantics. Containers, fundamental types and custom classes can all be added to the database as long as the object graph they are a part of is connected to the root object.

There is one "gun meet foot" pitfall to watch for: never put non-Durus aware mutable objects or containers in the database. I.e. you should never add standard Python lists, dictionaries, sets or subclasses of object to the database. The Durus versions are instrumented to properly capture changes in a commit if their contents are modified. Otherwise, you will lose data that you think is being committed to the database.

Why Durus?

Philosophically speaking, object-oriented databases are quite elegant and simple to work with. They use the language runtime environment itself to implement full database functionality without having to learn new semantics (i.e. SQL).

Need a new table? Declare a new Durus persistent class and add a Durus container to the root. Need to do a join? Write a nested for loop across two containers. Need a large-scale indexed container? Use a Durus BTree.
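
As a rough sketch of that workflow – the class name and file name are made up for illustration, and the module paths are as I recall them from the Durus distribution, so verify against your installed version:

from durus.connection import Connection
from durus.file_storage import FileStorage
from durus.persistent import Persistent
from durus.persistent_dict import PersistentDict

class Customer(Persistent):
    # A "table row" is just a persistent object.
    def __init__(self, name, email):
        self.name = name
        self.email = email

connection = Connection(FileStorage("example.durus"))
root = connection.get_root()                      # top of the object graph
if "customers" not in root:
    root["customers"] = PersistentDict()          # the "table"
root["customers"]["ada"] = Customer("Ada", "ada@example.com")
connection.commit()                               # changes become durable here

# A "join" is just Python iteration across containers.
for key, customer in root["customers"].items():
    print(key, customer.name, customer.email)

# Pitfall from above: root["plain"] = {} (a standard dict) would not reliably
# record later mutations – always use the Persistent* counterparts.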

And you get all of this simplicity and elegance in a high-performance package. Durus uses an append-only file format on disk (along with a packing utility) and a memory cache, which enables large deployments: databases of tens of gigabytes, millions of objects and quick access times.

So, if you got to the end of this post and still haven't installed and played with Durus, what are you waiting for?

Entrepreneurship – Dallas, TX Edition

If you Google "entrepreneurship", you get back this definition:

Entrepreneurship is the willingness to take risks and develop, organize and manage a business venture in a competitive global marketplace that is constantly evolving. Entrepreneurs are pioneers, innovators, leaders and inventors.

While there is a long history of entrepreneurship in certain parts of the world, the trend is picking up worldwide:

We are part of the global entrepreneurial class, an identity that transgresses borders, nationalities, and religion.  Entrepreneurs are a demographic, not a geographic, and their conspicuous creation is driving positive change in our world. Silicon Valley remains a bastion, and a gravitational force.  But the walled gardens are withering, and the access class is becoming an asset class.  Investors and entrepreneurs need a passport to the present.

The Dallas, TX area is no stranger to this phenomenon with organizations and activities like The DEC, Tech Wildcatters, The Garage, VentureSpur Texas, Common Desk, The Grove, BigDOCC, LaunchDFW, Dallas New Tech, DFW Startup Happy Hour, and plenty more. DFW has been no stranger to startup companies in the last 20 years either.

And of course there are local personalities helping drive this trend, such as Bradley Joyce, Mike Sitarzewski, Jennifer Conley, Michael Gilbert, Trey Bowles, and Gabriella Draney, to name but a few. And my small contribution is mentoring with a couple of the accelerator/incubator programs.

So far, it's a pretty straightforward set of facts repeated in various cities around the world, right? Not so fast, pardner – here in Texas we do everything bigger and better – and entrepreneurship is no different.

Meet Bill McNeely


Bill lives in Dallas and is a veteran of the Afghanistan conflict who, like many other veterans, has struggled since returning to civilian life. Between a moribund economy and the lingering effects of combat, he has found it hard to support himself and his family. However, he's not just a veteran – he is also an entrepreneur and an active participant in the Dallas entrepreneurial community. A t-shirt he designed sums up his unique perspective.

Bill has been working on startup ideas surrounding the skills he excelled in with the military: logistics. The result is DeliverToMe, a B2B local delivery service. However, how he got to that point is just as important a story about Bill as it is about the Dallas entrepreneurial community.

Bill has received support from The Garage in acquiring a vehicle for his service and building his business model; acquired his first client, Brevida (also a startup), through The DEC; and received training and mentoring through the Google-sponsored Startup Weekend NEXT program led by Kevin Strawbridge, where I helped Bill with his pitch and refined his business model. He also received in-kind support from FISH Technologies, a local award-winning experiential marketing technology company.

The details of how DeliverToMe has developed are much less important than how so many different elements of the Dallas entrepreneurial community spontaneously came together to help Bill. There was no central planning; there were no turf battles. The consistent ingredient was entrepreneurs with a passionate desire to help other entrepreneurs succeed. The difference here is that in Texas, we don't just want each of our own efforts to succeed – we want everybody's entrepreneurial efforts to succeed. That's how entrepreneurship benefits society as a whole and makes it all worthwhile.

What can I say? It's Dallas. That's how we roll.