The Password Manifesto

I'm not an active member of the computer security community, but I have considerable knowledge of and experience with the topic. I've been thinking about and working on the password problem for many years now and have come to some conclusions that I want to share more widely.

(This document is still in draft form. Please suggest enhancements in the comments or on Twitter, or contact me with your thoughts or if you are interested in becoming a signatory once it is complete.)

#1 – Attempting to eliminate passwords as an authentication method will never succeed due to market inertia. The industry should instead focus on evolving password authentication.

Password authentication is the most common authentication technique and is understood by a large number of users and by a large number of developers. A very large amount of code has been written to implement password authentication, and countless systems rely on it today.

#2 – It is time to acknowledge that knowledge factors are no longer viable and should be removed from online authentication systems.

Authentication is normally accomplished by various combinations of factors: something you know (knowledge factor), something you possess (possession factor) and something you are (inherence factor). Knowledge factors are problematic for users who struggle to manage them. This situation has been compounded as password complexity requirements have increased and hacker techniques have evolved.

#3 – All passwords should be completely random, consist of 256 bits of entropy derived from cryptographically secure random number generators and be assigned by service providers instead of chosen by users. This transforms the password from a knowledge factor into a possession factor (i.e. a key).

Computing power and cracking techniques have made effective passwords almost impossible for humans to represent as knowledge. Therefore, it is time to eliminate knowledge factors as a class and focus on possession and inherence factors.
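To make #3 concrete, here is a minimal sketch of such a password being generated with Python's standard `secrets` module (the function name is illustrative, not part of any proposed standard):

```python
import secrets

def generate_assigned_password() -> str:
    # 32 bytes = 256 bits of entropy from the operating system's CSPRNG,
    # rendered as 64 hex characters for storage and transport.
    return secrets.token_hex(32)

print(generate_assigned_password())  # 64 hex characters, never shown to a human
```

No human ever needs to see or remember this value; it only needs to move between the service provider and the user's keychain.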

#4 – The W3C should define a "Keychain" API that allows website Javascript to store keys in a user's local keychain during account setup and retrieve keys during authentication.

Web browsers and third party password managers attempt to detect login forms in order to automate password authentication. This is difficult to implement and error-prone.

#5 – All keychain implementations must encrypt and decrypt key content locally. Vendors must not be able to access key content by design. Any key content that leaves the context of a device must be encrypted such that only the user can decrypt it via local authentication methods.

Vendors and service providers should assume that their infrastructure will be penetrated by hackers at least once in the lifetime of any product. Therefore, the only way to protect user content is to make it inaccessible to anyone but the user on their own device using local authentication and decryption methods. This is somewhat obvious today and many password managers are designed this way – but it should still be said.

#6 – All operating system and browser vendors should standardize on keychain formats and define standard APIs and formats for interchanging keys and other related content. Vendors should also synchronize keys between users' keychains on users' devices residing within each vendor's ecosystem.

Password managers are not the answer; password management is. Usernames, passwords and authentication are features of applications and operating systems – they are not a standalone product. They are best implemented and delivered as infrastructure. Vendors should recognize this and take responsibility for making it happen.


The Password is Dead, Long Live the Password!

I spent some time over the past couple of years working on technology to replace passwords, as have a lot of other folks (LaunchKey, Clef, Nexiden, Google Authenticator, SQRL, Mozilla Persona, oneID, etc. – the full list is sadly much longer than this). These ideas each have their technical merits and some get quite a bit of fanfare on their introduction to the market. However, I've personally reached the conclusion that none of these approaches is ever going to succeed in replacing the password with some nifty new authentication mechanism. It's not that there's no desire for a good solution, nor that one can't be built. It's a much more fundamental problem: market inertia.

Rome Wasn't Built in a Day

The first thing to recognize is that the username and password problem isn't and never will be a product-oriented problem – by its nature it is an inherent feature of other products: applications, frameworks and operating systems. It's like alternative file browser software for your desktop OS. Sure, you can build it, but it won't be easy, and the operating system vendors can quickly react to whatever your value proposition is. Compounding the situation are the wide adoption and usage of passwords, the number of books and training materials about them, and the volume of experience so many developers and users have with them.

The moment you start looking at the problem this way, you begin to realize the enormity of the task a new product in this space faces. How do you build a product that can create new standards, prompt people to write new books about its techniques and convert developers to a new product when it is unlikely ever to be adopted by enough applications, frameworks and operating systems to justify it?

It's a bit like throwing a pebble into a large pond and expecting anything more than tiny ripples to come from it before it quickly disappears below the surface. 

You might say, "but wait – isn't this really just the chicken-and-egg problem?" That is, if you could only convince a critical mass of users and providers to adopt a new system, the product would surely succeed, right? My contention is that no, this is a different and additional problem. You could have the niftiest product in the world, get a significant number of users and sites to adopt it, and still face this market inertia problem.

So if the problem is nearly intractable from a product development perspective, what is the answer? Is this as good as it gets?

The Future is Now

I believe we need to stop trying to get rid of the password and instead work with the market inertia to evolve the password in a better direction.

So, does that mean password managers are the answer? I contend that password managers are not the answer but password management is. In other words, password managers as a product market are a dead-end for the same reasons explained above. However, password management as a feature of applications, frameworks and operating systems is the future. Think LastPass versus iCloud Keychain. When your competitor controls the playing field you are competing on you've already lost – you just don't know it yet.

So, what will password management of the future look like? Here's how I think the password will evolve:

  • Passwords of the future will no longer be memorized. They will be very large, completely random and securely generated. A standard will be published on their format and exactly how to properly create them.
  • The user will no longer choose their own password – the service provider will assign it upon account creation.
  • A new web browser Javascript API will be introduced supporting key chain functions. This will allow a website to store an assigned password in your browser's keychain and allow that same website to ask for your password. Access and control will use origin domain protections and user prompts.
  • The web browser will connect to a platform key chain service that will synchronize and keep your passwords available on all of your devices.

You might have had two thoughts while reading that. 1) That sounds a lot like API keys that developers use to access web services, doesn't it? Yes, that's exactly right.

And 2) Doesn't the browser already support client authentication with client SSL certificates? There are some similarities, but the big difference is that the existing client SSL authentication comes with all of the baggage of a PKI infrastructure. The implementation of it is really dragging those problems down into the browser space – and I believe accounts for its limited adoption. The approach here is bubbling up a client authentication solution from passwords.

The User Experience

So, what would this look like to users? Here are two thought experiments for the most common cases:


A user visits a website and fills out a form to sign up for service. There's no username or password field. They submit the form and, once the account is created, Javascript sent back to the browser invokes the key chain API to add a new account (i.e. save a password for the domain). The user is prompted to give a description for the item. The browser saves it to the platform key chain, and the platform key chain synchronizes it with the cloud and all of the user's other devices.


A user visits a website's login page. Javascript on the page detects the key chain API and invokes it, asking for a password for the domain. The browser presents a window to the user showing the description for each of the stored passwords for the domain. The user selects one, which is then returned via the key chain API call to the website's login page Javascript code. The page proceeds to authenticate and sign in the user.
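The two flows above can be sketched end to end in a few lines. This is a toy model, not the proposed browser API – the `Keychain` and `Service` classes and all of their method names are purely illustrative – but it shows how an assigned key replaces both the username and password fields:

```python
import secrets
import hashlib

class Keychain:
    """Stands in for the platform keychain behind the hypothetical API."""
    def __init__(self):
        self._items = {}  # domain -> list of (description, key)

    def add(self, domain, description, key):
        self._items.setdefault(domain, []).append((description, key))

    def lookup(self, domain):
        # In a real browser this would prompt the user to pick an item.
        return self._items.get(domain, [])

class Service:
    """Stands in for the website's server side."""
    def __init__(self):
        self._accounts = {}  # sha256(key) -> profile

    def signup(self, profile):
        key = secrets.token_hex(32)  # the server assigns the password
        digest = hashlib.sha256(key.encode()).hexdigest()
        self._accounts[digest] = profile  # the digest doubles as the account id
        return key                        # sent back for keychain storage

    def login(self, key):
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self._accounts.get(digest)

keychain = Keychain()
service = Service()

# Signup: no username or password field; the assigned key is stored locally.
key = service.signup({"name": "Alice"})
keychain.add("example.com", "My account", key)

# Login: the page asks the keychain for a key and authenticates with it.
description, stored_key = keychain.lookup("example.com")[0]
print(service.login(stored_key))  # -> {'name': 'Alice'}
```

Note that the service stores only a hash of the assigned key, so a breach of its account table does not reveal the keys themselves.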

A couple of basic benefits become apparent:

  • A user will never see or be prompted for a password again. Evolving passwords will effectively get rid of them.
  • Every single password created and used this way will be cryptographically unbreakable. The only way to get your password is to steal it from you.
  • The main authentication problem will effectively move from being between service providers and you to being between each of your devices and you. This problem is much easier to solve locally since biometrics become viable. Touch ID, anybody?
  • Passwords of this type (large and random) can serve to both authorize and identify a user. There's no need for a separate username to identify an account. The password (or a hash of it) can identify an account.

One of my metrics for when an idea is feeling "right" is when it has a real-world analog that is tried and true. In this case, think of locks and keys. You might think login pages are the locks and passwords are the keys. But realistically, the analog for what we have today is really a combination lock. You have to remember, manage and protect the combination. Anybody who can obtain it can open the lock.

In this evolved password scenario, key chains and passwords are created, managed and protected like real keys. A real key isn't likely to be properly duplicated by just looking at it and you aren't likely to guess it. It requires possession of the key (or a duplicate of it) in order to gain access.

In other words, passwords change from something you know to something you possess.

It's Not a Startup Opportunity

If you've read this far, you probably have come to the same conclusion I have – this is not a startup opportunity. This is a problem of standardization and adoption by major application, framework and operating system vendors. Since this is an evolutionary approach instead of a whole new method, it's easy to imagine practical and cost-effective steps that could be taken to get there.

I believe whether or not anybody actively plans this outcome, this is where the problem will ultimately evolve to anyway following the current course. But the description presented here isn't a complete solution as authentication is a large and thorny problem. What about password resets if my key chain loses a key? How do I access websites when I'm not on one of my devices? Feel free to chime in with your thoughts in the comments and add to the discussion.

It's not a problem a startup will come along and magically solve. It's our problem to solve right now.

Durus: The Best Python Object-Oriented Database You've Never Heard Of

I've been developing software in the Python programming language for over 20 years now. It's my preferred language due to its readability, speed of development and the massive number of modules available for it. This blog post is about one of the hidden gems in the Python world: Durus.

If you've built any reasonably sized application before, you've likely worked with a database. The most common database technology available is the Structured Query Language (SQL) database, which takes a row and column approach to storing, querying and retrieving data. This technology has been around since the 1970s and has dominated in terms of deployments.

In recent years, new database technologies have become more common and popular. These include key-value, graph, object and document storage systems. Together, this whole trend has been labeled "NoSQL".

Durus actually predates this trend, having been developed in 2004 and presented at PyCon 2005. It has its origins in ZODB, which was developed in the late 1990s – Durus took the general architecture of ZODB and simplified it. While it was introduced to the world in the context of web applications, Durus is really more widely applicable than that. It's ACID-compliant and can be run standalone or in a client/server architecture for scalability.

If you aren't familiar with object-oriented databases, they are quite a departure from the SQL model. Databases don't consist of tables and rows – they consist of collections and objects. For instance, in Python, the most common mutable object types are lists, dictionaries and sets. These have direct counterparts in Durus that act and behave like their corresponding type – but are Durus-aware.

NoSQL – The Python Way

There's no better way to understand Durus than to see it in action:

$ durus -c --file=test.db
    connection -> the Connection
    root       -> the root instance
>>> root
<PersistentDict 0>
>>> root.items()
[]
>>> root[1] = "a"
>>> root.items()
[(1, 'a')]
>>> connection.commit()
>>> ^D
$ durus -c --file=test.db
    connection -> the Connection
    root       -> the root instance
>>> root.items()
[(1, 'a')]
>>> from durus.persistent_set import PersistentSet
>>> s = PersistentSet()
>>> s.add(1)
>>> root["set"] = s
>>> connection.commit()
>>> root["set"].add(3)
>>> root["set"]
<PersistentSet 32>
>>> list(root["set"])
[1, 3]
>>> connection.abort()
>>> list(root["set"])
[1]

Fundamentally, an object-oriented database is based on an object graph. It has a root object that refers to everything else that will be added to the database. The database connection is used to control transaction semantics. Containers, fundamental types and custom classes can all be added to the database as long as the object graph they are a part of is connected to the root object.

There is one "gun meet foot" pitfall to watch for: never put non-Durus-aware mutable objects or containers in the database. That is, you should never add standard Python lists, dictionaries, sets or plain subclasses of object to the database. Only the Durus versions are instrumented to properly capture changes in a commit when their contents are modified. Otherwise, you will lose data that you think is being committed to the database.
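A toy model shows why the Durus-aware containers matter. The real mechanism inside Durus is more involved, but the essence is that persistent containers flag themselves as changed when mutated, while plain Python containers mutate silently and give the database nothing to notice at commit time:

```python
class ToyPersistentList:
    """Toy model of a Durus-aware container: mutators mark it as changed."""
    def __init__(self):
        self._items = []
        self._p_changed = False   # the flag a commit would inspect

    def append(self, value):
        self._items.append(value)
        self._p_changed = True    # instrumented mutator

root = {"good": ToyPersistentList(), "bad": []}

root["good"].append(1)   # flagged; a commit would persist this change
root["bad"].append(1)    # a plain list mutates silently; nothing is flagged

print(root["good"]._p_changed)  # -> True
```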

Why Durus?

Philosophically speaking, object-oriented databases are quite elegant and simple to work with. They use the language runtime environment itself to implement full database functionality without having to learn new semantics (i.e. SQL).

Need a new table? Declare a new Durus persistent class and add a Durus container to the root. Need to do a join? Write a nested for loop across two containers. Need a large-scale indexed container? Use a Durus BTree.
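The nested-loop "join" reads the same whether the containers are Durus types or plain ones. The sketch below uses ordinary dicts so it runs standalone; with Durus, `users` and `orders` would be `PersistentDict` instances hanging off the root object:

```python
# A "join" across two containers expressed as a nested loop.
users  = {1: {"name": "ada"}, 2: {"name": "grace"}}
orders = {10: {"user_id": 1, "total": 5}, 11: {"user_id": 2, "total": 7}}

joined = []
for order_id, order in orders.items():
    for user_id, user in users.items():
        if order["user_id"] == user_id:
            joined.append((user["name"], order["total"]))

print(sorted(joined))  # -> [('ada', 5), ('grace', 7)]
```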

And you get all of this simplicity and elegance in a high-performance package. Durus uses an append-only file format on disk (along with a packing utility) and a memory cache, which enables large deployments: databases of tens of gigabytes, millions of objects and quick access times.

So, if you got to the end of this post and still haven't installed and played with Durus, what are you waiting for?

Entrepreneurship – Dallas, TX Edition

If you Google "entrepreneurship", you get back this definition:

Entrepreneurship is the willingness to take risks and develop, organize and manage a business venture in a competitive global marketplace that is constantly evolving. Entrepreneurs are pioneers, innovators, leaders and inventors.

While there is a long history of entrepreneurship in certain parts of the world, the trend is picking up worldwide:

We are part of the global entrepreneurial class, an identity that transgresses borders, nationalities, and religion.  Entrepreneurs are a demographic, not a geographic, and their conspicuous creation is driving positive change in our world. Silicon Valley remains a bastion, and a gravitational force.  But the walled gardens are withering, and the access class is becoming an asset class.  Investors and entrepreneurs need a passport to the present.

The Dallas, TX area is no stranger to this phenomenon with organizations and activities like The DEC, Tech Wildcatters, The Garage, VentureSpur Texas, Common Desk, The Grove, BigDOCC, LaunchDFW, Dallas New Tech, DFW Startup Happy Hour, and plenty more. DFW has been no stranger to startup companies in the last 20 years either.

And of course there are local personalities helping drive this trend, such as Bradley Joyce, Mike Sitarzewski, Jennifer Conley, Michael Gilbert, Trey Bowles and Gabriella Draney, to name but a few. And my small contribution is mentoring with a couple of the accelerator/incubator programs.

So far, it's a pretty straightforward set of facts repeated in various cities around the world, right? Not so fast, pardner – here in Texas we do everything bigger and better – and entrepreneurship is no different.

Meet Bill McNeely


Bill lives in Dallas and is a veteran of the Afghanistan conflict who, like many other veterans, has struggled since returning to civilian life. Between a moribund economy and the lingering effects of combat, Bill has struggled to support himself and his family. However, he's not just a veteran – he is also an entrepreneur and an active participant in the Dallas entrepreneurial community. This t-shirt he designed sums up his unique perspective:

Bill has been working on startup ideas surrounding the skills he excelled in with the military: logistics. The result is DeliverToMe, a B2B local delivery service. However, how he got to that point is just as important a story about Bill as it is about the Dallas entrepreneurial community.

Bill has received support from The Garage in acquiring a vehicle for his service and building his business model; acquired his first client, Brevida (also a startup), through The DEC; and received training and mentoring through the Google-sponsored Startup Weekend NEXT program led by Kevin Strawbridge, where I helped Bill with his pitch and refined his business model. He also received in-kind support from FISH Technologies, a local award-winning experiential marketing technology company.

The details of how DeliverToMe has developed are much less important than how so many different elements of the Dallas entrepreneurial community spontaneously came together to help Bill. There was no central planning; there were no turf battles. The consistent ingredient was entrepreneurs with a passionate desire to help other entrepreneurs succeed. The difference here is that in Texas, we don't just want each of our own efforts to succeed – we want everybody's entrepreneurial efforts to succeed. That's how entrepreneurship benefits society as a whole and makes it all worthwhile.

What can I say? It's Dallas. That's how we roll.

Get the Boiling Oil Ready

I've blogged before on the topic of computer security and the need for approaches like "asymmetric warfare" to the security problems that our industry – actually our entire society – is experiencing. The recent Target breach is yet another example of how out-of-control the situation is becoming.

I believe we are now on the cusp of a large shift in the corporate and governmental stance on this problem. And this shift may finally begin to turn the tide.

Going On Offense

First you have to understand that as an industry, we've always been in a defensive posture when it comes to cyber attacks. This has been a natural consequence of US law providing no protection for retaliatory responses. Any actions you take against an attacker must not violate the same laws that the attacker violated when they attacked you.

This stance is a purely defensive one – meaning only the US government has the right to retaliate against the hackers, whether that be by legal means or cyber attack. The problem is that the US government doesn't have the resources to effectively track, prosecute and/or retaliate against the hackers. It is not that there are so many hackers; it's that there are so many weak spots for them to attack.

The Internet is like the Wild West where there was one US Marshal for many hundreds of square miles with bands of bandits roaming around. The key difference is back in the Wild West, everybody was armed with weapons to defend themselves. The current state of cyber attacks is that victims get to wear all the body armor they like – but they cannot raise a hand in response.

You cannot win a war if you are always on defense.

An Internet Castle Doctrine

The clear precedent for changing this situation is the concept of self-defense. You can legally take the life of another human being if you do so in defense of your own life. This concept has been around for a very long time and is well tested by and supported by the law and the courts.

In addition to self-defense is the Castle Doctrine. While laws supporting this doctrine do not exist in all states, the concept is pretty simple – the immunity of self-defense is extended to your abode. In other words, your home is treated as your "castle" and you can use lethal force to defend it.

What I believe is needed now is a cyber version of the Castle Doctrine – an "Internet Castle Doctrine". Laws supporting an Internet Castle Doctrine would closely follow the principles of the Castle Doctrine and self-defense. These laws would protect you or your organization if you choose to retaliate against a cyber attack in an offensive fashion.

It seems that most cyber security professionals agree that it is time for this change. Only 30% of IT security leaders said they were not ready to pursue non-defensive responses to cyber attacks, because "too many legal and ethical questions" remain.

Weaponizing Cyber Security

In the same way that the need for self-defense feeds the gun industry, an Internet Castle Doctrine is likely to feed an industry producing cyber self-defense weaponry. CrowdStrike is a startup that came out of stealth mode in 2013 pursuing new approaches to responding to cyber attacks. While they are not offering cyber weaponry yet, they are working on "active defense" systems. There are bound to be more startups quietly working on this too.

Along with the creation of this cyber weapons market will come the inevitable counter arguments that will fuel "cyber weapon control" efforts. As James Lewis, a senior fellow with the Center for Strategic and International Studies put it, enabling counter attacks

Create[s] the risk that some idiot in a company will make a mistake and cause collateral damage that gets us into a war with China.

Yes, collateral damage and unintended consequences are a real concern. However, we have the same concern with guns and self-defense and yet seem to manage well enough. And, at the moment, there doesn't appear to be any other viable alternative to weaponizing and counterattacking.

So, next time the barbarians start attacking your castle walls don't just fill the moat and raise the drawbridge – start thinking about boiling some oil.

Get Rid of Your Safety Net

I just read a remarkable article about the rise and fall of a startup in San Francisco called Everpix. They developed a web-based photo organizing and archiving application. As is the trend, they used a freemium model for the service they offered:

The service seamlessly found and uploaded photos from your desktop and from online services, then organized them using algorithms to highlight the best ones.

As is also the trend, the company was filled with young and talented entrepreneurs. The more complicated explanation about why Everpix failed is understandable:

The founders acknowledge they made mistakes along the way. They spent too much time on the product and not enough time on growth and distribution. The first pitch deck they put together for investors was mediocre. They began marketing too late. They failed to effectively position themselves against giants like Apple and Google, who offer fairly robust — and mostly free — Everpix alternatives. And while the product wasn't particularly difficult to use, it did have a learning curve and required a commitment to entrust an unknown startup with your life's memories — a hard sell that Everpix never got around to making much easier.

At the micro level, that makes a lot of sense. But there is a larger macro effect going on here that they (and the author of the article) touch on without even realizing it:

“You look at all the problems that we’ve had, and it’s still nothing,” he said. “I have more respect for someone who starts a restaurant and puts their life savings into it than what I’ve done. We’re still lucky. We’re in an environment that has a pretty good safety net, in Silicon Valley.”

The main problem in my opinion was not that they didn't execute business strategy correctly – at most that is just a symptom. The real problem is that they operated with a "pretty good safety net".

Working Without a Net

The major problem that many companies fall into when receiving angel or venture funding is that they aren't challenged every moment of every day to do what it takes to survive. If you look at it from a Maslow's hierarchy of needs perspective, the moment the pressure of survival is removed from a fledgling entrepreneurial effort the focus will tend to drift towards longer-term and less relevant aspects of starting a business.

This shift of focus is a death sentence for many companies because, at this early stage, they still have no inherent ability to survive without the funding – just like a newborn infant. If care is not taken to make the company self-sustaining at an early stage, it likely will not be self-sustaining when the money runs out.

I'm not saying that everybody should be bootstrapping every entrepreneurial effort. Many times, funding enables strategic maneuvers and is the fuel that feeds a "fire" with growth – and with good maneuvering and good growth often comes more stability and longevity.

But if bootstrapping is a viable option, my belief is that the majority of the time, the company that emerges will be a lot stronger and healthier.

So, get rid of unnecessary safety nets around your entrepreneurial efforts and focus on the details of basic survival. Pay close attention to the lifeblood of your company – its profitability – early and often, and make sure it is on a trajectory that will make your company self-sustaining as quickly as possible.

As the saying goes "live each day as if it were your last".

Apple Will Be the King of Indoor Location Services

Without saying so, Apple has entered the battlefield of indoor location services. And it appears they are going to win – and win big.

Their first move was releasing the iPhone 4s with Bluetooth Low Energy (LE) support. Many observers have been expecting Apple to ship NFC support in their phones for a long time. Not only do they continue to disappoint in that area, they instead shipped Bluetooth LE with little fanfare.

Their second move was the acquisition of WiFiSLAM in March of 2013. WiFiSLAM was a small startup doing some amazing work marrying machine learning with WiFi triangulation and raw sensor input from a mobile device's compass, inertial and gyroscopic sensors. Their work promised to dramatically improve the ability of a mobile device to determine its location indoors.

Then in June of 2013 Apple announced iOS 7, and there was a little-remarked feature buried in the slides: iBeacons. While official information about iBeacons from Apple is under NDA, some reverse engineering and a lot of speculation have revealed that iBeacons are a protocol enhancement to Bluetooth LE that enables conforming devices to integrate with iOS Core Location services. The result is that iOS applications are able to detect and respond to events indicating that the device has moved into or out of a region defined by the iBeacon.

Finally, when Apple announced the iPhone 5s, there was an interesting new chip onboard the phone: the M7 coprocessor. This chip is "designed specifically to measure motion data from the accelerometer, gyroscope, and compass".

Individually, each of these moves seems relatively incremental until you put them all together in this context:

  • Create inexpensive iBeacons that can be placed indoors that not only define a region but are registered at a particular location
  • Use Bluetooth LE in mobile devices to communicate with them without draining the battery
  • Pull high precision motion data from the M7 coprocessor, again without draining the battery
  • Integrate all of the above using WiFiSLAM's algorithms
  • Make it all simple to integrate into iOS applications.
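As a rough illustration of the ranging piece, distance to a beacon is commonly estimated from received signal strength with a log-distance path-loss model. The constants below (a transmit power of -59 dBm measured at 1 m, path-loss exponent 2) are illustrative calibration values, not Apple's actual implementation:

```python
def estimate_distance_m(rssi: float, tx_power: float = -59.0, n: float = 2.0) -> float:
    # tx_power: RSSI calibrated at 1 m from the beacon.
    # n: path-loss exponent (~2 in free space, higher indoors).
    return 10 ** ((tx_power - rssi) / (10 * n))

print(round(estimate_distance_m(-59.0), 2))  # -> 1.0 m at the calibration RSSI
print(round(estimate_distance_m(-79.0), 2))  # -> 10.0 m with n = 2
```

Fusing estimates like this with the M7's motion data is where WiFiSLAM-style algorithms come in.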

The result is going to be an amazingly inexpensive, power efficient and simple to use and operate indoor location system. In typical fashion, Apple is addressing the entire ecosystem of the indoor location problem in a very innovative way – the result of which will again be significant market domination.

Rise of the Virtual Machines

UPDATE 2: Since my last update, I discovered Mainframe2, another pretty amazing take on virtualization.

UPDATE: Since I wrote this post, I discovered Docker, another very interesting direction in virtual machines.

"Virtualization" is a term that's used pretty regularly – but exactly what do we mean when we use that term? There are roughly three levels of virtualization in use today:

  1. Hardware and CPU emulation. This is the lowest level of virtualization. Most recent implementations emulate a standard PC architecture and run machine code. Despite the recent growth in this area (VMware, Parallels), this type of virtualization actually dates back to 1972, when IBM released VM/370 for mainframes.
  2. Byte code emulation. This higher level emulation implements an abstract computing environment (registers, memory, instructions) only – there is no true hardware emulation. This type of VM assumes it is running in the context of an operating system that provides all of the facilities hardware peripherals would normally provide. The Java VM, Microsoft's CLR, Python, Ruby and Perl are all examples of this type of VM.
  3. Sandboxing. This high level virtualization modifies the behavior of operating system calls to implement a container ("sandbox") in which normal applications become isolated from other applications and system resources as a matter of policy. Some examples are Apple's App Sandbox, the iOS App Sandbox and Linux Containers.

What all three of these techniques have in common is that they mediate access between an application (or operating system) and its execution environment. The goals are to increase security, manage resources more reliably and efficiently, and simplify deployments. These benefits are behind the rapid rise of hardware virtualization over the last 5 years.

What's interesting is the parallel between virtualization and the web. A virtualization instance (or virtual machine – VM) creates a virtual environment for each application/operating system to operate within, for both security and efficiency reasons. Web browsers do the same thing with Javascript – each web page has its own execution environment. You could call it level 2.5 virtualization, as it shares aspects of level 2 and level 3 virtualization.

Virtualization can be a mind-bending exercise – especially when you start looking at things like JSLinux. JSLinux is a hardware VM implemented in Javascript that runs inside a web page. The demo is pretty amazing – it boots a relatively stock Linux kernel. The mind-bending part is when you realize this is a level 1 VM implemented inside a level 2.5 VM. Technically, you should be able to run a web browser inside JSLinux and launch yet another nested VM instance.

The Blue Pill

Where is all of this going? With almost four different types of VMs and proofs of concept that intermix them in reality-altering ways (The Matrix, anyone?), it seems we haven't reached the apex of this trend yet.

One path this could follow is Arc. Arc takes the browser in a slightly different direction from JSLinux. Arc packages VirtualBox into a browser plugin and combines it with a specification for describing web-downloadable virtual machines. This makes the following possible: you install the Arc plugin into your browser and visit a web page with an Arc VM on it; the Arc plugin downloads the spec, assembles and launches the VM, and you wind up securely running a native application via the web.

In other words, visiting a web page today could turn into launching a virtualized application tomorrow.

While there are clear efficiency and security benefits to this, there's also a huge developer benefit: developers would be able to implement web applications in almost any conceivable fashion. They could choose the operating system, the GUI toolkit and anything else they like as the basis of their application. The situation where the web tends to be the lowest common denominator gets turned on its head. Developers would be freed to develop with whatever tools provide the best experience and value to the end-user.

This fanciful scenario implies a possible future trend: the death of standards. Web standards exist and are adopted because they benefit developers – they create a homogeneous ecosystem which is easier to write for and deploy into. But if virtualization takes us to the point where the characteristics of the execution environment are so isolated from the application that they have no impact on it, why bother with standards?

If this outcome comes about, the irony will be that the Java VM shot too low when it first arrived on the scene claiming "write once, run anywhere". Maybe we are really going to end up with "create anything, run anywhere".

The Economics of RFID Performance

I've built many types of RFID applications in the past: passive/active, presence/area/position, HF/UHF/UWB. Regardless of the technology and the application, there's always been a disconnect between users' concept of how RFID should work and its real-world behavior.

The basic concept of RFID is that you can identify an object at a distance using some sort of wireless protocol. There are many different technologies and techniques for doing this, but they all fundamentally rely on RF communication.

I've written previously about the issues with unlicensed RF communications. The unstated implication is that if we could see RF communications the way we see visible light, we would be stunned by the storm of RF constantly swirling around us. It would be like the inside of a dance club, with disco balls, lasers and colored lights shining everywhere.

But even in "quiet" environments, RF communications still suffer significantly from fundamental limitations of physics. For instance, water absorbs most RF signals – and in RFID applications, the human body is the most common mobile container of water. If the line of sight between a WiFi base station and a WiFi device is blocked by more than three bodies, the signal will be completely lost. RF signals are also affected by metal, which can reflect, absorb and even redirect them.

As a result, performance can be unpredictable in even the simplest deployments.

The root of the disconnect I referred to earlier is that end-users don't perceive these complexities because they aren't visible to the naked eye – you have to visualize the situation mentally to understand what is really going on. Their naive (but rational) assumption is that RFID should just work – and work reliably.

While you could spend a lot of energy educating end-users about these environmental complexities, you are probably better off framing the entire issue in economic terms which can be summed up in the following chart:

What this chart is saying is that most RFID systems (and applications) have to make a tradeoff between cost and performance. The tradeoff is made such that, on average, you get a reasonable level of performance for a fixed cost. Often, "on average" means something like 75% of the time, and "reasonable" performance means 95% accuracy in reads.

So, generally speaking, a fixed investment in equipment gets you a high (but not perfect) level of performance. Within that fixed cost you can tune things, rearrange equipment, adjust application parameters and take other steps to linearly improve performance by, say, 1%. Once you reach the limit of those techniques, you can begin to add redundant equipment to the setup for another linear increase in performance of, say, another 1% – but with a faster increase in cost.

Now you are at the stage where actions are best described as "heroic" and costs begin to rise exponentially. For instance, you begin to look into building custom antennas, boosting system power beyond regulatory limits and hand-selecting RFID tags for their individual performance characteristics. Yet all of this might get you only another 1% increase in performance.
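The shape of that curve can be sketched numerically. All the numbers below are invented for illustration: a fixed baseline investment buys ~95% read accuracy, tuning and redundancy add roughly linear cost for the next couple of points, and everything past that grows exponentially.

```python
# Illustrative cost model for RFID read accuracy.
# All constants are made up to show the shape of the curve:
# flat baseline -> linear tuning region -> exponential "heroic" region.
def cost_for_accuracy(accuracy, baseline=95.0, fixed_cost=10_000):
    if accuracy <= baseline:
        return fixed_cost
    extra = accuracy - baseline                     # points past baseline
    linear = 2_000 * min(extra, 2.0)                # tuning/redundancy region
    heroic = 0.0
    if extra > 2.0:                                 # custom antennas, etc.
        heroic = 5_000 * (2 ** (extra - 2.0) - 1)
    return fixed_cost + linear + heroic

for acc in (95, 96, 97, 98, 99, 99.9):
    print(f"{acc:5}% accuracy -> ${cost_for_accuracy(acc):,.0f}")
```

Whatever constants you plug in, the punchline is the same: each additional point of accuracy costs more than the last, and the curve never actually reaches 100%.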

And therein lies the lesson: there is no cost-effective way to get 100% accuracy out of an RFID system.

So take my advice and start RFID projects off with the graph above and a lesson in economics. It will save you and your customer a lot of grief.

iPhone Battery Drain, dataaccessd, and Calendar.sqlitedb, Oh My!


I have an iPhone 4S and recently upgraded to iOS 6.0.1. My battery life had been middling before, but since moving to iOS 6, it had dropped dramatically. I now cannot make it through two-thirds of the day without putting my phone on a charger. I finally reached my limit yesterday and decided to figure out what was going on.

I started my hunt with some googling: ios 6 battery drain

Most of the hits are generic things to try: tweak settings, reboot, restore, reinstall apps, and the dreaded wipe-and-set-up-from-scratch. I was pretty sure that last option would fix my problem, but I didn't want to lose all of my apps' data and go through a long setup process. Being an iOS developer, I decided to peek behind the curtain and see what I could figure out.

I launched Xcode (Apple's developer environment) and then Instruments (Apple's performance monitoring tool). I connected my iPhone and started Instruments with the iOS Activity Monitor template:

This collects real-time data about the processes running on your iOS device. The information is a lot like Mac OS X's Activity Monitor and looks like this:

My phone seemed relatively idle ("DTMobileIS" is the process that feeds data to Instruments, so ignore it). But one thing I noticed is that the process "dataaccessd" had an enormous amount of CPU Time racked up. It was orders of magnitude higher than any other process. So, back to Google: dataaccessd ios battery drain

Now we were getting somewhere – dataaccessd has been fingered before as a cause of battery issues. So, I investigated some of the hits and came upon this Open Radar bug:

dataaccessd continuously crashing, draining battery.

With high total CPU time for my dataaccessd, it clearly wasn't crashing. However, this did ring a bell for me – I've had issues with Calendar. It tends to be very sluggish. I started playing around with Calendar while watching the Activity Monitor. 

What I found was startling. After launching Calendar, switching back and forth between a couple of the days in the month view and then locking the phone, the dataaccessd process would eat the CPU for close to a minute before settling down. I could reproduce this on demand with simple interactions with Calendar.

In an attempt to figure out what dataaccessd was doing, I used the Network template in Instruments:

What's nice about Instruments is that you can run this second template at the same time with the Activity Monitor template. When you focus on the dataaccessd process and drill down into the connections, it looks something like this:

I now recreated the problem, and what appeared here was a ton of Data In and Data Out activity by dataaccessd. It was all on localhost, so I presumed we were ultimately talking about file I/O.

So we are at the point where messing with Calendar causes dataaccessd to do a whole bunch of file I/O. If this happens whenever Calendar does anything (like handling Push from iCloud or firing off event alarms), I felt it was the likely cause of my battery issues.

Unfortunately, this is about as far as Apple's Developer tools will take you. You really need to be able to trace dataaccessd itself to figure out what it is doing. Instruments does have a template for this, but you can only run it on applications that can be launched. Long-term system processes like dataaccessd cannot be attached to. The inability to do this is also probably a result of Apple not wanting people poking around in the internal guts of a system process like dataaccessd.

With a little more Googling, you end up finding out that Calendar stores its data in Library/Calendar/Calendar.sqlitedb. Apple doesn't allow you to access this file directly on the device, but there's another way to get to it – through a device backup.

My phone is set to backup over WiFi to iCloud, but if you right-click on the device in iTunes you will see an option to force a local backup. Once you do that, you can access your backup with iBackupBot, a cool tool that knows how to access and interpret your device backups. I found Calendar.sqlitedb and extracted it to my Desktop.

The first thing I noticed is that the file was close to 73MB in size! That correlated with the amount of I/O that dataaccessd appeared to be performing according to the Network template in Instruments. If dataaccessd has to rewrite that file regularly, no wonder it's eating the CPU (and my battery).

I now decided to get into the database itself and check it out. I started Terminal, changed directory to where Calendar.sqlitedb was and started up sqlite3 to inspect it. Running .tables looks like this:

$ sqlite3 Calendar.sqlitedb
sqlite> .tables
Alarm                      Location                 
AlarmChanges               Notification             
Attachment                 NotificationChanges      
AttachmentChanges          OccurrenceCache          
Calendar                   OccurrenceCacheDays      
CalendarChanges            Participant              
CalendarItem               ParticipantChanges       
CalendarItemChanges        Recurrence               
Category                   RecurrenceChanges        
CategoryLink               ResourceChange           
EventAction                Sharee                   
EventActionChanges         ShareeChanges            
ExceptionDate              Store                    
Identity                   _SqliteDatabaseProperties

So, how do I figure out which table is the problem? I started by checking the size of each table:

sqlite> select count() from Alarm;

I did this for every table in the database until I found the culprit:

sqlite> select count() from Participant;

Well that doesn't seem right! The Participant table was multiple orders of magnitude larger than any other table. Now, I started looking at the data in that table:

sqlite> select * from Participant limit 100;

There are a *bunch* of participants in my calendar on events that appear to have originated on Google (I guess either because I subscribed to a calendar there or received them in email and accepted them onto my calendar). We need to look at the schema a little bit to figure out what is going on:

sqlite> .schema Participant
CREATE TABLE Participant (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, entity_type INTEGER, type INTEGER, status INTEGER, pending_status INTEGER, role INTEGER, identity_id INTEGER, owner_id INTEGER, external_rep BLOB, UUID TEXT, email TEXT, is_self INTEGER);
CREATE INDEX ParticipantEntityType on Participant(entity_type);
CREATE INDEX ParticipantOwnerId on Participant(owner_id);
CREATE INDEX ParticipantUUID on Participant(UUID);

It appears that owner_id points to a row in CalendarItem that is the owner event for each Participant. So, we try to narrow things down to see what is going on:

sqlite> select * from Participant where owner_id=137500 limit 100;

Why in the world does one CalendarItem have so many copies of the same Participant on it? And exactly how many times, you might wonder?

sqlite> select count() from Participant where owner_id=137500;

Whoa! Clearly there was a bug at work here. Only a small number of Participant rows had emails without the "mailto:" prefix, so I figured the "mailto:" rows must have been the root of the problem. My best guess is that the underlying bug was fixed by somebody at some point, but no code was ever written to clean up the mess it left behind in my Calendar.sqlitedb.
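As an aside, running count() against every table by hand works, but the survey could also be scripted. Here's a sketch using Python's built-in sqlite3 module against the extracted copy of the file (the path is whatever you extracted it to); it lists every table, largest first, so an outlier like Participant jumps right out.

```python
import sqlite3

def table_sizes(conn):
    """Return {table_name: row_count} for every table in a SQLite db."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: conn.execute(f'SELECT count(*) FROM "{t}"').fetchone()[0]
            for t in tables}

if __name__ == "__main__":
    # Path to the copy extracted from the backup (adjust as needed).
    conn = sqlite3.connect("Calendar.sqlitedb")
    for name, rows in sorted(table_sizes(conn).items(),
                             key=lambda kv: kv[1], reverse=True):
        print(f"{name:30} {rows}")
    conn.close()
```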

So, now what do we do about it? Again, a wipe and reset of the phone would probably fix this. But I wasn't interested in wasting time on that. So, I decided to try an experiment. I again backed up my phone and then made a zip archive of the backup directory in ~/Library/Application Support/MobileSync/Backup just in case things went horribly wrong. I then used iBackupBot to again extract Calendar.sqlitedb and started sqlite3 on it. I then took a chance and tried to get rid of the junk participants.

sqlite> delete from Participant where email like "mailto:%";
[...crunching away for a few seconds...]

I then exited sqlite3 and now the size of Calendar.sqlitedb was just 2.2 MB! This was looking promising. I imported Calendar.sqlitedb back into my backup using iBackupBot and restored my phone from this backup.
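For the record, the cleanup itself could also be scripted against a copy of the extracted file. This is a hypothetical sketch, not exactly what I ran; note that SQLite doesn't normally shrink the file on DELETE alone, so the sketch runs a VACUUM afterward to actually reclaim the freed pages on disk.

```python
import sqlite3

def purge_mailto_participants(path):
    """Delete the junk "mailto:" Participant rows from a *copy* of
    Calendar.sqlitedb and compact the file. Returns rows deleted."""
    conn = sqlite3.connect(path)
    with conn:  # commits the delete on success
        cur = conn.execute(
            "DELETE FROM Participant WHERE email LIKE 'mailto:%'")
        deleted = cur.rowcount
    conn.execute("VACUUM")  # reclaim the freed pages so the file shrinks
    conn.close()
    return deleted
```

Either way – shell or script – always work on a copy and keep the original backup around in case the restore goes sideways.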

This is where things got a little scary. iTunes restored the phone and it started rebooting – and then powered off. I powered it on and it powered itself off within about 10 seconds. I powered it on again and the same thing happened. At this point I was thinking, "oh well, going to have to do a full recovery restore back to where we started," and powered the phone on again, prepared to put it into Recovery Mode. But I gave it another chance and, to my surprise, it finished booting!

I got Instruments running again with the Activity Monitor template and unlocked the phone. I interacted with Calendar and watched the effect. Calendar is now nice and snappy and dataaccessd runs for just a couple of seconds and then goes idle.

It's still early after this adventure, so I'm not 100% positive this fixed my battery drain yet, but the early indications are promising. The battery life feels subjectively better so far; it will take a couple of days to really get a sense for the change. And I haven't fully exercised Calendar to see if I borked it somewhere with my database hacking, but so far so good.

If anybody at Apple ends up reading this, I've kept a copy of that backup with the original Calendar.sqlitedb in case somebody wants to perform forensics on it. Also, I recommend writing a Calendar.sqlitedb "fsck" type of utility and adding it under the covers to the OS update process in order to keep this cruft at bay. You just might see the battery complaints die down.

UPDATE: My battery life has dramatically improved. Before, I had to start recharging about two-thirds of the way through the day. Now, I can go an entire day with moderate usage and still have about one-third power remaining.