Exploring the nooks and crannies of technology

 

Durus: The Best Python Object-Oriented Database You've Never Heard Of

I've been developing software in the Python programming language for over 20 years now. It's my preferred language due to its readability, speed of development, and the massive number of modules available for it. This blog post is about one of the hidden gems in the Python world: Durus.

If you've built any reasonably sized application before, you've likely worked with a database. The most common database technology is the Structured Query Language (SQL) database, which takes a row-and-column approach to storing, querying and retrieving data. This technology has been around since the 1970s and has dominated in terms of deployments.

In recent years, new database technologies have become more common and popular. These include key-value, graph, object and document storage systems. Together, this whole trend has been labeled "NoSQL".

Durus actually predates this trend, having been developed in 2004 and presented at PyCon 2005. It has its origins in ZODB, which was developed in the late 1990s – Durus took the general architecture of ZODB and simplified it. While it was introduced to the world in the context of web applications, Durus is really more widely applicable than that. It supports ACID transactions and can be run standalone or in a client/server architecture for scalability.

If you aren't familiar with object-oriented databases, they are quite a departure from the SQL model. Databases don't consist of tables and rows – they consist of collections and objects. For instance, in Python, the most common mutable object types are lists, dictionaries and sets. These have direct counterparts in Durus that act and behave like their corresponding type – but are Durus aware.

NoSQL – The Python Way

There's no better way to understand Durus than to see it in action:

$ durus -c --file=test.db
Durus 127.0.0.1:2972
    connection -> the Connection
    root       -> the root instance
>>> root
<PersistentDict 0>
>>> root.items()
[]
>>> root[1] = "a"
>>> root.items()
[(1, 'a')]
>>> connection.commit()
>>> ^D
$ durus -c --file=test.db
Durus 127.0.0.1:2972
    connection -> the Connection
    root       -> the root instance
>>> root.items()
[(1, 'a')]
>>> from durus.persistent_set import PersistentSet
>>> s = PersistentSet()
>>> s.add(1)
>>> root["set"] = s
>>> connection.commit()
>>> root["set"].add(3)
>>> root["set"]
<PersistentSet 32>
>>> list(root["set"])
[1, 3]
>>> connection.abort()
>>> list(root["set"])
[1]
>>> 

Fundamentally, an object-oriented database is based on an object graph. It has a root object that refers to everything else that will be added to the database. The database connection is used to control transaction semantics. Containers, fundamental types and custom classes can all be added to the database as long as the object graph they are a part of is connected to the root object.

There is one "gun meet foot" pitfall to watch for: never put non-Durus-aware mutable objects or containers in the database. That is, you should never add standard Python lists, dictionaries, sets or plain subclasses of object to the database. The Durus versions are instrumented to properly capture changes in a commit when their contents are modified; the standard versions are not, so you will silently lose data that you think is being committed to the database.
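
Here is a minimal sketch of how that data loss happens, reusing the connection and root from the interactive session above (a plain list stands in for any non-Durus-aware container):

root["bad"] = [1, 2]     # a plain Python list – not Durus-aware
connection.commit()      # fine: assigning the key marks the root as changed
root["bad"].append(3)    # mutates the list, but nothing tells Durus about it
connection.commit()      # nothing new to write – the 3 is lost on the next load

Swap the plain list for a PersistentList and the second commit captures the append as expected.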

Why Durus?

Philosophically speaking, object-oriented databases are quite elegant and simple to work with. They use the language runtime environment itself to implement full database functionality without having to learn new semantics (i.e. SQL).

Need a new table? Declare a new Durus persistent class and add a Durus container to the root. Need to do a join? Write a nested for loop across two containers. Need a large-scale indexed container? Use a Durus BTree.
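
To make that concrete, here is a minimal sketch of the "table" and "join" ideas, assuming the standard Durus API (FileStorage, Connection, Persistent, BTree); the Customer and Order classes and the shop.durus file name are made up for illustration:

from durus.file_storage import FileStorage
from durus.connection import Connection
from durus.persistent import Persistent
from durus.btree import BTree

# "Tables" are persistent classes plus containers hung off the root.
class Customer(Persistent):
    def __init__(self, name):
        self.name = name

class Order(Persistent):
    def __init__(self, customer, total):
        self.customer = customer   # a direct object reference – no foreign key
        self.total = total

connection = Connection(FileStorage("shop.durus"))
root = connection.get_root()       # the PersistentDict seen in the session above

root["customers"] = customers = BTree()
root["orders"] = orders = BTree()

alice = Customer("Alice")
customers[1] = alice
orders[1] = Order(alice, 42.0)
connection.commit()

# A "join" is just a loop over object references.
for key, order in orders.items():
    print(order.customer.name, order.total)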

And you get all of this simplicity and elegance in a high-performance package. Durus uses an append-only file format on disk (along with a packing utility) and an in-memory cache, which together enable large deployments: databases in the tens of gigabytes, millions of objects and quick access times.

So, if you got to the end of this post and still haven't installed and played with Durus, what are you waiting for?

Entrepreneurship – Dallas, TX Edition

If you Google "entrepreneurship", you get back this definition:

Entrepreneurship is the willingness to take risks and develop, organize and manage a business venture in a competitive global marketplace that is constantly evolving. Entrepreneurs are pioneers, innovators, leaders and inventors.

While there is a long history of entrepreneurship in certain parts of the world, the trend is picking up worldwide:

We are part of the global entrepreneurial class, an identity that transgresses borders, nationalities, and religion.  Entrepreneurs are a demographic, not a geographic, and their conspicuous creation is driving positive change in our world. Silicon Valley remains a bastion, and a gravitational force.  But the walled gardens are withering, and the access class is becoming an asset class.  Investors and entrepreneurs need a passport to the present.

The Dallas, TX area is no stranger to this phenomenon, with organizations and activities like The DEC, Tech Wildcatters, The Garage, VentureSpur Texas, Common Desk, The Grove, BigDOCC, LaunchDFW, Dallas New Tech, DFW Startup Happy Hour, and plenty more. DFW has also produced its share of startup companies over the last 20 years.

And of course there are local personalities helping drive this trend such as Bradley Joyce, Mike Sitarzewski, Jennifer Conley, Michael Gilbert, Trey Bowles, and Gabriella Draney, to name but a few. And my small contribution is mentoring with a couple of the accelerator/incubator programs.

So far, it's a pretty straightforward set of facts repeated in various cities around the world, right? Not so fast, pardner – here in Texas we do everything bigger and better – and entrepreneurship is no different.

Meet Bill McNeely

Bill lives in Dallas and is a veteran of the Afghanistan conflict who, like many other veterans, has struggled since returning to civilian life. Between a moribund economy and the lingering effects of combat, Bill has struggled to support himself and his family. However, he's not just a veteran – he is also an entrepreneur and an active participant in the Dallas entrepreneurial community. A t-shirt he designed sums up his unique perspective.

Bill has been working on startup ideas surrounding the skills he excelled in with the military: logistics. The result is DeliverToMe, a B2B local delivery service. However, how he got to that point is just as important a story about Bill as it is about the Dallas entrepreneurial community.

Bill has received support from The Garage in acquiring a vehicle for his service and building his business model. He acquired his first client, Brevida (also a startup), through The DEC, and he received training and mentoring through the Google-sponsored Startup Weekend NEXT program led by Kevin Strawbridge, where I helped Bill with his pitch and refined his business model. He also received in-kind support from FISH Technologies, a local award-winning experiential marketing technology company.

The details of how DeliverToMe has developed are much less important than how so many different elements of the Dallas entrepreneurial community spontaneously came together to help Bill. There was no central planning; there were no turf battles. The consistent ingredient was entrepreneurs with a passionate desire to help other entrepreneurs succeed. The difference here is that in Texas, we don't just want each of our own efforts to succeed – we want everybody's entrepreneurial efforts to succeed. That's how entrepreneurship benefits society as a whole and makes it all worthwhile.

What can I say? It's Dallas. That's how we roll.

Get the Boiling Oil Ready

I've blogged before on the topic of computer security and the need for approaches like "asymmetric warfare" to address the security problems that our industry – actually our entire society – is experiencing. The recent Target breach is yet another example of how out of control the situation is becoming.

I believe we are now on the cusp of a large shift in the corporate and governmental stance on this problem. And this shift may finally begin to turn the tide.

Going On Offense

First you have to understand that as an industry, we've always been in a defensive posture when it comes to cyber attacks. This has been a natural consequence of US law providing no protection for retaliatory responses. Any actions you take against an attacker must not violate the same laws that the attacker violated when they attacked you.

This stance is a purely defensive one – meaning only the US government has the right to retaliate against the hackers, whether that be by legal means or cyber attack. The problem is that the US government doesn't have the resources to effectively track, prosecute and/or retaliate against the hackers. It is not that there are so many hackers; it's that there are so many weak spots for them to attack.

The Internet is like the Wild West where there was one US Marshal for many hundreds of square miles with bands of bandits roaming around. The key difference is back in the Wild West, everybody was armed with weapons to defend themselves. The current state of cyber attacks is that victims get to wear all the body armor they like – but they cannot raise a hand in response.

You cannot win a war if you are always on defense.

An Internet Castle Doctrine

The clear precedent for changing this situation is the concept of self-defense. You can legally take the life of another human being if you do so in defense of your own life. This concept has been around for a very long time and is well tested and supported by the law and the courts.

In addition to self-defense is the Castle Doctrine. While laws supporting this doctrine do not exist in all states, the concept is pretty simple – the immunity of self-defense is extended to your abode. In other words, your home is treated as your "castle" and you can use lethal force to defend it.

What I believe is needed now is a cyber version of the Castle Doctrine – an "Internet Castle Doctrine". Laws supporting an Internet Castle Doctrine would closely follow the principles of the Castle Doctrine and self-defense. These laws would protect you or your organization if you choose to retaliate against a cyber attack in an offensive fashion.

Most cyber security professionals seem to agree that it is time for this change: only 30% of IT security leaders said they were not ready to pursue non-defensive responses to cyber attacks because "too many legal and ethical questions" remain.

Weaponizing Cyber Security

In the same way that the need for self-defense feeds the gun industry, an Internet Castle Doctrine is likely to feed an industry producing cyber self-defense weaponry. CrowdStrike is a startup that came out of stealth mode in 2013 pursuing new approaches to responding to cyber attacks. While they are not offering cyber weaponry yet, they are working on "active defense" systems. There are bound to be more startups quietly working on this too.

Along with the creation of this cyber weapons market will come the inevitable counter-arguments that will fuel "cyber weapon control" efforts. As James Lewis, a senior fellow with the Center for Strategic and International Studies, put it, enabling counterattacks

Create[s] the risk that some idiot in a company will make a mistake and cause collateral damage that gets us into a war with China.

Yes, collateral damage and unintended consequences are a real concern. However, we have the same concern with guns and self-defense and yet seem to manage well enough. And, at the moment, there doesn't appear to be any other viable alternative to weaponizing and counterattacking.

So, next time the barbarians start attacking your castle walls don't just fill the moat and raise the drawbridge – start thinking about boiling some oil.

Get Rid of Your Safety Net

I just read a remarkable article about the rise and fall of a startup in San Francisco called Everpix. They developed a web-based photo organizing and archiving application. As is the trend, they used a freemium model for the service they offered:

The service seamlessly found and uploaded photos from your desktop and from online services, then organized them using algorithms to highlight the best ones.

As is also the trend, the company was filled with young and talented entrepreneurs. The more complicated explanation about why Everpix failed is understandable:

The founders acknowledge they made mistakes along the way. They spent too much time on the product and not enough time on growth and distribution. The first pitch deck they put together for investors was mediocre. They began marketing too late. They failed to effectively position themselves against giants like Apple and Google, who offer fairly robust — and mostly free — Everpix alternatives. And while the product wasn't particularly difficult to use, it did have a learning curve and required a commitment to entrust an unknown startup with your life's memories — a hard sell that Everpix never got around to making much easier.

At the micro level, that makes a lot of sense. But there is a larger macro effect going on here that they (and the author of the article) touch on without even realizing it:

“You look at all the problems that we’ve had, and it’s still nothing,” he said. “I have more respect for someone who starts a restaurant and puts their life savings into it than what I’ve done. We’re still lucky. We’re in an environment that has a pretty good safety net, in Silicon Valley.”

The main problem, in my opinion, was not that they failed to execute business strategy correctly – at most, that is just a symptom. The real problem is that they operated with a "pretty good safety net".

Working Without a Net

The major problem many companies fall into when receiving angel or venture funding is that they aren't challenged every moment of every day to do what it takes to survive. Looked at from the perspective of Maslow's hierarchy of needs, the moment the pressure of survival is removed from a fledgling entrepreneurial effort, its focus will tend to drift towards longer-term and less relevant aspects of starting a business.

This shift of focus is a death sentence for many companies because, at this early stage, they still have no inherent ability to survive without the funding – just like a newborn infant. If critical care is not taken to make the company self-sustaining early on, it likely won't be when the money runs out.

I'm not saying that everybody should be bootstrapping every entrepreneurial effort. Many times, funding enables strategic maneuvers and is the fuel that feeds a "fire" with growth – and with good maneuvering and good growth often comes more stability and longevity.

But if bootstrapping is a viable option, my belief is that the majority of the time, the company that emerges will be a lot stronger and healthier.

So, get rid of unnecessary safety nets around your entrepreneurial efforts and focus on the details of basic survival. Pay close attention to the lifeblood of your company – its profitability – early and often, and make sure it is on a trajectory that will make your company self-sustaining as quickly as possible.

As the saying goes, "live each day as if it were your last".

Apple Will Be the King of Indoor Location Services

Without saying so, Apple has entered the battlefield of indoor location services. And it appears they are going to win – and win big.

Their first move was releasing the iPhone 4S with Bluetooth Low Energy (LE) support. Many observers have long expected Apple to ship NFC support in their phones. Not only do they continue to disappoint in that area, they instead shipped Bluetooth LE with little fanfare.

Their second move was the acquisition of WiFiSLAM in March of 2013. WiFiSLAM was a small startup doing some amazing work marrying machine learning with WiFi triangulation and raw sensor input from a mobile device's compass, inertial and gyroscopic sensors. Their work promised to dramatically improve the ability of a mobile device to determine its location indoors.

Then, in June of 2013, Apple announced iOS 7, and there was a little-remarked feature buried in the slides: iBeacons. While official information about iBeacons from Apple is under NDA, some reverse engineering and a lot of speculation have revealed that iBeacons are a protocol enhancement to Bluetooth LE that lets conforming devices integrate with iOS Core Location services. The result is that iOS applications are able to detect and respond to events indicating that the device has moved into or out of a region defined by an iBeacon.

Finally, when Apple announced the iPhone 5s, there was an interesting new chip onboard the phone: the M7 coprocessor. This chip is "designed specifically to measure motion data from the accelerometer, gyroscope, and compass".

Individually, each of these moves seems relatively incremental until you put them all together in this context:

  • Create inexpensive iBeacons that can be placed indoors that not only define a region but are registered at a particular location
  • Use Bluetooth LE in mobile devices to communicate with them without draining the battery
  • Pull high precision motion data from the M7 coprocessor, again without draining the battery
  • Integrate all of the above using WiFiSLAM's algorithms
  • Make it all simple to integrate into iOS applications.

The result is going to be an amazingly inexpensive, power efficient and simple to use and operate indoor location system. In typical fashion, Apple is addressing the entire ecosystem of the indoor location problem in a very innovative way – the result of which will again be significant market domination.

Rise of the Virtual Machines

UPDATE 2: Since my last update, I discovered Mainframe2, another pretty amazing take on virtualization.

UPDATE: Since I wrote this post, I discovered Docker, another very interesting direction in virtual machines.

"Virtualization" is a term that's used pretty regularly – but exactly what do we mean when we use that term? There's roughly three levels of virtualization in use today:

  1. Hardware and CPU emulation. This is the lowest level of virtualization. Most recent implementations emulate a standard PC architecture and run machine code. Despite the recent growth in this area (VMware, Parallels), this type of virtualization actually dates back to 1972, when IBM released VM/370 for mainframes.
  2. Byte code emulation. This higher-level emulation implements an abstract computing environment (registers, memory, instructions) only – there is no true hardware emulation. This type of VM assumes it is running in the context of an operating system that provides all of the facilities hardware peripherals would normally provide. The Java VM, Microsoft's CLR, Python, Ruby and Perl are all examples of this type of VM (see the sketch after this list).
  3. Sandboxing. This high-level virtualization modifies the behavior of operating system calls to implement a container ("sandbox") in which normal applications become isolated from other applications and system resources as a matter of policy. Some examples are Apple's App Sandbox, the iOS App Sandbox and Linux Containers.

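To make level 2 concrete, CPython ships a dis module in the standard library that disassembles the byte code its VM executes – a quick way to see the abstract instruction set in action (the exact opcodes vary by Python version):

import dis

def add(a, b):
    return a + b

# Prints the byte code the level 2 VM interprets for this function:
# LOAD_FAST a, LOAD_FAST b, BINARY_ADD (BINARY_OP on newer versions),
# RETURN_VALUE.
dis.dis(add)
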
What all three of these techniques have in common is that they mediate access between an application (or operating system) and its execution environment. The goals are to increase security, manage resources more reliably and efficiently, and simplify deployments. These benefits are behind the rapid rise of hardware virtualization over the last 5 years.

What's interesting are the parallels between virtualization and the web. A virtualization instance (or virtual machine – VM) creates a virtual environment for each application/operating system to operate within, for both security and efficiency reasons. Web browsers do the same thing with Javascript – each web page has its own execution environment. You could call it level 2.5 virtualization, as it shares aspects of level 2 and level 3 virtualization.

Virtualization can be a mind bending exercise – especially when you start looking at things like JSLinux. JSLinux is a hardware VM implemented in Javascript that runs inside of a web page. The demo is pretty amazing – it boots a relatively stock Linux kernel. The mind bending part is when you realize this is a level 1 VM implemented inside a level 2.5 VM. Technically, you should be able to run a web browser inside of JSLinux and launch yet another nested VM instance.

The Blue Pill

Where is all of this going? With four different types of VMs (counting level 2.5) and proofs of concept that intermix them in reality-altering ways (The Matrix, anyone?), it seems we haven't reached the apex of this trend yet.

One path this could follow is Arc. Arc takes the browser in a slightly different direction from JSLinux: it packages VirtualBox into a browser plugin and combines it with a specification for describing web-downloadable virtual machines. This makes the following possible: you install the Arc plugin into your browser and visit a web page with an Arc VM on it; the plugin downloads the spec, assembles and launches the VM, and you wind up securely running a native application via the web.

In other words, visiting a web page today could turn into launching a virtualized application tomorrow.

While there are clear efficiency and security benefits to this, there's also a huge developer benefit: developers would be able to implement web applications in almost any conceivable fashion. They can choose the operating system, the GUI toolkit and anything else they like as the basis of their application. The situation where the web tends to be the lowest common denominator gets turned on its head. Developers are now freed to develop with whatever tools provide the best experience and value to the end-user.

This fanciful scenario implies a possible future trend: the death of standards. Web standards exist and are adopted because they benefit developers – they create a homogeneous ecosystem which is easier to write for and deploy into. But if virtualization takes us to the point where the characteristics of the execution environment are so isolated from the application that they have no impact on it, why bother with standards?

If this outcome comes about, the irony will be that the Java VM shot too low when it first arrived on the scene claiming "write once, run anywhere". Maybe we are really going to end up with "create anything, run anywhere".

The Economics of RFID Performance

I've built many types of RFID applications in the past: passive/active, presence/area/position, HF/UHF/UWB. Regardless of the technology and the application, there has always been a disconnect between users' concept of how RFID should work and its real-world behavior.

The basic concept of RFID is that you can identify an object at a distance using some sort of wireless protocol. There are many different technologies and techniques for doing this but they all fundamentally rely on some sort of RF communication protocol.

I've written previously about the issues with unlicensed RF communications. The unstated implication is that if we could see RF communications the way we see visible light, we would be stunned by the storm of RF constantly swirling around us. It would be like the inside of a dance club with disco balls, lasers and colored lights shining everywhere.

But even in "quiet" environments, RF communications still suffer significantly from fundamental limitations of physics. For instance, water absorbs most RF signals – and the human body is the most common mobile container of water when it comes to RFID applications. If the line of sight between a WiFi base station and a WiFi device is blocked by more than 3 bodies, the signal will be completely lost. RF signals are also affected by metal which can reflect, absorb and even redirect signals.

As a result, performance can be unpredictable in even the simplest deployments.

The root of the disconnect I referred to earlier is that end-users don't perceive these complexities because they aren't visible to the naked eye – you have to visualize the situation mentally to understand what is really going on. Their naive (but rational) assumption is that RFID should just work – and work reliably.

While you could spend a lot of energy educating end-users about these environmental complexities, you are probably better off framing the entire issue in economic terms which can be summed up in the following chart:

What this chart is saying is that most RFID systems (and applications) have to make a tradeoff between cost and performance. The tradeoff is made such that, on average, you get a reasonable level of performance for a fixed cost. Many times, "on average" will be something like 75% of the time, and a "reasonable" performance level will be 95% accuracy in reads.

So, generally speaking, a fixed investment in equipment gets you a high (but not perfect) level of performance. Within that fixed cost you can tune things, rearrange equipment and application parameters, and take other steps to improve performance linearly by, say, 1%. Once you reach the limit of those techniques, you can begin to do things like add redundant equipment to the setup for another linear increase in performance of, say, another 1% – but with a faster increase in cost.

Now you are at the stage where actions are best described as "heroic" and costs begin to rise exponentially. For instance, you begin to look into building custom antennas, boosting system power beyond regulatory limits and hand-selecting RFID tags for their individual performance characteristics. Yet all of this might get you only another 1% increase in performance.
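
A toy model makes the shape of that cost curve clear. Every number below is invented purely for illustration:

# Toy model of the cost/performance tradeoff described above.
def cost_for_accuracy(accuracy):
    base = 10000                   # fixed equipment buys ~95% read accuracy
    if accuracy <= 0.95:
        return base
    if accuracy <= 0.97:           # tuning and redundant equipment: linear gains
        return base + (accuracy - 0.95) * 500000
    # beyond ~97%: custom antennas, hand-picked tags – "heroic" exponential costs
    return base + 10000 + 2 ** ((accuracy - 0.97) * 1000)

for acc in (0.95, 0.96, 0.97, 0.98, 0.99):
    print("%d%% accuracy -> $%d" % (acc * 100, cost_for_accuracy(acc)))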

And therein lies the lesson: there is no cost-effective way to get 100% accuracy out of an RFID system.

So take my advice and start RFID projects off with the graph above and a lesson in economics. It will save you and your customer a lot of grief.

iPhone Battery Drain, dataaccessd, and Calendar.sqlitedb, Oh My!

UPDATE AT THE BOTTOM

I have an iPhone 4S and recently upgraded to iOS 6.0.1. My battery life had been middling before, but since moving to iOS 6, it had dropped dramatically. I now cannot make it through two-thirds of the day without putting my phone on a charger. I finally reached my limit yesterday and decided to figure out what was going on.

I started my hunt with some googling: ios 6 battery drain

Most of the hits are generic things to try: tweak settings, reboot, restore, reinstall apps, and the dreaded wipe-and-set-up-from-scratch. I was pretty sure that last option would fix my problem, but I didn't want to lose all of my apps' data and have to go through a long setup process. Being an iOS developer, I decided to peek behind the curtain and see what I could figure out.

I launched Xcode (Apple's developer environment) and then Instruments (Apple's performance monitoring tool). I connected my iPhone and started Instruments with the iOS Activity Monitor template:

This collects real-time data about the processes running on your iOS device. The information is a lot like Mac OS X's Activity Monitor and looks like this:

My phone seemed relatively idle ("DTMobileIS" is the process that feeds data to Instruments, so ignore it). But one thing I noticed was that the process "dataaccessd" had an enormous amount of CPU time racked up – orders of magnitude more than any other process. So, back to Google: dataaccessd ios battery drain

Now we were getting somewhere – dataaccessd has been fingered before as a cause of battery issues. So, I investigated some of the hits and came upon this Open Radar bug:

dataaccessd continuously crashing, draining battery.

With high total CPU time for my dataaccessd, it clearly wasn't crashing. However, this did ring a bell for me – I've had issues with Calendar. It tends to be very sluggish. I started playing around with Calendar while watching the Activity Monitor. 

What I found was startling. After launching Calendar, switching back and forth between a couple of the days in the month view and then locking the phone, the dataaccessd process would eat the CPU for close to a minute before settling down. I could reproduce this on demand with simple interactions with Calendar.

In an attempt to figure out what dataaccessd was doing, I used the Network template in Instruments:

What's nice about Instruments is that you can run this second template at the same time with the Activity Monitor template. When you focus on the dataaccessd process and drill down into the connections, it looks something like this:

I then recreated the problem, and what appeared was a ton of Data In and Data Out activity by dataaccessd. It was all on localhost, so I presumed that what we were really looking at was file I/O.

So we are at the point where messing with Calendar causes dataaccessd to do a whole bunch of file I/O. If this happens whenever Calendar does anything (like handling pushes from iCloud or firing off event alarms), it would likely explain my battery issues.

Unfortunately, this is about as far as Apple's Developer tools will take you. You really need to be able to trace dataaccessd itself to figure out what it is doing. Instruments does have a template for this, but you can only run it on applications that can be launched. Long-term system processes like dataaccessd cannot be attached to. The inability to do this is also probably a result of Apple not wanting people poking around in the internal guts of a system process like dataaccessd.

With a little more Googling, you end up finding out that Calendar stores its data in Library/Calendar/Calendar.sqlitedb. Apple doesn't allow you to access this file directly on the device, but there's another way to get to it – through a device backup.

My phone is set to backup over WiFi to iCloud, but if you right-click on the device in iTunes you will see an option to force a local backup. Once you do that, you can access your backup with iBackupBot, a cool tool that knows how to access and interpret your device backups. I found Calendar.sqlitedb and extracted it to my Desktop.

The first thing I noticed is that the file was close to 73MB in size! That correlated to the amount of I/O that dataaccessd appeared to be performing according to the Network template in Instruments. If dataaccessd is having to rewrite that file regularly, no wonder it's eating the CPU (and my battery).

I now decided to get into the database itself and check it out. I started Terminal, changed directory to where Calendar.sqlitedb was and started up sqlite3 to inspect it. Running .tables looks like this:

$ sqlite3 Calendar.sqlitedb
sqlite> .tables
Alarm                      Location                 
AlarmChanges               Notification             
Attachment                 NotificationChanges      
AttachmentChanges          OccurrenceCache          
Calendar                   OccurrenceCacheDays      
CalendarChanges            Participant              
CalendarItem               ParticipantChanges       
CalendarItemChanges        Recurrence               
Category                   RecurrenceChanges        
CategoryLink               ResourceChange           
EventAction                Sharee                   
EventActionChanges         ShareeChanges            
ExceptionDate              Store                    
Identity                   _SqliteDatabaseProperties
sqlite> 

So, how do I figure out which table is the problem? I started by figuring out the size of each table:

sqlite> select count() from Alarm;
61
sqlite> 
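
Rather than typing that query for every table by hand, a short Python sweep can print a count for each one. This is just a sketch – it assumes the extracted Calendar.sqlitedb sits in the current directory:

import sqlite3

conn = sqlite3.connect("Calendar.sqlitedb")
# Enumerate the tables, then count the rows in each one.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
for table in sorted(tables):
    # Table names come from sqlite_master, not user input, so quoting is safe.
    (count,) = conn.execute('SELECT count(*) FROM "%s"' % table).fetchone()
    print("%-30s %10d" % (table, count))
conn.close()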

I did this for every table in the database until I found the culprit:

sqlite> select count() from Participant;
390883
sqlite> 

Well that doesn't seem right! The Participant table was multiple orders of magnitude larger than any other table. Now, I started looking at the data in that table:

sqlite> select * from Participant limit 100;
722524|8|0|0|0|0|10|137500||2E58AAB4-170D-4118-B9E2-ACE8710B0AB6|mailto:redacted@gmail.com|0
722527|8|0|0|0|0|10|137517||ACE3112E-B096-41A7-8AA7-24C0A9F375D5|mailto:redacted@gmail.com|0
722530|8|0|0|0|0|10|137545||2D7F4627-FE6E-4674-9AAC-72A0E1471382|mailto:redacted@gmail.com|0
722533|8|0|0|0|0|10|137561||1580D42F-E479-4612-86A1-626BA44CAA0F|mailto:redacted@gmail.com|0
[…]
sqlite> 

There are a *bunch* of participants on events in my calendar that appear to have originated on Google (I guess either because I subscribed to a calendar there or received them in email and accepted them onto my calendar). We need to look at the schema a little bit to figure out what is going on:

sqlite> .schema Participant
CREATE TABLE Participant (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, entity_type INTEGER, type INTEGER, status INTEGER, pending_status INTEGER, role INTEGER, identity_id INTEGER, owner_id INTEGER, external_rep BLOB, UUID TEXT, email TEXT, is_self INTEGER);
CREATE INDEX ParticipantEntityType on Participant(entity_type);
CREATE INDEX ParticipantOwnerId on Participant(owner_id);
CREATE INDEX ParticipantUUID on Participant(UUID);
sqlite> 

It appears that owner_id points to a row in CalendarItem that is the owner event for each Participant. So, we try to narrow things down to see what is going on:

sqlite> select * from Participant where owner_id=137500 limit 100;
722524|8|0|0|0|0|10|137500||2E58AAB4-170D-4118-B9E2-ACE8710B0AB6|mailto:redacted@gmail.com|0
722645|8|0|0|0|0|10|137500||9C49863B-11F5-41A1-A6A4-5093445E4809|mailto:redacted@gmail.com|0
722816|8|0|0|0|0|10|137500||B6701899-1582-4574-91F2-0F0B6E826768|mailto:redacted@gmail.com|0
722937|8|0|0|0|0|10|137500||A1C476C0-C3F8-43DD-B3A9-055956001862|mailto:redacted@gmail.com|0
[…]
sqlite> 

Why in the world does one CalendarItem have so many copies of the same Participant on it? And exactly how many copies, you might wonder?

sqlite> select count() from Participant where owner_id=137500;
9771
sqlite> 

Whoa! Clearly there was a bug at work here. Only small numbers of Participant rows had emails without the "mailto:" prefix, so I figured the prefixed rows must have been the artifacts of the problem. My best guess is that the bug was fixed by somebody at some point, but no code was ever written to clean up the mess it left behind in my Calendar.sqlitedb.

So, now what do we do about it? Again, a wipe and reset of the phone would probably fix this, but I wasn't interested in wasting time on that, so I decided to try an experiment. I again backed up my phone and made a zip archive of the backup directory in ~/Library/Application Support/MobileSync/Backup, just in case things went horribly wrong. I then used iBackupBot to extract Calendar.sqlitedb once more, started sqlite3 on it, and took a chance on getting rid of the junk participants:

sqlite> delete from Participant where email like "mailto:%";
[...crunching away for a few seconds...]
sqlite> 
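
For the record, the same cleanup can be scripted – a sketch, with the caveat that depending on the database's auto_vacuum setting, SQLite may not shrink the file on disk until a VACUUM runs, so one is included:

import sqlite3

conn = sqlite3.connect("Calendar.sqlitedb")
# Remove the junk participant rows, then compact the file.
cursor = conn.execute("DELETE FROM Participant WHERE email LIKE 'mailto:%'")
conn.commit()
print("removed %d rows" % cursor.rowcount)
conn.execute("VACUUM")   # reclaim the freed pages so the file actually shrinks
conn.close()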

I then exited sqlite3 and now the size of Calendar.sqlitedb was just 2.2 MB! This was looking promising. I imported Calendar.sqlitedb back into my backup using iBackupBot and restored my phone from this backup.

This is where things got a little scary. iTunes restored the phone and it started rebooting – and then powered off. I powered it on and it powered itself off within about 10 seconds. I powered it on again and the same thing. At this point, I'm thinking "oh well, going to have to do a full recovery restore back to where we started" and powered the phone on again, prepared to put it into Recovery Mode. But I gave it another chance and to my surprise it finished booting!

I got Instruments running again with the Activity Monitor template and unlocked the phone. I interacted with Calendar and watched the effect. Calendar is now nice and snappy and dataaccessd runs for just a couple of seconds and then goes idle.

It's still early after this adventure, so I'm not 100% positive this fixed my battery drain yet, but the early indications are promising: Calendar is snappy, dataaccessd no longer eats the CPU, and the battery life feels subjectively better so far. It will take a couple of days to really get a sense of the change. I also haven't fully exercised Calendar to verify I haven't borked it somewhere with my database hacking, but so far, so good.

If anybody at Apple ends up reading this, I've kept a copy of that backup with the original Calendar.sqlitedb in case somebody wants to perform forensics on it. Also, I recommend writing a Calendar.sqlitedb "fsck" type of utility and adding it under the covers to the OS update process in order to keep this cruft at bay. You just might see the battery complaints die down.

UPDATE: My battery life has dramatically improved. Before, I had to start recharging about two-thirds of the way through the day. Now, I can go an entire day with moderate usage and still have about one-third of the power remaining.

Computer Security and Anti-Lock Brakes

You may well wonder what the two items in the title of this post have to do with each other. Computer security is, of course, the set of practices and tools that go into having a secure computing experience, while anti-lock brakes are a safety feature on most modern cars.

When anti-lock brakes were introduced, they were hailed as a life-saving technology that was sure to reduce the number of accidents on the road and result in fewer injuries and cost savings for everybody. However, the real-world results never matched these expectations. When drivers learned about and began using anti-lock braking systems, they started driving faster, following closer and braking later. All of these factors effectively cancelled out the predicted benefit of introducing them in the first place.

A number of studies have concluded that Risk Compensation is the reason for this result: "an effect whereby individual people may tend to adjust their behavior in response to perceived changes in risk".

So, what's really happening here? People go about their lives behaving in ways based on the perceived risks of their activity. If they think they might get hit by a car when crossing a street (the risks are higher), they will look both ways before crossing. If they think they have lower odds of getting into an accident because their car has anti-lock brakes (the risks are lower), they will drive more aggressively.

The effect is even more pronounced in professional sports. The National Football League (NFL) is experiencing more significant injuries while at the same time deploying safer equipment and changing rules in the name of safety. Players are responding to the perceived decrease in risks by playing the game more aggressively.

Putting Safety Pads on Your Computer

I believe Computer Security for consumers also suffers from the Risk Compensation effect, especially when it comes to firewalls and anti-virus software.

Firewalls and anti-virus software are staples of the average consumer computing experience. Most consumers don't really understand what these tools are or how they work, but they are told that if they use them and keep them up to date, they will be safe. Consumers are rarely educated about the basics of computer security technology. It would be charitable to say this is an oversight of an industry that wants to provide a safe and turn-key experience to its consumers. The more cynical reader is probably already thinking of the more likely explanation: the technology industry doesn't think users can or will ever be able to understand these issues.

The problem is that firewalls and anti-virus software are not nearly as effective as our industry has led consumers to believe. Combine this situation with Risk Compensation and we have an impending disaster on our hands. Consumers who are not educated on the basics of computer security are taking significant risks based on a false perception that firewall and anti-virus software will keep them safe.

Drivers Ed

The analogy with anti-lock brakes is a useful one in more ways than one. Clearly the automotive industry and our society are doing something right concerning automobiles, or we would have an epidemic of automotive accidents. I think the key is two-fold: 1) a sense of responsibility and 2) education.

Unlike when your average consumer buys a computer, a new driver must go through drivers education and pass a written exam. Vehicles come with manuals that have all of the basic operational details spelled out including all safety procedures. At the same time, we have laws and regulations that hold a driver responsible for the operation of their vehicle.

The result is a driving experience that we as a society are relatively happy with.

Putting on the Brakes

In comparison, when a consumer buys a computer, they rush home to unpack it, watch a quick introductory video on how to attach it to the Internet, install anti-virus and firewall software and then start surfing. No computer security information is taught and no sense of responsibility is imparted for how the computer is operated.

I want to make sure readers don't think I'm suggesting that we require a computer security version of drivers ed and a license to operate a computer, nor that we need to pass new laws making users responsible for the actions taken by their compromised computers.

What I am advocating is that we start educating users about computer security. If they can learn important and complicated information regarding the safe operation of a car, they can surely learn material presented at the same level about computer security. I'm also advocating that we stop pretending that anti-virus and firewall software are going to protect consumers from all of the ills on the Internet.

People need to understand what level of real protection these tools are providing – and what risks they are still exposed to – so that they can become a constructive and active participant in improving computer security for everybody.

Of Guns and Malware

I came across this video the other day:

It's a really entertaining TED Talk about the world of computer security from the perspective of malware and presented by Mikko Hypponen of F-Secure. I encourage you to watch.

He closes with the following:

I've spent my life defending the Net, and I do feel that if we don't fight online crime, we are running a risk of losing it all. We have to do this globally, and we have to do it right now. What we need is more global, international law enforcement work to find online criminal gangs -- these organized gangs that are making millions out of their attacks. That's much more important than running anti-viruses or running firewalls. What actually matters is actually finding the people behind these attacks, and even more importantly, we have to find the people who are about to become part of this online world of crime, but haven't yet done it. We have to find the people with the skills, but without the opportunities and give them the opportunities to use their skills for good.

In other words, anti-virus and firewalls aren't the solution to our problem. Stopping the people who create and produce malware is.

At the same time, we have this sentiment that bubbled up in the news recently:

Is antivirus software a waste of money?

As it turns out, many of his security-minded peers don't use [antivirus software] either. The reason: If someone is going to try and attack them, they're likely to use a new technique, one that most antivirus products will miss. "If you asked the average security expert whether they use antivirus or not," Grossman says, "a significant proportion of them do not."

That's a pretty clear indictment of the status quo. What we are doing is not working.

Guns don't kill people, people kill people

What I believe is happening here is a growing realization of what I've talked about before. The current security situation is a never-ending battle of measure and counter-measure with ever-increasing casualties. What is needed is a dramatic change in the way we approach this battle.

Mikko points to one way to change this. Stop trying to stop the "guns" in this battle from being manufactured and distributed; instead go after the people who are using them to commit crimes.

However, the same Wired article from above goes on to cite another approach:

Patterson said his company, Patco, had “good AV” at the time of the attack, but nevertheless it missed the password-stealing Trojan. Now, two years later, he’s taken an inexpensive step that every small business should take to prevent his company from becoming victim to this type of fraud: He’s told his bank give him a call before it authorizes any big money transfers.

This to me is the real game changer. And I hope to make Trust Inn the catalyst for that change.