The Daily Parker

Politics, Weather, Photography, and the Dog

Minor delays on the El this morning

I'm now at Heathrow where I've got a really great perch overlooking the approach end of runway 9L. A JAL 777 has just floated down to the runway and a BA 747 is taxiing past the window. It's a little piece of aviation heaven in Terminal 5 as I wait for the 787 to Toronto.

As I mentioned earlier, however, my trip home tomorrow morning may end a little differently than usual because of this:

(Photo credit.)

Fortunately, no one was hurt. Unfortunately, the El still missed its flight. Never try to carry too much baggage up the stairs; use the elevator instead.

Boarding starts in a few minutes. Time to boogie. But I'll wait for this BA 777 to land. They're really amazingly graceful when they touch down.

Can't wait to get home

Checking the Chicago local news a moment ago, I see a weather forecast of -2°C and blowing snow for Tuesday, rain for the rest of the week, and a crash at the O'Hare subway station:

Thirty people were injured after a CTA Blue Line train derailed and hit a platform at O'Hare International Airport about 2:55 a.m. Monday.

The injuries are not life threatening, according to early reports from the scene to Chicago Police Department headquarters, Chicago Police Department News Affairs Officer Ron Gaines said.

It's not clear how fast the train was moving but it jumped a bumper at the end of the line and moved up an escalator, according to Chicago Fire Department Spokesman Larry Langford.

The CTA posted to its Twitter page that trains were stopped at O'Hare but running between the Logan Square and Rosemont stops.

Yeah, I'm in a hurry to get back.

Week ending in London

It's 11pm on Sunday and everything is closed, so I'm taking a break from my break. My body still seems to think it's on Chicago time, which will help me rejoin American civilization on Tuesday, though at the moment it means my body thinks it's 6pm and wonders what it will do for the next three and a half hours or so.

I have accomplished what I set out to do this weekend. I visited the British Museum, the Southampton Arms, and another pub a friend recommended, The Phoenix. I've also finished Clean Coder, read Snow Crash cover to cover, and gotten most of the way through High Fidelity. That last book connects Chicago and London (specifically Camden and Gospel Oak, two neighborhoods I spent time in this weekend) more completely than any other book I can think of.

Tomorrow evening (morning? it's hard to tell) I'm flying out on a 787, about which I will certainly have something to write. I'm quite jazzed about it.

Now, back to Nick Hornby...

Are more megapixels inherently good?

I debated this question with someone at a dinner a couple weeks ago. She suggested higher megapixel numbers told you more about the ego of the camera buyer than about the quality of the images.

I said it depends on how you're using the photos, but generally, more data yields more useful photos.

Here's an illustration, using a vaguely recognizable landmark that I happened to pass earlier this weekend, and just happened to have photographed with three different cameras. All three photos are from approximately the same location at approximately the same time of day. Obviously there are some differences, but the illustration should work regardless.

Let's take a look at three images stored as 600x900 JPEGs and displayed at 500x750, the standard size for this blog. First, let's see one from a Kodak DC4800 in February 2001, 13 years ago. The original size was 1440x2160 at 3 MP:

Now skip forward to August 2009, using a Canon 20D shooting a 2336x3648 JPEG at 8 MP:

Finally, two days ago, using a Canon 7D shooting raw at 3456x5184 (18 MP):

The photos look pretty comparable at this resolution, don't they? So let's zoom in on a 150x150 pixel view of each:

So each one has successively more data than the previous, which becomes obvious when you zoom in.
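To put rough numbers on "more data," here's a quick Python sketch using only the pixel dimensions given above. It assumes the 150x150 views were cropped at 100% (native resolution) and that each photo was scaled down to the 600-pixel-wide blog JPEG; the names are mine, purely for illustration.

```python
# Pixel dimensions as given in the post (width x height, portrait).
cameras = {
    "Kodak DC4800 (2001)": (1440, 2160),
    "Canon 20D (2009)": (2336, 3648),
    "Canon 7D (2014)": (3456, 5184),
}

BLOG_WIDTH = 600  # width of the stored blog JPEGs, in pixels
CROP_SIDE = 150   # side of the square zoomed-in view, in pixels


def megapixels(width: int, height: int) -> float:
    """Total pixel count, in millions."""
    return width * height / 1_000_000


def downscale_factor(width: int) -> float:
    """Native pixels squeezed into each blog pixel, along one axis."""
    return width / BLOG_WIDTH


def crop_share(width: int) -> float:
    """Fraction of the frame width a 150 px crop covers at 100% zoom."""
    return CROP_SIDE / width


for name, (w, h) in cameras.items():
    print(f"{name}: {megapixels(w, h):.1f} MP, "
          f"{downscale_factor(w):.2f}x downscale for the blog, "
          f"150 px crop spans {crop_share(w):.1%} of the width")
```

At blog size each blog pixel averages 2.4 native pixels across for the Kodak but 5.76 for the 7D, which is why the shrunken versions look alike and the crops don't.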

Another difference: I shot the one from this weekend using the raw format, which preserves all of the information the camera had available at the time of the photo. JPEG images are lossy; they always leave some information out. And because raw images are easier to manipulate using software, I was able to make the third photo a little bit better than I could make the other two.

So are more megapixels more useful? Not if you're just putting up blog posts, but for serious photography, absolutely.

What the equinox means to a dozen scientists

The equinox, when the sun appears directly over the equator so that day and night are approximately equal all over the planet, happened Thursday. Today brings one consequence of the earth continuing in its orbit: the south pole appears to point farther away from the sun, as it has since the last solstice.

In just about two hours, at 17:09 UTC, the sun sets on the Amundsen-Scott South Pole Station, and doesn't rise again until around 01:18 UTC on September 20th.

Good luck to the over-winter crew at Amundsen-Scott. Enjoy the long night.

Weakest link in the chain

I had planned to post some photos tonight showing the evolution of digital cameras, using a local landmark, but there's a snag. The CF card reader I brought along isn't showing up on my computer, even though the computer acknowledges that something is attached through a USB port.

As I'm visiting one of the most sophisticated and technological cities in the world, I have no doubt I can fix this tomorrow. Still, it's always irritating when technology that worked a few days ago simply stops working.

For those doubting my troubleshooting skills, I have confirmed that the CF card has all the photos I shot today; that the computer can see the CF card reader; and that the computer can connect effectively to other USB attachments. The problem is therefore either in the OS or in the card reader, and I'm inclined to suspect the card reader.

It's not the good times they care about, it's the bad

The repercussions from Monday's data-recovery debacle continued through yesterday.

By the time business started Tuesday morning, I had restored the client's application and database to the state it had at the moment of the upgrade, and I'd entered most of their appointments, including all of them through tomorrow (Thursday). When the client started their day, everything seemed to be all right, except for one thing I also didn't know about their business: some of their customers pay them based on the appointment ID, which is nothing more than a SQL IDENTITY column in the database.

If you know how databases work, you know that IDENTITY columns are officially non-deterministic. In this case the column increments by one for every new row, but I didn't re-enter the data in the order it was originally entered, because I prioritized the earliest appointments. So most appointments came back with different IDs than the ones their customers already had.
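Here's a small Python illustration of that effect. It swaps in SQLite's INTEGER PRIMARY KEY (which hands out the next integer for each new row, starting at 1 in an empty table) as a stand-in for a SQL Server IDENTITY(1,1) column; the appointment names are made up.

```python
import sqlite3


def load(appointments):
    """Insert appointments into a fresh table; return {name: assigned id}.

    SQLite's INTEGER PRIMARY KEY stands in for an IDENTITY(1,1) column:
    each inserted row gets the next integer.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE appointment (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO appointment (name) VALUES (?)",
                     [(a,) for a in appointments])
    rows = conn.execute("SELECT name, id FROM appointment").fetchall()
    conn.close()
    return dict(rows)


# Hypothetical appointments, in the order they were originally booked...
original_order = ["Smith (Fri)", "Jones (Wed)", "Lee (Thu)"]
# ...and re-entered with the earliest appointments first.
reentry_order = ["Jones (Wed)", "Lee (Thu)", "Smith (Fri)"]

before = load(original_order)
after = load(reentry_order)
for name in original_order:
    print(f"{name}: id {before[name]} -> {after[name]}")
```

Same rows, same data, different IDs: Smith goes from 1 to 3. Anyone paying against ID 1 is now pointing at Jones.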

We've gotten through the problem now, and the client no longer wants to put my head on a spike, so I will now take a moment for an after-action review that might help other software developers in the future.

First, the things I did right:

  • When I deployed the upgrade Saturday, I preserved the state of the database and application at exactly that moment.
  • All of the data in the system, every field of it, was audited. It was trivially easy to produce a report of every change made to the system from roll-out Saturday afternoon through roll-back Monday night.
  • When I rolled back the upgrade Monday night, I preserved the state of the upgraded database and application at exactly that moment.
  • When the client first noticed the problem, I dropped everything else and worked out a plan with them. The plan centered around getting their business back up first, and then dealing with the technology.
  • Their customers were completely back to normal at the start of business Tuesday.
  • The application runs on Windows Azure, which made preserving the old application state not only easy, but possible.
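Because every field was audited, the "report of every change" boils down to a filter on timestamps. A minimal Python sketch, assuming a hypothetical audit-entry shape (the real application's audit tables aren't described here) and made-up timestamps standing in for Saturday's roll-out and Monday's roll-back:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEntry:
    """One audited change to one field. The shape is hypothetical."""
    changed_at: datetime
    table: str
    row_id: int
    field: str
    old_value: str
    new_value: str


def changes_between(log: Iterable[AuditEntry],
                    start: datetime, end: datetime) -> List[AuditEntry]:
    """Every change made in [start, end], oldest first."""
    return sorted((e for e in log if start <= e.changed_at <= end),
                  key=lambda e: e.changed_at)


# Made-up timestamps for the roll-out and roll-back.
rollout = datetime(2014, 3, 15, 14, 0)
rollback = datetime(2014, 3, 17, 21, 0)

log = [
    AuditEntry(datetime(2014, 3, 17, 9, 30), "Appointment", 42, "Status", "", "Booked"),
    AuditEntry(datetime(2014, 3, 14, 16, 0), "Appointment", 41, "Status", "", "Booked"),
    AuditEntry(datetime(2014, 3, 16, 11, 5), "Appointment", 42, "Room", "", "3B"),
]

for e in changes_between(log, rollout, rollback):
    print(e.changed_at, e.table, e.row_id, e.field,
          repr(e.old_value), "->", repr(e.new_value))
```

Sorting oldest-first matters: it's the order you'd want to replay the changes in.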

So what should I have done better?

  • My biggest error was overconfidence in my ability to roll back the upgrade. No matter what other errors I made, this was the root of all of them.
  • The second major error was not testing the UI on Internet Explorer 8. Mitigating this was the fact that neither I nor my client was aware that the bulk of their customers used IE8. However, given that people using IE8 were totally unable to use the application, the impact was large enough that IE8 should have been near the top of my regression test checklist even if the number of customers using it had been very small.
  • Instead of spending a couple of hours re-entering data, I should have written a script to do it.
  • I have always regretted (though never more than today) exposing the appointments IDENTITY column to end users, because it's only natural that they'd use that ID for business purposes. This illustrates the danger, not just the sloppy design, of using a single database field for two purposes. Any future version of the application will have an OrderID field that is not a database plumbing field.
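The last two lessons fit together: if the customer-facing ID is ordinary data instead of an IDENTITY value, a re-entry script can carry it along, and the order of re-entry stops mattering. A minimal sketch, again using SQLite as a stand-in for SQL Server; the schema, the "A-1001" format, and the records are all hypothetical.

```python
import sqlite3

# Hypothetical schema: "id" is database plumbing (the IDENTITY-style surrogate
# key); "order_id" is the business identifier customers see and pay against.
SCHEMA = """
CREATE TABLE appointment (
    id       INTEGER PRIMARY KEY,
    order_id TEXT NOT NULL UNIQUE,
    patient  TEXT NOT NULL
)
"""


def reenter(conn: sqlite3.Connection, records: list) -> None:
    """Script the re-entry instead of typing it: insert every saved record,
    carrying its original OrderID along as ordinary data."""
    conn.executemany(
        "INSERT INTO appointment (order_id, patient) VALUES (?, ?)",
        [(r["order_id"], r["patient"]) for r in records],
    )
    conn.commit()


# Made-up records, as they might come out of a backup or the audit trail.
saved = [
    {"order_id": "A-1001", "patient": "Smith"},
    {"order_id": "A-1002", "patient": "Jones"},
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
reenter(conn, list(reversed(saved)))  # deliberately re-entered out of order

# The surrogate ids changed, but lookups by OrderID still find the right row.
for order_id in ("A-1001", "A-1002"):
    row_id, patient = conn.execute(
        "SELECT id, patient FROM appointment WHERE order_id = ?", (order_id,)
    ).fetchone()
    print(order_id, "->", patient, f"(internal id {row_id})")
```

The UNIQUE constraint also means a script that accidentally runs twice fails loudly instead of quietly duplicating appointments.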

All in all, the good things outweighed the bad, and I may get back in my client's good graces when I roll out the next update. You know, the one that works on IE8, but still solves the looming problem of the platform's age.

And the day started so well...

At 8:16 this morning, a long-time client sent me an email saying that one of his customers was getting a strange bug in their scheduling application. They could see everything except for the tabbed UI control they needed to use. In other words, there was a hole in the screen where the data entry should have been.

Here's how the rest of the day went around this issue. It's the kind of thing that makes me proud to be an engineer, in the same way the guys who built Galloping Gertie were proud.

It all started when I updated a Windows Azure cloud service from the no-longer-supported SDK 1.7 running on Windows Server 2008 to the current SDK (2.2) and operating system (Windows Server 2012 R2). I also upgraded the .NET Framework from 4.0 to 4.5.1, which is only possible on WS2012R2.

This upgrade started months ago, and proceeded slowly because both my client and I had other priorities. I mean, who wants to spend a lot of money upgrading a platform without upgrading the application running on it? So the last build of the application went to production in October, and I haven't touched it since. I mean, it worked fine, why mess with it? Other than the fact that the operating system and Azure SDK are no longer supported.

Before pushing the update, I thoroughly tested the application. I mean, unit tests up the yin-yang, with a tens-of-steps-long regression test on my local machine, and on an Azure test instance, before even looking askance at the Production instance. When I had tested everything I could imagine, I did this:

  1. Stopped the application, to ensure no one changed any data during the upgrade.
  2. Made a full copy of the production database ("CREATE DATABASE productioncopy AS COPY OF production")
  3. Once the data was fully copied, I uploaded the new bits to the Staging slot of the application.
  4. I updated the configuration info to the current standards.
  5. VIP swap! (I swapped the staging and production instances, so the old production instance was now in the staging slot.)
  6. And....it's running just fine. All that planning and testing worked!
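Step 3 depends on knowing when the copy has actually finished. As I understand Microsoft's documentation for Azure SQL Database, CREATE DATABASE productioncopy AS COPY OF production returns before the copy completes, and while it's running the new database's state_desc in sys.databases reads COPYING, switching to ONLINE when it's done. Here's a hedged Python sketch of a wait loop; get_state is a hypothetical callable you'd wire to that query through whatever driver you use, and the timeout and polling interval are arbitrary.

```python
import time
from typing import Callable


def wait_for_copy(get_state: Callable[[], str],
                  timeout_s: float = 3600.0,
                  poll_s: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> None:
    """Block until the database copy reports ONLINE.

    get_state is hypothetical: it should return the copy's state_desc, e.g.
        SELECT state_desc FROM sys.databases WHERE name = 'productioncopy'
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "ONLINE":
            return  # copy finished; safe to upload the new bits
        if state != "COPYING":
            raise RuntimeError(f"database copy ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError("database copy did not finish in time")
        sleep(poll_s)


# Usage with a fake state source, so nothing actually sleeps:
states = iter(["COPYING", "COPYING", "ONLINE"])
naps = []
wait_for_copy(lambda: next(states), sleep=naps.append)
print(f"copy finished after waiting {len(naps)} times")
```

Treating any state other than COPYING or ONLINE as an error is deliberate: a failed copy should stop the deployment, not let it proceed without a backup.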

So what happened? Well, it turns out there's one thing I didn't anticipate: Internet Explorer 8, released five years ago Thursday, and known to have difficulties with JavaScript. Plus, the controls we used when we originally deployed in January 2008, made by Infragistics, have known incompatibilities with IE8, but again: the application has worked just fine the whole time.

Since everything worked just fine on earlier versions of the application, and since this update didn't directly change the UI, and since IE8 hasn't been supported in quite some time, I figured there wouldn't be any problems.

It turns out that a sizable portion of my client's customers use IE8, because they're big hospitals with big IT departments and little budgets for updates.

Once I realized with abject horror that the application was simply broken for most of the people using it, I resigned myself to rolling back to the previous release, which had worked just fine. When I got home, I started this task, and the following things happened:

  1. Once again, I stopped the application.
  2. The actual database restore went fine, as did the VIP swap putting the previous version back in the Production slot and the new version in the Staging slot.
  3. When the application tried to start up, I realized I'd forgotten to roll back the configuration information for the logging and messaging component. So the application failed to start.
  4. I rolled back the config.
  5. The application again failed to start. Only now, because the logging and messaging component is the part that's failing, I can't see any diagnostics.
  6. Fortunately, I deployed the application with Remote Desktop enabled, so I tried connecting to the virtual machine directly.
  7. The Remote Desktop user account had expired.
  8. Fortunately I use great source control. In Mercurial, I updated to the last production build before the update.
  9. Tried to load into Visual Studio, and failed. See, I no longer have the Azure SDK v1.7. I never installed it on this machine, in fact. I'm running SDK 2.2, and I have no easy way of running an older version.

So, as far as I knew at this point, there was simply no way to get into the application, and no way for me to re-upload the old version.

I decided to try a different tack. I rolled back the rollback and restarted the new version. I also started trying to get my last remaining Windows XP machine running so that I could confirm the bug, and start testing fixes on a Test instance running Windows Server 2012 R2.

Getting a 10-year-old laptop to boot, let me log in, stop wasting time with all the detritus it acquired in its years of service, connect to my network, and open up IE8, took 45 minutes.

Some time in there I walked Parker.

So now, I can see that the error exists in IE8, and I also have found an article on how to reset the RDP password expiration date. Only, I'm really tired, and I am worried I'll make stupid errors if I keep trying to debug this right now.

So I have two approaches I will try first thing in the morning: first, roll back to the October release, and manually update the RDP expiration date so I can remote in and debug the configuration problem. Then I'll have to re-create all the data my client added yesterday, which will take me at least an hour. If I'm supremely lucky I'll have this done by 8am. Since I've had no luck at all so far on this upgrade, I am not optimistic.

Second, I'll start removing the outdated Infragistics code. Believe it or not, jQuery works fine on IE8, despite it being pretty much the latest thing in user interface libraries. It's the custom crap Infragistics pushed out years ago that fails. Unfortunately I won't be able to deploy this before leaving on Thursday morning. Fortunately the application isn't going to stop working suddenly; the OS and SDK are no longer supported, but they won't actually turn the OS off until June.

And there's the irony in a nutshell. I thought I did everything right in the deployment cycle, especially the part where I got three months ahead of the due date. The things that went wrong to get me to this state of frustration and exhaustion were numerous and tiny, kind of like the things that go wrong to cause an aviation accident. That said, the client has suffered no data loss, and I preserved a whole catalog of options to fix the problem (relatively) quickly. This isn't the disaster it would have been without the deployment tools you get with Azure.

Plus, I've learned to test everything on IE8 whenever health care companies are involved. Sheesh.