The Daily Parker

Politics, Weather, Photography, and the Dog

Good question: where were the auditors?

How did the accounting firm CliftonLarsonAllen LLP miss that Dixon, Ill., comptroller Rita Crundwell embezzled $53 million?

CliftonLarson in 2005 resigned as auditor for Dixon in order to keep other city assignments such as ledger-keeping after an influx of federal funds required the town to hire an independent auditor.

In its lawsuit, however, Dixon contends that CliftonLarson continued to do the annual audit and get paid for it, while hiring a sole-practitioner CPA from nearby Sterling to sign off on the work, thereby preventing competitors from grabbing the business. CliftonLarson says it prepared only a bare-bones “compilation” of financials after 2005 to aid the new auditor, Samuel Card, 56, who also is a defendant.

In depositions in late 2012, Power Rogers attorney Devon Bruce produced CliftonLarson emails after 2005 that referred to the firm's “audit” of Dixon. Also entered into the record were invoices submitted by Ms. Crundwell supposedly from the Illinois Department of Transportation that lacked an IDOT heading or logo and, in one instance, carried a nonexistent date—Nov. 31, 2004.

“Had a two-minute phone call been made by a Clifton employee to the Illinois Department of Transportation regarding any of these false invoices, Rita Crundwell's theft would have been identified at that time,” the lawsuit argues.

Oops.

Under the hood of Weather Now

My most recent post mentioned finishing the GetWeather component of Weather Now, my demo project that provides near-real-time aviation weather for most of the world. I thought some readers might be interested to know how it works.

The GetWeather component has three principal tasks:

  • getting the raw weather data from NOAA;
  • parsing the weather data; and
  • storing the parsed data for the rest of the application to use.

In the Inner Drive Technology world, an Azure worker process uses an arbitrary collection of objects that implement the IWorkerTask interface. The interface defines Interval and LastRun properties and an Execute method, which is all the worker process needs to know. The tasks are responsible for their own lifespans, reentry prevention, etc. (That's another discussion.)
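The production code is C#; as a purely illustrative sketch (the property and method names come from the post, everything else is assumed), the IWorkerTask contract and the worker process's loop might look like this in Python:

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta


class IWorkerTask(ABC):
    """Sketch of the IWorkerTask contract: Interval, LastRun, and Execute
    are all the worker process needs to know about a task."""

    @property
    @abstractmethod
    def interval(self) -> timedelta:
        """How often the worker process should consider running this task."""

    @property
    @abstractmethod
    def last_run(self) -> datetime:
        """When the task last ran; the task maintains this itself."""

    @abstractmethod
    def execute(self) -> None:
        """Do the work. The task handles its own lifespan and reentry prevention."""


def run_due_tasks(tasks, now):
    """The worker process: run every task whose interval has elapsed."""
    for task in tasks:
        if now - task.last_run >= task.interval:
            task.execute()
```

The point of the interface is that the worker process stays generic: it iterates an arbitrary collection of tasks without knowing what any of them do.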

In order to decouple the data source (NOAA now, other sources in the future) from the application, I split the three tasks into two IWorkerTask classes:

  • The NoaaFileDownloadingWorkerTask opens an FTP connection to the NOAA public weather servers, retrieves the files it hasn't already retrieved, and stores the contents in Azure Blob Storage; and
  • The NoaaFileParsingWorkerTask pulls the files out of Azure Storage, parses them, and stores the results in an Azure SQL Database and Azure table storage.
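The shape of that two-task pipeline can be sketched as follows. This is not the real C# code: the BlobStore class is a stand-in for Azure Blob Storage (the real code uses the Azure SDK), and the "parsing" is a placeholder.

```python
class BlobStore:
    """Stand-in for Azure Blob Storage, the intermediary between the two tasks."""
    def __init__(self):
        self._blobs = {}

    def put(self, name, content):
        self._blobs[name] = content

    def names(self):
        return set(self._blobs)

    def get(self, name):
        return self._blobs[name]


def download_new_files(ftp_listing, fetch, store):
    """NoaaFileDownloadingWorkerTask logic: retrieve only the files
    we haven't already retrieved, and store their contents."""
    for name in ftp_listing:
        if name not in store.names():
            store.put(name, fetch(name))


def parse_stored_files(store, already_parsed):
    """NoaaFileParsingWorkerTask logic: pull unparsed files out of
    storage and parse them. (upper() stands in for real parsing.)"""
    results = []
    for name in sorted(store.names() - already_parsed):
        results.append((name, store.get(name).upper()))
        already_parsed.add(name)
    return results
```

Because the two functions touch nothing but the store, either side can fail, pause, or rerun without the other noticing.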

I'm using Azure storage as an intermediary between the two sides of the process because my analysis led me to the conclusion that they're really independent of each other. Coupling of the two tasks in the current (2002) version of GetWeather causes all kinds of problems, not least that a failure in one task can stop the whole thing. If, as happens given the nature of the Internet, the FTP side has an unrecoverable problem, the application has to restart. In actual practice it simply kills itself and waits for the next time it runs, which can be a while because it's running on a Windows Server 2008 Scheduler job every 30 minutes.

The new architecture will allow the parser to run every minute or two, see if it has anything to do by looking at some metadata, and do its job if needed. I can change a system setting to stop it from running (for example, because I need to do some database maintenance), while letting the downloader continue to work separately.

On the other side, the downloader can run every 5 minutes, snatch the one or two files it needs from NOAA, and shut down cleanly without waiting for the parser. NOAA likes this because the connection is only open for a few seconds, instead of the 27 minutes it stays open right now. And if the NOAA server isn't available, so what? It's a clean shutdown and a clean start a few minutes later.
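One scheduled run of the parser side might look like the sketch below. The setting name and function signature are hypothetical; the point is that a single system setting can switch the parser off while the downloader keeps working.

```python
def parser_tick(settings, pending_files, parse):
    """One scheduled run of the parser task (illustrative names).
    The downloader runs on its own schedule and never waits for this."""
    if not settings.get("parser_enabled", True):
        return []      # switched off, e.g. for database maintenance
    if not pending_files:
        return []      # nothing to do; exit cleanly until the next tick
    done = [parse(name) for name in sorted(pending_files)]
    pending_files.clear()
    return done
```

Each tick either does its job or shuts down cleanly, so a failure on one side never forces the whole application to restart.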

This design also allows me to do something else: manually upload files for parsing and storage. This helps with testing, migration, service interruptions—all things that the current architecture has made nearly impossible.

I'm not entirely done with the application (and while writing this I just thought of an improvement I'll need to make to prevent infinite retries), but it's close. And I'm really pleased with the application so far. Stay tuned; I can now set a tentative public launch date of March 31st.

Resolving the oldest case

Five years ago, on 6 January 2008, I opened a FogBugz case (#528) to "Create NOAA Downloader". The NOAA downloader goes out to the National Weather Service, retrieves raw weather data files, and stores the files and some metadata in Windows Azure storage. Today I finally got to mark that work item "resolved."

Well, I just finished it, and therefore I have finished all of the pieces of the GetWeather application. And with that, I've finished the last significant piece of the Weather Now 4.0 rewrite. Total time to rewrite GetWeather: 42 hours. Total time for the rewrite so far: 66 hours.

Now all I have to do is...let's see...create worker role tasks to run the various pieces of the application (getting the weather, parsing the weather, storing the weather, and cleaning up the database), upgrade the Web site to a full Cloud Services application, deploy it to Azure, and deploy its gazetteer. That should be about 5 hours more work. Then, after a couple of weeks of mostly-passive testing, I can finally turn off the Inner Drive Technology Worldwide Data Center.

How to build software

Via Fallows, a software designer explains how a simple feature isn't:

This isn’t off the shelf, but that’s OK — we’ll just build it, it’s not rocket science. And it’s a feature that’s nice, not one that’s essential. Lots of people won’t use these tabs.

So, what do we need to think about when adding a bar of tabs like this?

  • The whole point is to have a view state that summarizes what you’re looking at and how it’s presented. You want to switch between view states. So we need a new object that encapsulates the View State, methods for updating the view state when the view changes or you switch tabs, methods for allocating memory for the view state and cleaning up afterward.
  • You need a bar in which the tabs live. That bar needs to have something drawn on it, which means choosing a suitable gradient or texture.
  • The tab needs a suitable shape. That shape is tricky enough to draw that we define an auxiliary object to frame and draw it.
  • Whoops! It gets drawn upside down! Slap head, fix that.

...and on for another 16 steps. He concludes, among other things:

This is a hell of a lot of design and implementation for $0.99. But that’s increasingly what people expect to pay for software. OK: maybe $19.95 for something really terrific. But can you sell an extra 100 copies of the program because it’s got draggable tabs? If you can’t, don’t you have better things to do with your time?

He's developing for a commercial application that he sells, so he may not be figuring the costs of development the same way I do. Since clients pay us for software development, it's a reflex for me to figure development costs over time. I don't know how much the tab feature cost him to develop, but I do know that to date, migrating Weather Now to Azure (discussed often enough on this blog) would have cost a commercial client about $9,000 so far, with another $3,000 or so to go. And the Inner Drive Extensible Architecture? That's close to $150,000 of development time—if someone else were paying for it.

And all you wanted was a little tab on your word processor...

It was a sunny day

Why? Because it's too cold for clouds.

Actually, this is one of those correlation-causation issues: cold days like today (it's -15°C right now) are usually clear and sunny because both conditions result from a high-pressure system floating over the area. Still, it's pretty cold:

A February hasn’t opened this cold here in the 17 years since 1996. The combination of bitterly cold temperatures, hovering at daybreak Friday near or below zero [Fahrenheit] in many corners of the metro area, plus the biting west winds gusting as high as 48 km/h, are producing 15 to 25-below zero wind chills—readings as challenging as any Chicagoans have encountered this season.

The first reported -18°C or lower wind chill occurred Thursday at 8 a.m. and the expectation is a 40 or more hour string of consecutive sub -18°C wind chills is likely to continue through midnight or a bit later Friday night in the rising temp regime predicted to take hold at that time.

Still, it's February, which means lengthening days, warmer temperatures, and pitchers & catchers. Yay!

Why haven't AA and US announced a merger?

The Cranky Flier wants to know:

Now the latest “news” of the day is that American CEO Tom Horton may end up being the Chairman of the combined entities. There is some good and some bad to this kind of thing. The good is pretty simple to explain. If Horton is willing to settle for a Tilton-esque agreement where he can just sit in a fancy office and collect a huge paycheck for a couple of years, then that finally removes the last real barrier to a merger – the fight being put up by management.

On the other hand, if he insists on a more active role, then it’s a bad idea. There are very few supporters of Horton outside management ranks. Wall Street has been quite clear that Horton’s plans to date are unacceptable. In particular, the plan to grow the hubs by 20 percent is suicidal. As one analyst, Dan McKenzie, puts it, the growth plan “would be toxic for industry pricing and ruinous for shareholders….” The views throughout the financial community appear to echo that sentiment. If Horton has any kind of influence in the merged entity, then the money folks will not be happy. And that hurts the chances of the deal going through.

I'm really hoping for an announcement soon.

Back to normal in Illinois

With former governor George Ryan's release from prison this morning, Illinois has finally returned to the situation of having fewer former governors in prison than out of it. In an especially nice touch, former governor Jim Thompson is Ryan's attorney.

I guess Dan Walker and Jim Edgar are both still alive, too, so the current count is: 1 incumbent, non-convicted governor; 2 former, non-convicted governors; 2 former, convicted governors; and 1 former governor still in jail. There's a nice symmetry there, yes?

And now, mid-April

Chicago's normal high temperature for April 17th is 16°C, which by strange coincidence is the new record high for January 29th:

The warm front associated with the strong low pressure system passed through the Chicago area between 2 and 3AM on its way north and at 6AM is oriented east-west along the Illinois-Wisconsin state line. South of the front south to southwest winds 24 to 45 km/h and temperatures in the upper 10s°C prevail – Wheeling actually reported 15.6°C at 6AM. North of the front through southern Wisconsin and farther north, winds were east to southeast and temperatures near freezing. Milwaukee at 6AM was 3°C.

Moreover:

The 18°C high projected for Chicago Tuesday easily replaces the day's previous 99-year record high of 15°C set in 1914 and is a reading just 1.1°C shy of the city's all-time January record high temp of 19°C set back on Jan 25, 1950. Only 5 of the 34 January 60s [Fahrenheit] on the books have made it to 18°C.

Temps in the 60s [Fahrenheit] in January are incredibly rare—a fact which can't be overstated! In fact, just 21 of 143 Januarys since records here began in 1871 have produced 60s.

The city's last 16°C January temperature took place 5 years ago when the mercury hit 18°C on Jan 7, 2008.

Ordinarily in the middle of winter in Chicago it would be customary at this point to say "It was last this warm in..." and throw out a date from last summer. But no, this is the new world of climate change, so I can say: "It was last this warm December 3rd."

Of course, it can't last. Here's the temperature forecast starting at noon today (click for full size):

January to April to January in three easy steps...

Azure table partition schemes

I'm sitting at my remote office working on a conundrum: how to balance human usability against good software design.

The problem is: how can I create an Azure table partitioning scheme that uses Azure efficiently and still allows the user (me) efficiently to troubleshoot problems with the feature in question. This is a direct consequence of the issues I worked on this morning.

The feature is the component of the Weather Now parsing system that stores raw weather data from NOAA temporarily. By "temporarily" I mean: until I delete it. Keeping the raw data will allow me to figure out why problems occur and will allow the application to apply new features to old data in the future.

NOAA publishes "cycle files" about every 3-6 minutes. The cycle uses a predictable sequence of 750 file names that repeats about every 4 days. The files go from file000 to file750, then back to file000. Sometimes, however, NOAA restarts the sequence at 0, skips files, or just crashes entirely, so the feature has to handle the file names as random. That said, the files have definite publication times, and generally—to an extent that Weather Now can optimize itself based on the pattern—the files contain weather data gathered within a short time before NOAA publishes the files.
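For illustration (the real code is C#), the nominal successor in that cycle is easy to compute — even though, as noted, the application still has to accept any file name NOAA actually sends:

```python
def next_expected(name):
    """Nominal successor in NOAA's cycle: file000 through file750, then
    wrapping back to file000. Callers must still treat incoming names as
    effectively random, since NOAA can skip files or restart the sequence."""
    n = int(name[4:])                    # "file042" -> 42
    return "file{:03d}".format((n + 1) % 751)
```

Knowing the expected next name lets the downloader optimize its FTP listing, while the "treat names as random" rule keeps it correct when the sequence breaks.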

You can have practically unlimited Azure tables in a storage account; I would imagine the number is close to the Int32 maximum value of 2.1 billion. Each table can have billions of partition keys as well. Searching on a combination of Azure table name and partition key takes the same length of time no matter how many tables are in the storage account or how many partition keys each table has. Under the hood, Azure manages the indexing so efficiently that network latency will be the bigger problem in all but a few edge cases.

For Weather Now, my first thought was to create a new table for each month's NOAA files and partition the table by day. So, the weather-parsing process would put the metadata for a file downloaded right now in the table "noaa201301" and use the partition key "20130127". That would give me about 5,700 rows in each table and about 190 rows in each partition.

I'm reconsidering. Given that it's taken 11 years to change the way Weather Now retrieves and stores weather data, using that scheme would give me 132 tables and 4,017 partitions, each of them kind of small. Azure wouldn't care, but it would over time clutter up the application's storage account. (The account will be cluttered enough as it is, with the millions of individual weather reports tabled by station and partitioned by month.)

On reflection, then, I'm going to create a new table of metadata each year, and partition by month. An Azure table with 69,000 rows (the number of NOAA files produced each year) isn't appreciably less efficient than one with 69 rows or 69 million, as it turns out. It will still partition the data as efficiently as the partition key suggests. But cutting the partitions down 30-fold could make a big difference in efficiency.
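The two schemes boil down to how you derive the table name and partition key from a file's publication time. A sketch (the real code is C# against the Azure Table storage API; the function names here are mine):

```python
from datetime import datetime


def monthly_scheme(published: datetime):
    """First idea: one table per month, partitioned by day.
    Returns (table_name, partition_key)."""
    return "noaa" + published.strftime("%Y%m"), published.strftime("%Y%m%d")


def yearly_scheme(published: datetime):
    """Revised idea: one table per year, partitioned by month.
    Fewer, larger tables and partitions; same lookup cost in Azure."""
    return "noaa" + published.strftime("%Y"), published.strftime("%Y%m")
```

Since a table-name-plus-partition-key lookup costs the same either way, the choice comes down to how much clutter the storage account accumulates and how easy the tables are to browse by hand.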

I'm open to contrary evidence. In fact, I'd love to find some. But given the frequency of data reads (one every 5 minutes or so), and the thousands of tables already in the application's storage account, I think this is the best way to go.