Friday, March 13, 2009

Code and stuff

This is post 191, a nice 3-digit prime. There still aren't enough posts to make the primality check much work: 191^(1/2) ≈ 13.82, so trial division only needs the primes up to 13. That means I'll have to wait until post 289 (17^2) before the next prime past 13, which is 17, needs to be checked. Anyway, I spent the day working with code, something that hasn't happened in a while. It always seems like a shock, but there you go. People still don't believe me when I say that more time is spent debugging and testing code than writing it. Or designing it. Or doing whatever else with it. One thing worth a look, since it's applicable even outside the small domain of our business, is a thread pool. I didn't find the bug and I didn't come up with the fix; I just applied the fix to the correct branch and added some comments.
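The prime arithmetic above is plain trial division: n is prime if no integer d with d^2 ≤ n divides it, and only the primes in that range actually matter. A minimal Python sketch (the function names are my own, for illustration):

```python
import math


def is_prime(n: int) -> bool:
    """Trial division: only divisors up to floor(sqrt(n)) need checking."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def primes_to_check(n: int) -> list[int]:
    """The prime trial divisors p with p * p <= n."""
    return [p for p in range(2, math.isqrt(n) + 1) if is_prime(p)]


print(is_prime(191))         # True
print(primes_to_check(191))  # [2, 3, 5, 7, 11, 13]
print(primes_to_check(288))  # [2, 3, 5, 7, 11, 13]
print(primes_to_check(289))  # [2, 3, 5, 7, 11, 13, 17]
```

Post 289 itself won't be prime (it's 17 * 17), but it is the first post number where 17 joins the list of divisors to try.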

Frankly, I'd like to add some more tests before going forward, but I definitely didn't have the time today. I know this sounds like an excuse (we are approaching a release), but there has to be a balance between what should be done and what I'm being paid to get done. In this case, the situation called for a targeted fix, so I relied on the existing unit tests to verify the change. Ideally, I'd spend more time reviewing those unit tests to make sure nothing was missed, but that wasn't practical. Next week is a different story, however...

What's worse is that this issue is related to a low-frequency, high-impact bug that our team hasn't isolated yet. Next week I'll try to track it down by writing more unit tests. I have a feeling that may be the most reliable reproduction scenario: one that I create out of thin air.

Near the end of the day I discovered something else, related to something I've seen often: improper shutdown/exit of software leads to problems. This is an issue in embedded development, where restarting a problem subsystem is a good uptime strategy. The QNX operating system uses this philosophy: the microkernel never goes down, and the real work is done by processes that can be restarted as appropriate. Since everything, including device drivers, is a "process", a device driver upgrade consists of putting the software in the right place and "restarting" the driver. Being able to shut down and restart a small piece of functionality is akin to modular programming. Unfortunately, this modular, restartable philosophy was not hammered home at the beginning of this project, so the issue keeps recurring. I also think that being able to stop and restart a module without side effects demonstrates good modularization, particularly isolation. It means that the writer and designer of that piece of software has considered its interactions and resource requirements, and that they are well identified within the code. Losing sight of these things means a module can be shut down, but not restarted.
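To make "stop and restart without side effects" concrete, here is a minimal Python sketch of a restartable worker pool. It is not the thread pool from work, whose bug and fix I haven't described here; the class and method names are hypothetical. The idea it shows is isolation: every resource the module owns (the task queue and the worker threads) is created in start() and released in stop(), so a second start() begins from a clean slate instead of tripping over leftovers.

```python
import queue
import threading

_STOP = object()  # sentinel that tells one worker thread to exit


class RestartablePool:
    """Hypothetical minimal thread pool that can be stopped and started again.

    Everything the pool owns (the task queue and the worker threads) is
    created in start() and released in stop(), so a restart begins clean.
    """

    def __init__(self, num_workers: int = 2) -> None:
        self._num_workers = num_workers
        self._tasks = None   # queue.Queue, only while running
        self._threads = []   # worker threads, only while running

    @property
    def is_running(self) -> bool:
        return bool(self._threads)

    def start(self) -> None:
        if self.is_running:
            raise RuntimeError("pool is already running")
        self._tasks = queue.Queue()
        # Each worker is handed this run's queue explicitly, so no thread
        # can see a queue belonging to another start()/stop() cycle.
        # daemon=True is only a safety net in case stop() is never called.
        self._threads = [
            threading.Thread(target=self._worker, args=(self._tasks,), daemon=True)
            for _ in range(self._num_workers)
        ]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args) -> None:
        if not self.is_running:
            raise RuntimeError("pool is not running")
        self._tasks.put((fn, args))

    def stop(self) -> None:
        """Drain queued work, then release every resource start() created."""
        if not self.is_running:
            return
        # One sentinel per worker; the queue is FIFO, so tasks submitted
        # before stop() still run before the workers exit.
        for _ in self._threads:
            self._tasks.put(_STOP)
        for t in self._threads:
            t.join()
        self._threads = []
        self._tasks = None

    @staticmethod
    def _worker(tasks: queue.Queue) -> None:
        while True:
            item = tasks.get()
            if item is _STOP:
                return
            fn, args = item
            try:
                fn(*args)
            except Exception:
                pass  # keep the worker alive; a real pool would report this


# Usage: start, use, stop, then run the whole cycle again.
results = []
lock = threading.Lock()


def record(item):
    with lock:
        results.append(item)


pool = RestartablePool(num_workers=2)
for cycle in range(2):
    pool.start()
    for i in range(3):
        pool.submit(record, (cycle, i))
    pool.stop()

print(sorted(results))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(pool.is_running)  # False
```

The design choice that matters is that stop() undoes exactly what start() did and nothing more. If stop() left a thread running or kept the old queue around, the pool could still be "shut down", but the second start() would inherit stale state: the shut-down-but-not-restartable failure described above.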

Well, I guess that is something I'm going to have to talk up at work.
