In 1971, the USSR delivered the first planetary rovers to Mars. They moved on skis, and their task was to probe the surface with a rod (housing a dynamic penetrometer and a radiation densitometer) to find out whether the Martian surface was solid ground or loose dust. The first probe crashed on November 27; the second soft-landed on December 2, but its rover never managed to get out of the lander's "shell", so that attempt didn't count.
This article was originally published in Russian on habrahabr.ru. The original and translated versions are posted on our website with the permission of the author.
25 years later
On July 4, 1997, a U.S. probe arrived at Mars and brought along a "sojourner" carrying the first bug.
Image from sci-fi film "The Martian". The main character is carrying the Sojourner rover
The mission was at risk, but the powerful debugging functionality provided by the operating system and the professionalism of the programmers back on Earth (the guys did know their subject) enabled NASA to fix the bug in a short time.
The mission's cost was relatively small — $265 million.
The rover operated for 83 sols.
The rover's name, "Sojourner", originates from the Bible, where it means "traveler". It was selected in an essay contest won by V. Ambroise, a 12-year-old from the U.S. state of Connecticut, and honors the abolitionist and women's rights activist Sojourner Truth.
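Priority inversion occurs when threads with different priorities compete for a shared resource as well as for the CPU: a high-priority thread has to wait for a resource (such as a mutex) held by a low-priority thread, and meanwhile a medium-priority thread that does not need the resource preempts the low-priority one. As a result, the high-priority thread ends up waiting for lower-priority work to finish.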
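To make the effect concrete, here is a minimal sketch of a fixed-priority preemptive scheduler with one shared mutex and no priority inheritance. It is a toy model and not the Pathfinder flight software: the task names L, H and M, their timings, and the simplification that a task holds the mutex for its entire run are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int      # larger number = more urgent
    release: int       # tick at which the task becomes ready
    work: int          # ticks of CPU time the task needs
    needs_lock: bool   # True if the task holds the shared mutex while it runs
    done: int = 0      # ticks of work completed so far


def simulate(tasks, ticks):
    """Run a preemptive priority scheduler for `ticks` ticks; return who ran each tick."""
    owner = None       # task currently holding the mutex (hypothetical single mutex)
    timeline = []
    for now in range(ticks):
        ready = [t for t in tasks if t.release <= now and t.done < t.work]
        # A task that needs the mutex is blocked while another task owns it.
        runnable = [
            t for t in ready
            if not (t.needs_lock and owner is not None and owner is not t)
        ]
        if not runnable:
            timeline.append("-")   # CPU idle
            continue
        current = max(runnable, key=lambda t: t.priority)
        if current.needs_lock:
            owner = current        # take (or keep) the mutex
        current.done += 1
        if current.done == current.work and owner is current:
            owner = None           # release the mutex when the task finishes
        timeline.append(current.name)
    return "".join(timeline)


def scenario():
    return [
        Task("L", priority=1, release=0, work=3, needs_lock=True),
        Task("H", priority=3, release=1, work=2, needs_lock=True),
        Task("M", priority=2, release=2, work=4, needs_lock=False),
    ]


print(simulate(scenario(), ticks=10))  # prints LLMMMMLHH-
```

In this run, H becomes ready at tick 1 but cannot start until tick 7: L still holds the mutex, and M, which never touches the mutex, keeps L off the CPU for four ticks. The most urgent task ends up waiting behind the medium-priority one.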
The lander carried a radiation-hardened IBM RISC 6000 Single Chip (RAD6000 SC) 20 MIPS CPU with 128 Mbytes of RAM and 6 Mbytes of EEPROM. The operating system used was VxWorks.
The rover employed a 0.1 MIPS Intel 80C85 CPU with 512 Kbytes of RAM and 176 Kbytes of flash memory solid-state storage.
Three tasks with different priorities waiting around on the 1553 bus.
When collecting meteorological data, the system hung and started to reset repeatedly. The engineers on Earth ran a duplicate of the software and got down to work figuring out what was wrong. After 18 hours of studying detailed logs, they found the cause of the malfunction.
They only had to fix a couple of mutex flags.
No, we did not use the vxWorks shell to change the software (although the shell is usable on the spacecraft). The process of "patching" the software on the spacecraft is a specialized process. It involves sending the differences between what you have onboard and what you want (and have on Earth) to the spacecraft. Custom software on the spacecraft (with a whole bunch of validation) modifies the onboard copy. If you want more info you can send me email.
— Glenn Reeves, team leader of Mars Pathfinder software developer team
Those interested in details were invited to email the software author at email@example.com.
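The idea of sending only the differences and validating them onboard can be sketched roughly as follows. This is not the actual Pathfinder patching tool: the patch format, the CRC32 checks and the function names `make_patch` and `apply_patch` are assumptions chosen for illustration, and the sketch only handles images of equal length.

```python
import zlib


def make_patch(onboard: bytes, target: bytes) -> dict:
    """Ground side: describe `target` as byte-range edits to `onboard`, plus checksums."""
    if len(onboard) != len(target):
        raise ValueError("this sketch only handles images of equal length")
    edits = []
    i = 0
    while i < len(target):
        if onboard[i] != target[i]:
            start = i
            # Extend the edit over the whole run of differing bytes.
            while i < len(target) and onboard[i] != target[i]:
                i += 1
            edits.append((start, target[start:i]))
        else:
            i += 1
    return {
        "base_crc": zlib.crc32(onboard),
        "target_crc": zlib.crc32(target),
        "edits": edits,
    }


def apply_patch(onboard: bytes, patch: dict) -> bytes:
    """Spacecraft side: validate, apply the edits to a copy, validate again."""
    if zlib.crc32(onboard) != patch["base_crc"]:
        raise ValueError("onboard image differs from the one the patch was built for")
    image = bytearray(onboard)
    for offset, data in patch["edits"]:
        image[offset:offset + len(data)] = data
    if zlib.crc32(image) != patch["target_crc"]:
        raise ValueError("patched image failed validation")
    return bytes(image)


onboard = b"priority_inheritance=0;other=1"
target = b"priority_inheritance=1;other=1"
patch = make_patch(onboard, target)
print(patch["edits"])                          # prints [(21, b'1')]
print(apply_patch(onboard, patch) == target)   # prints True
```

Only one byte travels in the patch here, which is the point of the approach: the uplink carries the difference rather than the whole image, and the onboard copy is changed only if both checksums agree.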
How was the patch uploaded?
VxWorks contained a C language interpreter for executing statements on the fly during debugging. The JPL engineers decided to launch the spacecraft with this feature still enabled. A short C program was uploaded to the spacecraft which, when interpreted, changed the value of the mutex flag for priority inheritance from false to true. No more system resets occurred!
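What such a flag changes can be shown with a small model of a mutex. This is a simplified sketch and not the VxWorks API: the `Task` and `Mutex` classes and the `priority_inheritance` attribute are hypothetical names. With the flag on, a task that holds the mutex temporarily inherits the priority of the most urgent task waiting for it.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority
        self.priority = priority   # effective priority seen by the scheduler


class Mutex:
    def __init__(self, priority_inheritance=False):
        self.priority_inheritance = priority_inheritance
        self.owner = None
        self.waiters = []

    def lock(self, task):
        """Return True if `task` got the mutex, False if it has to block."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # With inheritance on, the owner is boosted to the waiter's priority.
        if self.priority_inheritance and task.priority > self.owner.priority:
            self.owner.priority = task.priority
        return False

    def unlock(self):
        """Drop any boost, then hand the mutex to the most urgent waiter (if any)."""
        self.owner.priority = self.owner.base_priority
        self.owner = None
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(nxt)
            self.owner = nxt
        return self.owner


def owner_priority_while_blocked(flag):
    low, high = Task("low", 1), Task("high", 3)
    mutex = Mutex(priority_inheritance=flag)
    mutex.lock(low)    # the low-priority task takes the mutex first
    mutex.lock(high)   # the high-priority task now blocks on it
    return low.priority


print(owner_priority_while_blocked(False))  # prints 1
print(owner_priority_while_blocked(True))   # prints 3
```

With the flag off, the lock holder stays at priority 1, so a priority-2 task can preempt it and stall the priority-3 waiter. With the flag on, the holder runs at priority 3 until it unlocks, so the medium-priority task can no longer get in the way.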
Glenn Reeves, the engineer who found and fixed the bug, with a Mars Pathfinder duplicate in the background
The bug was found in preflight testing on Earth but was given a low priority.
Glenn Reeves is very thankful to the engineers at Wind River for developing an operating system that enabled remote debugging even in emergency conditions like those that occurred during the mission. Interestingly, the bug was known to the engineering team, but "deadlines" and "priorities" force mission leaders to launch spacecraft while being aware of unfixed "weak spots".