Between failure and success


I was part of a large team of engineers who designed and implemented a large communications system installed in aircraft. Through it, the aircraft and its onboard devices send and receive information via satellite between the plane and the control and monitoring points on the ground. It also provides telephone lines and Internet access to the captain and others. The system is operating successfully in thousands of aircraft today.
After the system had been in use on passenger planes for several months, we received a report of a failure on an aircraft belonging to a US company, and we were puzzled by the nature of the failure. A few days later we received another report of a system failure on the same plane. More reports followed, describing the same kind of failure on the same route. Worse still, it was confirmed that the failure occurred whenever the plane reached the same area of the globe.
We knew that the area was covered by three satellites, and there was no reason why the aircraft should not connect to the ground normally through any of them. What was causing this failure? We had no logical answer to that question, but such a serious problem had to be resolved.
The plane flew between a number of American cities, then to a European city, then on to Tokyo in Japan and Beijing in China, and returned along the same route. In the Beijing-Tokyo area, the system failed and lost contact with the ground. The information in the fault logs was not enough to identify the cause of the problem, and we were in dire need of more logs from that dreaded system.
Some technicians traveled to the American city where the plane stopped. They ran an Ethernet link to the device, which sat in a remote location among hundreds of other aircraft devices, and left the end of the cable hidden at the last seat of the plane. That seat was reserved on every leg of the journey until the plane returned to its starting point. When the plane arrived in Beijing, an engineer boarded with a laptop and sat in that seat, where he found the link that let him connect to the device. We had made sure that he was trained to handle the device so that he could capture the details of what was happening in the system. He carried out the mission successfully, left the plane in Tokyo, and sent us several files containing the details of his work.
Half an hour after these files arrived, we knew why the device had failed. We had stored the time elapsed since the device started operating in a four-byte variable, with a resolution of 0.1 ms. This variable therefore exhausts its range after about 60 hours, and an overflow occurs. All communication systems expect the client's time to be reasonably accurate, so the ground system refused to communicate with the aircraft through our system because its time differed from Earth time by about 60 hours.
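The 60-hour figure can be checked with a little arithmetic. The sketch below is only an illustration, not the system's actual code: it assumes the four-byte variable was a signed 32-bit integer counting ticks of 0.1 ms, which is the interpretation that matches "about 60 hours" (an unsigned counter would last roughly twice as long). The names `TICK_SECONDS`, `hours_until_overflow` and `wrap_int32` are made up for this example.

```python
# Illustrative sketch only; the signed 32-bit interpretation is an assumption.
TICK_SECONDS = 0.0001  # 0.1 ms per tick, as stated in the text


def hours_until_overflow(bits: int, signed: bool = True) -> float:
    """Hours of uptime a tick counter of the given width can represent."""
    max_ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return max_ticks * TICK_SECONDS / 3600


def wrap_int32(true_ticks: int) -> int:
    """Value a signed 32-bit counter would hold after true_ticks increments."""
    return (true_ticks + 2 ** 31) % 2 ** 32 - 2 ** 31


if __name__ == "__main__":
    # Signed 32-bit: about 59.65 hours, i.e. the "about 60 hours" in the story.
    print(round(hours_until_overflow(32), 2))
    # Unsigned 32-bit: about 119.3 hours.
    print(round(hours_until_overflow(32, signed=False), 2))
    # One tick past the limit, the stored value jumps to a large negative number.
    print(wrap_int32(2 ** 31 - 1), wrap_int32(2 ** 31))
```

Under this assumption, a device that stays powered for roughly two and a half days reports a wildly wrong time from then on, which fits a failure that always appeared at the same point of a long multi-leg route.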
The solution was very simple despite the seriousness of the consequences. We changed the type of the time variable from four bytes to eight. When the plane arrived back in that American city, the modified program was waiting for it, and the cables were removed. The engineers left the airport and we never heard about that failure again.
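To see why eight bytes are enough, the same arithmetic can be repeated for a wider counter. This is a rough sketch under the same assumptions as before (a signed integer counting 0.1 ms ticks); `years_until_overflow` is a hypothetical helper, not part of the original program.

```python
# Illustrative sketch only; signed counters and 0.1 ms ticks are assumptions.
TICK_SECONDS = 0.0001                 # 0.1 ms per tick, as stated in the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_overflow(bits: int, signed: bool = True) -> float:
    """Years of uptime a tick counter of the given width can represent."""
    max_ticks = 2 ** (bits - 1) if signed else 2 ** bits
    return max_ticks * TICK_SECONDS / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Four bytes: a small fraction of a year (about two and a half days).
    print(years_until_overflow(32))
    # Eight bytes: on the order of tens of millions of years.
    print(years_until_overflow(64))
```

With eight bytes the counter would take roughly 29 million years to overflow at the same resolution, so the problem effectively disappears for any realistic uptime.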
Do not give up your search for success because you have failed several times, even if the failure is large and embarrassing.

Prof. Sharif Babeker, Sudan

24 January 2018
