Thank you for responding with your insight, Bob!
On 1/24/23 19:15, Bob Stricklin via pacsat-dev wrote:
First, the Excel spreadsheet I sent is an early look at the currents needed. Since I put that together, some of the parts have changed and some have been added. I am sure we have a power issue, but I am taking the position of first trying to get everything we want done; then we can back down on capability and reduce power later. There is no limit or budget on power at this time.
Thank you. I meant to mention that I considered the spreadsheet a first-order approximation, but I may have missed that in my revisions.
Each time you add one of these current monitors to the design you introduce another part that can fail due to latch-up and other reasons.
The action taken for each monitor added may be different. Latch-ups are possible from radiation exposure. These can be single events or they can result in a hard failure of a part. When there is an event and high current, the plan may be to power down, wait for a period of time, and then try to restart. If it is the processor with an issue, then you are restarting everything; if it is a subcircuit, then you may be able to do a quick recycle. There are different types of current monitors to help you with your action plan. It may also be necessary to build a subcircuit to get the results needed.
We're not necessarily dealing with hard failure of a part with this current switch. We are specifically dealing with single-event upsets leading to latch-up from a radiation effect, which in turn results in unregulated power consumption. This condition is considered transient and is resolved with a power cycle, hence the use of this part in Fox and now Golf. From our recent experience, hard failure of a part seems relatively rare, and we haven't had a recent satellite with batteries that lasted long enough to deal with total ionizing dose, for example. (I don't know for sure which AMSAT satellites used non-hardened integrated circuits and thus would be resistant to that effect.)
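To make the trip, hold-off, and restart behaviour concrete, here is a rough simulation sketch. It is not the logic of the actual current switch used in Fox or Golf; the function name, the threshold, and the hold-off length are all made up for illustration.

```python
def simulate_current_switch(samples_ma, trip_ma, off_ticks):
    """Return the rail state ("on" or "off") at each tick.

    All parameters are hypothetical, chosen only for illustration:
    samples_ma: current the load would draw at each tick if powered (mA).
    trip_ma:    trip threshold (mA).
    off_ticks:  extra ticks to hold the rail off after the tick that trips.
    """
    states = []
    off_remaining = 0
    for current in samples_ma:
        if off_remaining > 0:
            # Rail is held off; a latch-up only clears once power is removed.
            states.append("off")
            off_remaining -= 1
        elif current > trip_ma:
            # Over-current detected: cut power and start the hold-off timer.
            states.append("off")
            off_remaining = off_ticks
        else:
            states.append("on")
    return states


# A latch-up appears at tick 2 and clears once the rail has been cycled.
history = simulate_current_switch([100, 100, 900, 900, 900, 100],
                                  trip_ma=500, off_ticks=2)
print(history)
```

The rail is only re-enabled after the hold-off expires, which is the "power down, wait, then restart" plan described above reduced to its simplest form.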
<snip>
I worked on optical ICs, and since these were exposed to light we had to be careful not to create an issue with latch-up. When a new design comes out of wafer fab, it is one of the early tests you do to see if you have issues. If you find a problem, you have to try to fix it by changing the die layout, adding more metal, or modifying the circuit. When a device is “radiation hardened” this should also be done, and hopefully the TMS570 had this done. It still could fail with radiation, though.
One thing to point out... I don't believe the TMS570 is radiation hardened. I understand it's used in safety-critical equipment and has special circuitry to detect failure modes. But I wouldn't expect it to be immune to single-event upsets. In the case of bit flips that impact processing, the TMS570 could detect that as a failure when comparing the results of the two cores and assert a failure. In the case of the RT-IHU this would result in failover to the mirror processor. In the case of the PACSAT payload, which I believe is running a single TMS570, the failure line could be tied to the power circuit to reset. If the power circuitry of the TMS570 suffers a single-event upset that latches up a power rail, I'd expect we'll depend on the current switch to detect it and cycle power to recover. (On a related topic, it's pretty fascinating to examine the Fox telemetry and observe the impact of the SAA. I don't know if Fox reset every time it traversed the SAA but it was quite impactful.)
As long as we're talking about radiation effects, nothing we're doing will mitigate the total radiation effects that will ultimately degrade our chips and cause them to fail.
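As a toy model of that decision path (the real TMS570 does the core comparison in hardware, so this is only an illustration; every name here is hypothetical and none of it is flight code):

```python
def flip_bit(value, bit):
    """Model a single-event upset by flipping one bit of an integer result."""
    return value ^ (1 << bit)


def cores_agree(result_a, result_b):
    """Lockstep check: the two cores should produce identical results."""
    return result_a == result_b


def recovery_action(agree, has_mirror_processor):
    """Pick a recovery path when the lockstep comparison fails.

    has_mirror_processor=True models the RT-IHU case (fail over to the mirror).
    has_mirror_processor=False models a single processor whose failure line
    is tied to the power circuit (assumed wiring, not a confirmed design).
    """
    if agree:
        return "continue"
    return "failover" if has_mirror_processor else "power_cycle"


good = 42
upset = flip_bit(good, 3)  # one core's result is corrupted by a bit flip
print(recovery_action(cores_agree(good, upset), has_mirror_processor=True))
print(recovery_action(cores_agree(good, upset), has_mirror_processor=False))
```

Note that this only covers upsets that corrupt processing; a latched-up power rail would still be left to the current switch.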
Jonathan