Minutes of Weekly Meeting, 2008-03-03

Meeting was called to order at 8:23am EST

1. Roll Call (Participants):

Brad Van Treuren
Ian McIntosh
Carl Nielsen
Adam Ley
Heiko Ehrenberg

Proxy updates from Yingwu Li and typo corrections by Ian.

2. Review and approve 2/25/2008 minutes

Minutes were approved (moved by Heiko, seconded by Carl)

3. Review old action items:

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as a requirement in the ATCA specification if it is not going to be managed globally (All)
  • Register on new SJTAG web site (http://www.sjtag.org) (All)
  • All need to check and add any missing documents to the site (All)
  • Respond to Brad and Ian with suggestions for static web site structure (Brad suggests we model the site after an existing IEEE web site to ease migration of tooling later) (All)
  • Look at proposed scope and purpose from ITC 2006 presentation (attached slides) and propose scope and purpose for ATCA activity group (All)
  • Look at use cases and capture alternatives used to perform similar functions to better capture value add for SJTAG (All)
  • Volunteers needed for Use Case Forum ownership (All)
  • Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
  • Continue Structural Test use case discussion on SJTAG Forum page (All)
  • We will need to begin writing a white paper for the System JTAG use cases to provide to the ATCA working group (All)
    Most likely, champions will own their subject section and draft the section with help from others. This paper will be based on the paper Gunnar Carlsson started in 2005.
  • All: review how to use the forum
  • Locate ATCA glossary of board and system states (Adam, Brad)
  • Ian and Brad work on setting up a Glossary Page on the SJTAG site. (Done - http://www.sjtag.org/glossary.html)
  • Continue POST use case discussion on SJTAG Forum page (All)
  • Brad submit an abstract regarding SJTAG Use Cases for ITC. (Done)

4. Discussion Topics

  1. SJTAG Value Proposition - Programming and Updates
    • [Brad] comment from Huawei's Yingwu Li: data volume is too large for programming FPGAs/CPLDs, but using JTAG to program the CPU and boot PROM is very valuable; the CPU can then download the FPGAs and load the rest of the data;
    • [Ian] We are not as constrained by time as others. In manufacturing we typically program boards and not systems. In the field we would program systems. We have had to disassemble part of the system to get access to the system bus for programming.
    • [Ian] One mission bus is used for ground loading of mission data (waypoint navigation information [Virtual homing beacons generated on your flight path by the computer based on GPS or navigational signals - Brad]). For programming lower-level items, it is up to us what to use. It may be RS-485 to a matrix controller or an SJTAG interface. The problem is that there are so many different ways that each designer comes up with their own solution to the problem.
    • [Ian] we program boards and then assemble a system with programmed boards; a system then may need to get updated; use either mission bus for updates or JTAG access; don't dismiss JTAG entirely for programming/updates just because it is not efficient for programming FLASH;
    • [Brad] Explained a challenge he is facing of a system needing to update 4 large FPGAs where the board has no room for configuration PROMs, but there is a boundary-scan Test Controller built into the administrative processor mounted on a mezzanine to this board.
    • [Brad] size constraints may not provide room for configuration PROMs, yet FPGAs need to be programmed upon power-up; where to store the data? on the shelf controller?
    • [Ian] I would be concerned what the start-up time would be
    • [Brad] absolutely; it'll probably be over a minute for the new Virtex 5 and Stratix II FPGAs if more than one were on the board;
    • [Brad] in this example, JTAG may not only be used for updates / upgrades, but even for normal system initialization;
    • [Ian] dangerous; if shelf controller goes down, nothing in the system will boot/work; no graceful degradation of the system but total collapse
    • [Brad] redundant systems may help, yet there is still a chance (albeit very small) that all redundant instances fail;
    • [Ian] there are things that can happen, I'd be very nervous;
    • [Long pause ...]
    • [Brad] getting update/upgrade data to the system is an issue we discussed a little bit last week; for telecom systems data can be transferred remotely, but for other applications download by an on-site technician may be required (as commented by Ian in last week's meeting minutes)
    • [Ian] at this time, field service engineers need to be at the system to upload new data; one can imagine better and remote ways in the future;
    • [Ian] We would have to update in a hangar. We only have the opportunity to open a panel on the aircraft and connect to the box.
    • [Ian] upload would be with closed covers (system would not be opened, upload through connector)
    • [Brad] as indicated by Adam last week, there is a difference between updating a board/system and restoring a board/system;
    • [Ian] Restore will require some sort of diagnostics before you can perform a restore. Similar to what needs to be done to change a fuse. You probably need to take the system out of service. It is really like doing a repair.
    • [Ian] you may even have to take the covers off, maybe swap boards, maybe even troubleshoot the root cause before programming;
    • [Brad] Last week Heiko asked: should these operations be done at repair rather than performed in the field?
    • [Brad] We can actually take a board out of service in most cases to perform testing on it.
    • [Heiko] when you get to that point, we don't really have to worry about the minute or two that it takes to program the FPGA data, but rather the time it takes to troubleshoot the system;
    • [Ian] swapping a module is the most common action in such a case in our industry; bad module is then shipped to the repair facility; we usually deliver a set of spare modules with the system for these incidents;
    • [Brad] it really depends on the application whether you do a replacement of modules or a restoration in the field;
    • [Ian] space products are an example of systems you cannot get back for repairs or updates; telemetry needed for any updates;
    • [Brad] this gets us back to the need for system redundancy (needed to keep service available while one instance of the system is updated)
    • [Brad] It might be less expensive, as far as customer down time goes, to work on a remote solution, possibly remote update, while dispatching a technician to replace the FRU if the customer could be brought back into service earlier. Lost revenue due to an outage situation could get quite costly. The bottom line is the application needs to dictate the replacement or restoration strategy.
    • [Brad] we didn't talk about the specific methods (strategies, tooling, architecture, etc.) of programming and updates yet; focus for next week's conference call;
      Heiko and Adam agree!
    • [Brad] Be thinking about 1) how updates are performed, 2) the tooling required to support updates, and 3) the strategies used to perform the updates in a system.
    • [Yingwu's comments]
      • Thank you Brad. Thank you all.
        I am very glad to join the group and share your views.
      • As Ian stated, in manufacturing we also program boards and not systems. And I agree that using SJTAG to program is dangerous because it is like fault injection. Maybe it needs redundant systems, but we must consider it from all sides because there are some unexpected effects when the device transitions to the EXTEST state. The system may collapse just because of one accidental signal on a wire among the boards.
      • But remote upgrade is still very significant to us in the field.
      • For example, when a base station is placed out of the way on a mountaintop, it is very costly to dispatch a maintenance technician to the field. Of course, there are many methods to accomplish a remote upgrade. In most cases, we don't implement the remote upgrade through SJTAG. But the SJTAG bus is significant as a safe access path. If the board upgrade fails and the board cannot recover, using SJTAG to recover the board is very valuable in that case.
      • Our work may be to ensure that programming is efficient and safe. For example, it is probably efficient to store the data on the board that needs to be upgraded.
      • Because English is not my mother tongue and I am slow at learning languages, my expression may be unclear. I would be glad to hear your advice.

5. Schedule next meeting

Monday, March 10, 2008, 8:15am EDT

6. Any other business

  • Adam: "how does ATCA define the states of FRUs" - chapter/clause 3 of the standard document contains a state table and a state diagram; I'll share that information with the group;
  • Brad: Service Availability Forum (http://www.saforum.org) may be an even better source for that kind of information; I'll look into that forum; let's follow both routes;

7. Review new action items

  • Adam to review ATCA standard document for FRU states
  • Brad to review Service Availability Forum

8. Adjourned at 9:23am EST

Moved by Heiko, seconded by Ian

Thanks go to Heiko for assisting in the minutes!

Respectfully submitted,
Brad