Minutes of Weekly Meeting, 2009-11-16

Meeting called to order at 10:36 AM EST

1. Roll Call

Brad Van Treuren
Eric Cormack
Ian McIntosh
Carl Walker
Michele Portolan
Adam Ley
Tim Pender

Excused:
Patrick Au

2. Review and approve previous minutes:

11/09/2009 minutes:

  • Updated draft circulated on 12th November.
  • No corrections noted.
  • Insufficient attendees present to approve these minutes.

3. Review old action items

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as requirements in the ATCA specification if it is not going to be managed globally (All)
  • Adam review ATCA standard document for FRU's states
  • All to consider what data items are missing from Data Elements diagram
  • All: do we feel SJTAG is requiring a new test language to obtain the information needed for diagnostics or is STAPL/SVF sufficient? See also Gunnar's presentation, in particular the new information he'd be looking for in a test language
    (http://files.sjtag.org/Ericsson-Nov2006/STAPL-Ideas.pdf)
  • Ian/Brad: Draft "straw man" Volume 4 for review - Ongoing
  • All: Review "Role of Languages" in White Paper Volume 4 - Ongoing
  • All: Review 'straw man' virtual systems and notes on forums:
    http://forums.sjtag.org/viewtopic.php?f=29&t=109. - Ongoing
  • Ian: Check for mailer problems and resend survey invitations. - COMPLETE
  • [Ian] Once I checked the mailer, I found that I had set the message as having an RSS newsfeed but there was no RSS feed associated. As a result, since there was no news, the mailer decided it didn't need to send the message to anyone. Once I removed the RSS flag, it worked OK.

4. Discussion Topics

  1. White Paper Review - Review of Virtual Systems
    • [Ian] Brad added some proxy comments to last week's notes. In one he asked if we could discuss a bit more of Adam's remark on defining requirements.
    • [Brad] I was really trying to get some scope on the requirements space - it can mean many different things.
    • [Adam] As I recall, I was asked to comment on the breakdown of the application model. I didn't say anything about the model itself, but was asked to comment on where boundaries are drawn, whether these map onto real world applications. I think we need to stay focussed on the requirements: If we see where we need to allow one application to interface with another, then that's useful. I'm saying you shouldn't put artificial boundaries in place before the applications are defined.
    • [Brad] I agree, decomposing applications into solutions is the wrong way for us to go. What you say is warranted and makes sense - identify the need for the boundary.
    • [Adam] I think we're very much on the same page. It's very early to be mapping these virtual systems onto real applications.
    • [Ian] Then Peter commented that embedded solutions may be stand-alone circuitry in a corner of the board or may be hosted in some existing asset of the design.
    • [Brad] For POST, we've done that in hardware, similar to how Firecron and Intellitech have shown. It is a very dedicated set of hardware for that.
    • [Ian] And that's pretty much the way we've elected to go. It may not be the most logical of reasons, but a major factor was the desire to avoid any need for 'software' which would be implied by using a processor: That demarcation opens up a whole load of issues, for organizational and process reasons.
    • [Brad] It's interesting you mention that, because one of our product groups has delegated the BScan diagnostics software back to the firmware people; it's software but written by the hardware guys.
    • [Ian] I suspect that's where we'll eventually get to, too. We have similar issues on using soft processor cores in FPGAs.
    • [Brad] One other thing, while we have a level of POST we also have firmware self-test, but that's not on-demand test. Do everything you can in boot, then there are things you can test running concurrently with the software loading, etc.
    • [Ian] That's very similar to the PBIT-1 and PBIT-2 phases we have on some products.
    • [Brad] Then there are the Green initiatives, where we see a lot more blocks powering up into low power states; that presents other issues over what can be tested.
    • [Ian] From a different angle, we're seeing similar requirements: To reduce the load on an aircraft's auxiliary power unit, we may have to start in a low power mode, but are still expected to report 'readiness'. It's hard to confirm there are no faults if you haven't fully powered everything up.
    • [Brad] There's also the thermal factor. We're trying to cram more equipment into the confines of old facilities.
    • [Carl] We have the same issue.
    • [Brad] You have the same issues as we have; your systems may be distributed amongst a number of boxes and you have to test the interconnects between the boxes.
    • [Ian] Often that's the hard part. They're not covered by the board JTAG, so often it means in-process monitoring of the data. If you don't see data on one link for a while you might start suspecting it has gone faulty.
    • [Brad] We have piggy backed some BScan onto links like that. There was a paper on a distributed base station[1] where one part was remote by maybe 20km. It's an extreme case of a distributed system.
    • [Ian] Brad, can you provide a citation for that, so I can record it correctly?
    • [Brad] OK, I'll dig it out. {ACTION}
    • [Brad] Can we map this back onto the virtual systems? How do we show distribution? Do we need or want to? Are there other places we can have distribution?
    • [Ian] I think what you're suggesting is showing the hierarchical delegation of the test control. How we look at this is that each board can run its own suite of tests. The LRU's BIT system only needs to know how to run the board tests and collect the results via some API; it doesn't need to know what the tests are. That scales up to a system comprising several LRUs. But I'm not sure if trying to show that wouldn't just add confusion.
    • [Brad] I was reminded of Gunnar's paper[2] where the BScan Test Manager resides on the FRU and has interfaces to a higher level that can apply tests on demand.
    • [Brad] This can get you into the philosophical debate of whether you have a 'push model' or a 'pull model'
    • [Brad] Some of our products have self-test, but also have multidrop capability.
    • [Ian] I think that's essential: All of these embedded solutions have some dependency on a degree of functionality being present, so you need a 'back door' if the unit is apparently dead. It's one of the other things with distributed systems. The fault reporting has to be communicated over a mission bus. If that fails, you might not be able to tell if it's the link or the remote unit that's gone down.
    • [Brad] Is this requiring that the link needs to be of some highly reliable type?
    • [Ian] Possibly, but maybe we need supplementary signals outside of JTAG. The sort of thing we'll do is provide some critical discretes to show that power is OK or that the main controller is alive. That can help you figure out what is wrong if the BIT tests return nothing. We've even had a big debate about whether or not the boxes should have 'power on' LEDs. I can see an argument based on electrical noise but not on cost.
    • [Brad] In the telecom industry, certain LEDs are required to be on every board and the colors are specified for particular indications.
    • [Brad] There is an argument that if you know a board has a fault then you'll need to exchange it anyway, so is there any point in conducting further tests? But can you reproduce the fault? Will you be able to determine the root cause? Do you need to take a snapshot? Was it a thermal overload or a software glitch?
    • [Ian] In that kind of vein, I've been looking at Single Event Upsets. These are more likely to occur at high altitude than near sea level. Since they affect SRAM and SRAM based FPGAs the effects can be either momentary or more persistent. With some FPGAs you can detect an SEU by the configuration SRAM CRC changing; in other cases it's difficult to tell it from a hard fault, except that it goes away after the power is cycled.
    • [1] 'Testing and remote field update of distributed base stations in a wireless network', Chen-Huan Chiang; Wheatley, P.J.; Ho, K.Y.; Cheung, K.L. - Lucent Technol., Bell Labs., Holmdel, NJ, USA; ITC 2004 Proceedings.
    • [2] 'Remote boundary-scan system test control for the ATCA standard', Backstrom, D.; Carlsson, G.; Larsson, E. - Embedded Syst. Lab., Linkopings Universitet, Linkoping; ITC 2005 Proceedings.
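
Editor's illustration (not part of the discussion as minuted): the hierarchical delegation Ian describes, where each level only knows how to start the tests of the level below and collect results, might be sketched as follows. All class and function names here (Outcome, Board, Lru, failed_boards) are hypothetical and do not correspond to any existing SJTAG or vendor API; this is a minimal sketch under those assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Outcome:
    """Result of one test, as reported upward (hypothetical structure)."""
    name: str
    passed: bool


class Board:
    """A board that owns and runs its own suite of tests."""

    def __init__(self, name: str, tests: Dict[str, Callable[[], bool]]):
        self.name = name
        self._tests = tests  # private: the level above never inspects these

    def run_tests(self) -> List[Outcome]:
        # The board decides what its tests are; callers only see outcomes.
        return [Outcome(test_name, fn()) for test_name, fn in self._tests.items()]


class Lru:
    """An LRU whose BIT system only triggers board tests and collects results."""

    def __init__(self, name: str, boards: List[Board]):
        self.name = name
        self.boards = boards

    def run_bit(self) -> Dict[str, List[Outcome]]:
        # Delegation: ask each board to run, gather results keyed by board name.
        return {board.name: board.run_tests() for board in self.boards}


def failed_boards(report: Dict[str, List[Outcome]]) -> List[str]:
    """Names of boards with at least one failing test, in sorted order."""
    return sorted(
        name for name, outcomes in report.items()
        if not all(o.passed for o in outcomes)
    )
```

The same pattern would repeat one level up: a system-level controller would call run_bit on each LRU without knowing which boards or tests sit beneath it.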
  2. 2009 Survey
    • [Ian] So far, we've had maybe a half-dozen responses, but a few have generated additional referrals. As people complete the survey, I'll delete them from the mailing list. I'll let the survey run for a couple of weeks then send a reminder to the people still on the list; that worked quite well last year.
    • [Brad] I'd like to know, are people trying to answer the whole questionnaire or just sections?
    • [Ian] People are pretty much filling out the whole thing. There are some bits getting missed out, but not a lot.
    • [Ian] I can post a link to the results page, but we have to treat the data as 'privileged'.
    • [Brad] Yeah, when we did the 2006 survey, Ben tried to keep the detail data to just the officers, and then give a summary, the way you did last year, Ian. I'm just a bit cautious about privacy here.
    • [Ian] That's a good point. What I can do is create a version of the results page that removes names, companies and email addresses. {ACTION}
    • [Brad] That would be good.

5. Schedule next meeting

Schedule for November 2009:
Monday November 23, 2009, 10:30 AM EST
Monday November 30, 2009, 10:30 AM EST
  • Tim won't make 23rd

6. Any other business

  • [Ian] I guess we should welcome Michele to the group.
     
  • [Brad] I'd like to ask if anyone has a suggestion on a better way to move forwards with Volume 3. I have the feeling we're going a bit stale now.
  • [Ian] We have a lot of good content there; we maybe need to start by reorganizing it a bit.
  • [Ian] I have a thought that maybe we could start trying to set out headings like we did with Volume 2, but I suspect it may not be as simple here: I expect things will grow as we start to uncover more. That's probably not a bad thing.
  • [Ian] There are some things we've learned over the past few weeks that we need to find a way to fit into the document: That JTAG is a 'plug-in' to a wider test system, that we have 'stand-alone' and 'hosted' versions of JTAG within the embedded solutions. These are things we hadn't really appreciated before.
  • [Brad] I was thinking about requirements again. I guess what you put in the headings in many cases will be 'it depends'. Looking at microTCA and what happened when they added JTAG there - what most people wanted was multidrop, but these were mezzanines with a directly connected TAP with no gateway. This led to the concept of the JTAG switch: This was different and came after the first version of the White Paper, so the first White Paper completely missed this.
  • [Brad] The system requirements drove that. Perhaps we should look at decomposing some existing systems, and look at what is available?
  • [Ian] That sounds like a plan, but do we want to concentrate more on a 'best practice' for future systems, rather than tying ourselves into legacy architectures? I'm wary of being too retrospective.
  • [Brad] I understand what you mean, but I can also see that there are ways to draw an abstraction for legacy systems.
  • [Ian] Do we look at the decompositions or the headings first? Is there a preference?
  • [Brad] I think the headings should be first; they give us a goal.
  • [Tim] I don't know how this fits in, but what about the hardware state? If a system is busy, you don't want to interrupt it. If you need to be in a Test Mode, how do you get that across?
  • [Carl] This is the online versus offline diagnostics debate.
  • [Brad] We have to remember that we're governed by a higher level process. That should be aware of what states are critical.
  • [Carl] And what constitutes 'disruptive'.
  • [Brad] Nonintrusive tests can take place at any time. You can say that in order to run this test you need to be in one of this set of states.
  • [Brad] If you decide to run a test outside of those states, then that's not our problem. It gets back to the delegation issue.
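
Editor's illustration (not part of the discussion as minuted): Brad's point that a test can declare the set of states in which it may run, while non-intrusive tests may run at any time, could be expressed roughly as below. The state names, the Descriptor class and the example test names are all hypothetical; the decision to actually run a test remains with the higher level process, which this sketch does not model.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Optional


class SystemState(Enum):
    """Hypothetical hardware states known to the higher level process."""
    BOOT = auto()
    IN_SERVICE = auto()
    MAINTENANCE = auto()


@dataclass(frozen=True)
class Descriptor:
    """A test plus the states it requires; None marks a non-intrusive test."""
    name: str
    allowed_states: Optional[FrozenSet[SystemState]] = None

    def may_run_in(self, state: SystemState) -> bool:
        # Non-intrusive tests can take place at any time; others only
        # when the system is in one of their declared states.
        return self.allowed_states is None or state in self.allowed_states


# Example declarations (names are illustrative only).
read_idcode = Descriptor("read_idcode")
interconnect = Descriptor(
    "interconnect_extest",
    frozenset({SystemState.BOOT, SystemState.MAINTENANCE}),
)
```

With these declarations, read_idcode.may_run_in(SystemState.IN_SERVICE) is true, while interconnect.may_run_in(SystemState.IN_SERVICE) is false, so a governing process could refuse or defer the intrusive test.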

7. Review new action items

  • Brad: Provide citation for paper on distributed systems.
  • Ian: Create sanitised survey results page and post link on private forums.

8. Adjourn

Tim moved to adjourn at 11:47 AM EST, seconded by Brad.

Respectfully submitted,
Ian McIntosh