Support Forum » SPI reliability issues
November 11, 2012 by sask55
I am having problems getting consistent and reliable SPI communications between a master and a slave chip. I believe the problem must be hardware related to some extent. Everything had been working well until I changed the layout on the board somewhat. Everything is connected the same, but the layout of the components has changed. Now I cannot maintain a consistent SPI link between the two chips. I have rewired and doubled up the SPI connections in an effort to reduce the chances that it is just a bad wire connection.
I decided to use the SPI code that Noter posted as an example (Noter's SPI). I switched to Noter's code for both the master and the slave. I made the change because Noter's code reports when an error in communication has occurred. I thought it would be a simple way to test the reliability of the SPI communication link. Initially the only change I made was to the size of the TX loop in the master code: I changed from the 100 count loop to a 255 count loop. Using that code, my setup seldom gets through the entire loop count before reporting an error. The results are very inconsistent. Each time I run it I seem to get to a different count.
I wanted to see what effect a delay between data transitions may have on the SPI reliability. I placed a 50 ms delay in the code on the master. The delay is immediately after the slave is deselected. I did this test primarily because this type of delay would be more representative of the type of data flow my code would be producing between the master and the slave and then the slave back to the master. With the delay in place the SPI seldom gets past more than a few bytes before an error occurs.
I added four 1K ohm resistors to the board as used in the Multi-Panel LED Array Tutorial. My slave and master chips are less than 2 inches apart and the connecting wires are short, but I thought it was worth a try. It made no change. I estimate that the two way communication success rate would be on the order of about 90%.
Having 10% of the TX bytes not returning a valid byte on the RX is far too large an error rate. I don't understand why my previous layout seemed to work and this one does not work very well, especially if there are delays in the data stream. What factors will affect the reliability of the SPI communication? Any suggestions or ideas on how to make a much more reliable SPI connection would be appreciated.
November 12, 2012 by JimFrederickson
The "electrically induced error rate" of the SPI Data/Control Signals for 1 Master and 1 slave a few inches apart really is pretty much 0%. That being said... Problems can arise if you are trying to PUSH SPI Communications too fast for the speed of your Microcontrollers. This can become readily apparent if you are using the same software stack on a Master and slave with a Master running off a crystal and the slave running off of a slower Clock Source. (Namely something like the Internal 8 MHz Oscillator...) There can be other problems with your Circuit, including layout, but I have not run into that before.
If your chips are doing more than just SPI then try to program them just to do SPI only and at a slower bit-rate. (Also remove any extraneous hardware you have associated with something other than SPI.) See what happens then. Once the error rate is back to, essentially, 0% then add things back in and see where it goes awry...
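Editor's note: to put rough numbers on the "too fast for the slave" concern, here is a minimal sketch in plain C (it runs on a PC, not on the chip). It assumes the ATmega-style SPI prescaler table (SPR1:0 selecting divide by 4, 16, 64 or 128, halved when SPI2X is set) and the usual AVR guideline that a slave can only follow an SCK of at most one quarter of its own clock. The function names and the 16 MHz / 8 MHz figures are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* SCK frequency produced by an AVR-style SPI master.
 * spr   : value of the SPR1:0 bits (0..3)
 * spi2x : nonzero if the double-speed bit is set
 * Names are illustrative, not taken from any posted code. */
uint32_t spi_sck_hz(uint32_t master_f_osc_hz, uint8_t spr, uint8_t spi2x)
{
    static const uint32_t divisor[4] = { 4u, 16u, 64u, 128u };
    uint32_t d = divisor[spr & 3u];
    if (spi2x) {
        d /= 2u;               /* double-speed mode halves the divisor */
    }
    return master_f_osc_hz / d;
}

/* Returns 1 if a slave clocked at slave_f_osc_hz can be expected to
 * sample SCK correctly, using the f_osc/4 rule of thumb. */
int spi_slave_can_follow(uint32_t sck_hz, uint32_t slave_f_osc_hz)
{
    return sck_hz <= slave_f_osc_hz / 4u;
}
```

With these assumptions, a 16 MHz master at the fastest non-doubled setting clocks SCK at 4 MHz, which a 16 MHz slave can follow but a slave on an 8 MHz internal oscillator cannot.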
November 12, 2012 by Noter
Could be EMF interfering with the mcu clocks rather than corrupting data on the SPI wires. Also could be noise from the power supply. Usually a 470 uF capacitor across VCC to GND will take care of a noisy power supply. As for EMF, you just have to find the source and turn it off or shield it.
November 12, 2012 by sask55
In my setup both chips are running off 14.56 MHz crystals. I originally had four chips, the master and three slaves, set up on my previous layout. That layout was working with good, two way SPI communications with all three slaves, and I was getting valid calliper position data from all of them. I rearranged the layout to accommodate two more motor controller chips and the related support hardware. After changing the positioning of the components none of the SPI communication is working reliably.
I am completely baffled by this finding. I noticed that nothing I was doing seems to change the outcome, so I eventually ended up completely removing the slave chip from the board. The four pins involved with the SPI on the master are not connected to anything at all. The master chip has the code supplied by Noter. When I supply power to the master I usually see the expected message on the LCD, “ERROR at byte 1 Expected 1 got 0”. The thing is, I don't always get that message. I have seen counts as high as 98 with no slave chip connected to the master at all. That is, I got numbers 1 through 97 showing up as “1 OK, 2 OK, …” and then I see the message “ERROR at byte 98 Expected 98 got 18”. The “got” number is often 0 as I would expect, but often some other apparently random byte. How in the world did the master manage to get a byte_received == i even once, let alone 97 times through the loop?
All this leads me to wonder if the slave chip has ever returned a valid byte to the master since I rebuilt the board. My attempt to use Noter's example code as a SPI reliability test appears to have failed. It would seem that the spi_xmit_char function is just returning the same values back to main that were passed to the function. So SPI may not even be involved at all. How is it possible that this code running on an MCU will often loop a number of times before it breaks out of the for loop even when there is no slave chip?
I have tried a couple of different clock bit rate ratios; that does not change anything.
November 12, 2012 by sask55
I should have mentioned I don't have a 470 uF cap on hand right now. I have been running this setup with a 47 uF cap and a 0.1 uF cap, both across the high voltage (12 V) side of the voltage regulator, as well as a 150 uF and a 0.1 uF across the Vcc (5 V) side of the voltage regulator. I have a 100 nF cap placed on the board for each mcu between pins 8 and 9. I just tried a 1000 uF cap across VCC to ground. It does not appear to change anything.
November 12, 2012 by pcbolt
sask55 - After you changed positions of the components, how far apart were the master and slave? You might be getting interference because of the wire length and/or bend configuration (inductance). You could try slowing down the SPI data rate or install some resistors as shown in the Multi-Panel LED tutorial.
November 12, 2012 by pcbolt
Ooops...sorry. Looking back, I saw you tried slowing the clock down...never mind ;-)
November 12, 2012 by pcbolt
sask55 - If no data comes back from a slave, the SPDR register will just have what was in it before. The "while(!(SPSR & _BV(SPIF)));" line just waits for the transmission to end, not for received data. So when you send out "i" and return SPDR it will still have "i" in it. According to your comments in the code above, you should be checking to see if it equals i minus 1. As for the weird bytes received every so often, that may be due to the pin floating with no connection. If you ground the MISO pin, it may stop the false bytes.
November 12, 2012 by sask55
pcbolt, I see what you are saying. Noter was adding 1 to the value of i and having the slave send that value back to the master on the next set of tx/rx SPI bytes. That makes the new value of i, after it is incremented in the loop, the same as the value expected from the slave. Changing the value of i in the slave seems like a good way to confirm that the slave is involved. Unfortunately, by doing that it is not possible to distinguish whether the value returned from the SPI function is actually data received from the slave or the value still remaining in SPDR when the transmission ended. I am changing the code in both the master and slave a bit to eliminate that possible source of confusion. That simple change should allow me to test the SPI more confidently and track down the problem.
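Editor's note: the thread does not show the change sask55 made, so the following is only one possible way to remove the ambiguity, simulated in plain C on a PC. Instead of replying with the previous byte plus 1 (which equals the byte the master is currently sending), the hypothetical slave here replies with the previous byte XORed with 0xA5. A reply built that way can never equal the byte being sent in the same transfer, so a master that is merely reading back its own output fails immediately. `REPLY_MASK`, `slave_transfer`, `run_link_test` and the `LINK_*` names are all assumptions made for this sketch.

```c
#include <stdint.h>

#define REPLY_MASK 0xA5u   /* hypothetical mask, chosen for illustration */

/* Simulated slave: remembers what it will send on the next transfer. */
typedef struct {
    uint8_t next_reply;
} slave_sim_t;

/* One full-duplex transfer as seen by the slave: it sends the reply it
 * prepared earlier and prepares the next one from the byte just received. */
uint8_t slave_transfer(slave_sim_t *s, uint8_t rx)
{
    uint8_t out = s->next_reply;
    s->next_reply = (uint8_t)(rx ^ REPLY_MASK);
    return out;
}

typedef enum {
    LINK_SLAVE,     /* a working slave is answering */
    LINK_LOOPBACK   /* master just reads back its own byte (e.g. MOSI/MISO short) */
} link_t;

/* Sends 1..count and checks every reply after the first.
 * Returns 0 if all replies match, else the index of the first mismatch. */
int run_link_test(link_t link, int count)
{
    slave_sim_t slave = { 0 };
    for (int i = 1; i <= count; i++) {
        uint8_t tx = (uint8_t)i;
        uint8_t rx = (link == LINK_SLAVE) ? slave_transfer(&slave, tx) : tx;
        if (i > 1) {   /* the first reply predates any received byte */
            uint8_t expected = (uint8_t)((uint8_t)(i - 1) ^ REPLY_MASK);
            if (rx != expected) {
                return i;
            }
        }
    }
    return 0;
}
```

In this simulation a 255 count run against the working slave passes, while the loopback case is rejected at the second byte rather than reporting a long run of "OK".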
November 12, 2012 by Noter
As the SPI shifts out a bit it is also shifting in a bit on the receive side, so the receive buffer is new every time. If there is no slave then zeros will be shifted in and the value should be zero, assuming the pins are left floating. I haven't run this for quite a while, but I just loaded the master and ran it without a slave and it errors on the 1st byte with expected 1, got 0. Then, to be sure I didn't miss something in the master code you posted above, I loaded it and ran it and got the same result, error on first byte.
If the break on line 108 is commented out or missing it will run to the end of the loop even with errors, and it is so fast all you will see is the last one. I wonder if that could be happening? Are you sure your code is compiling/loading and you're not running a previous version of the code on the chip? Do you have anything connected to the SPI lines when you run? I leave my programmer connected, but it sets all ports to high impedance after loading a program so it's as good as not there. Can you post a picture of your setup and maybe we can see something?
November 12, 2012 by Noter
Just got the slave loaded and in the circuit and now I get "99 OK". The master/slave SPI samples seem to be working correctly.
November 12, 2012 by Noter
I just removed the slave from the circuit and looped MOSI back to MISO and ran. The master reported "99 OK". Check with your ohm meter that your MOSI and MISO are not shorted out/crossed over anywhere. Another test is to tie MISO to 5v so that it will always read a 1 bit. Run again and your master should report "expected 1, got 255". Then tie it to GND and you should see "got 0". Possibly you have an intermittent MOSI/MISO short somewhere.
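Editor's note: the three wiring tests above follow directly from how the SPI shift register works, which can be sketched as a small plain C simulation (run on a PC, not on the chip). It models an 8-bit, MSB-first transfer in which every clock shifts one bit out on MOSI and one bit in from MISO. The enum and function names are hypothetical, and real hardware samples on specific clock edges that this sketch ignores.

```c
#include <stdint.h>

/* How the master's MISO pin is wired in this simulation. */
typedef enum {
    MISO_TIED_LOW,   /* MISO connected to GND */
    MISO_TIED_HIGH,  /* MISO connected to 5 V */
    MISO_LOOPBACK    /* MISO jumpered (or shorted) to the master's own MOSI */
} miso_wiring_t;

/* Simulate one 8-bit transfer on the master and return the received byte.
 * After 8 clocks every bit of the original byte has been shifted out, so
 * the register holds only what was sampled on MISO. */
uint8_t spi_exchange_sim(uint8_t tx, miso_wiring_t wiring)
{
    uint8_t shift_reg = tx;
    for (int bit = 0; bit < 8; bit++) {
        uint8_t mosi = (uint8_t)((shift_reg >> 7) & 1u);  /* bit leaving the master */
        uint8_t miso;
        switch (wiring) {
        case MISO_TIED_HIGH: miso = 1u;   break;
        case MISO_LOOPBACK:  miso = mosi; break;
        default:             miso = 0u;   break;
        }
        shift_reg = (uint8_t)((shift_reg << 1) | miso);
    }
    return shift_reg;
}
```

Sending 1 with MISO tied low returns 0, with MISO tied high returns 255, and with MISO looped to MOSI returns the byte that was sent, which is why a MOSI/MISO short makes the master report "OK" with no slave present.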
November 12, 2012 by pcbolt
Noter - Thought for sure it worked the other way, but what you say makes sense. A MISO/MOSI short would explain the results...possibly inductive or capacitive crosstalk? If so it might be hard to detect unless you ground the pins like you said. Sorry 'bout the bum steer sask55 (no pun intended to your avatar Noter :-)
November 13, 2012 by sask55
Noter, I did not get a chance to do anything with this last night. The results are not consistent. I may get ten or more restarts that all produce the expected results, then two or three in a row that produce seemingly random numbers of counts and “got” numbers. I will get back to the bench later today and reload the chips. I could post a picture of part of my setup, but it is very involved. At the present time I have removed all of the wires from the master except the USB, LCD display, power, ground and crystal connections. The slave chips, motor control chips and other components are not connected to the master at this time. It is the basic NerdKit setup. There is nothing at all on any of the SPI pins (16-19). I will try a couple of your ideas later and perhaps post a picture when I get a few minutes.
November 13, 2012 by Noter
Sask55, I had a problem with a breadboard short once that took quite a while to finally figure out. Using a new breadboard solved my problem. As a last resort you should try that. Another thing is to be sure none of your other mcu's or aux circuits are powered while trying to figure out this problem, just to be sure there is no stray EMF jumping in. Double check your connections too, to be sure they are not loose. And keep the big capacitor across VCC and GND to eliminate any possibility of noise from the power supply. Comment out the master code that displays the "OK" at line 101 to eliminate activity on the LCD during the test, and put in something at 109 to indicate the test is complete. Also, try a new or different chip.
I have been working on a clock project recently that has also seen a lot of errors on the SPI line. I finally figured out it was from EMF produced by one of the slaves driving a large 7 seg display. The segments receive 12v pulses for 50us each (duty cycle 5% each seg) and since the display is too large for the breadboard I have long wires (~10 inches) going to it. Those long wires act as an antenna transmitting EMF which is interfering with the clock in the master and slaves, causing them to lose sync. It also causes my clock chip to gain time. When the 7 seg driver is not powered everything works fine. I figure once it is off the breadboard and on a PCB the lines will be short enough to not cause the EMF problem, so I am proceeding anyway. I just leave it powered off most of the time while I work on other things.
By the way pcbolt, it's not a steer, he's my dexter bull. He tolerates me for a couple of minutes as long as I start the session with a treat.
November 13, 2012 by pcbolt
Noter - I guess this is one case where a close-up photo is NOT needed! Just have to take your word for it.
November 13, 2012 by Ralphxyz
I worked around some Holstein bulls, some very large Holstein bulls, that were like that. Once your time was up, your time was up, period, if you did not exit the area rather quickly.
Ralph
November 13, 2012 by sask55
The SPI is working well again on my setup, at least with one slave and the sample code supplied by Noter. It seems that there must have been some kind of intermittent short or some problem in the chip I was using as the master. The problem must be internal to the chip itself. That chip eventually stopped indicating it was receiving SPI under any circumstances. That is more understandable than when it was indicating it was receiving valid SPI rx bytes even with no slave chip on the board or any SPI connections at all. I had removed many connections and other components from the board by the time I swapped out the master MCU. I am rebuilding the setup in stages in an effort to make troubleshooting simpler. Thanks for the help.
November 13, 2012 by JimFrederickson
If you have not done so already, you should reprogram the Master MCU that you swapped out with the same code that you are currently using that seems to be working. Then swap it for the Master MCU that you are currently using, in the exact same circuit, just to verify that the chip has a problem. It is possible that a port or I/O pin was blown by something in your circuit. Since your circuit is working properly now, you can take this opportunity to verify that the previous chip actually is not working at this time. (While you may not care if that particular chip is usable or not, determining if it can work will also let you know if there was possibly something else in your circuit that was a problem, rather than the problem being isolated solely to the chip.) (Since you have "removed many connection and other components" this determination could be helpful...)
Rebuilding in stages is ALWAYS a good idea. Keep in mind as well that while "failures" can be caused by something you "add", the failure doesn't actually have to occur in the particular portion that you just added... (Which I think from your previous posts you do realize, but just in case...)
November 14, 2012 by sask55
Jim, I took your advice and retrieved the original master chip from the trash, just to verify it is no good. There is definitely something wrong with it; even the problems using it are not consistent and change from time to time. I have tried swapping the two chips back and forth a couple of times. The “new” chip is working fine every time I try it. The original chip communicates on the uart with the PC. There is no problem loading code onto it.
The original chip is now sending SPI bytes to the slave. I can see that because I have set up eight LEDs on PORTD and set those pins for output on the slave. In that way I can monitor the SPI received by the slave with a PORTD = SPDR; statement in the slave code. I can see that the original master chip is now sending bytes to the slave, as the slave is lighting up the appropriate LEDs. However, now the original master will not show anything at all on the LCD display. I am just getting the two black bars on the LCD. That is nothing like what it was doing before, when I could not get it to send or receive SPI but I was getting the LCD display as expected. In any event it is not working correctly, even if it is not doing the same things as it was before.
I am satisfied that that chip was the root of the problem; the actual cause of the failure is another question. It is possible that I had inadvertently connected something wrong before, and hopefully I will not do it again. There are literally dozens of connections between the 12 ICs and supporting components on this setup. Anyway, everything is back on track for now. Thanks again.