From: "Kevin Neilson"
Newsgroups: comp.arch.fpga
Subject: BlockRAM outputs and the Placer
Date: Sat, 28 Apr 2001 03:59:07 GMT
Organization: EarthLink Inc. -- http://www.EarthLink.net

I had a problem with the BRAM on my last project. The Xilinx BRAM
clock->out time is about 3ns. This is fine. The problem I had was that
the nets connecting the BRAM data outputs to other logic seemed to have
excessively long delays, many over 3ns. Even when I handplaced flops
right next to the BRAMs the net delays still seemed excessive. This was
in a part that was only 30% utilized, so there were no real routing
issues.

Have others had this problem? I registered the BRAM outputs, but in many
cases it was still over 7ns to get the data out, across the net, and meet
the flop setup time.

This raises an additional question about the placer. The placer seems to
make some really poor decisions. All of my critical nets had huge delays
because parts that should have been placed together were inexplicably
placed on opposite sides of the part. Place and route seem to be
completely independent processes.
Isn't there a way to place, route, and then place again based on the
route information, and then route again? This seems like a good idea,
because on the second placement round, the placer could look at the
critical nets from the previous route and say, "Whoa, it takes 4ns to get
from flop A to flop B; maybe I should place those closer together and try
routing again." But from what I can tell, placement is done first, and
then locked down, with no regard to the post-route net delays.

######

Message-ID: <3AEA63A5.1A4F1C8@nowhere.com>
From: Anonymous Idiot
Newsgroups: comp.arch.fpga
Subject: Re: BlockRAM outputs and the Placer
Date: Fri, 27 Apr 2001 23:31:01 -0700
Organization: SBC Internet Services

> This raises an additional question about the placer. The placer seems to
> make some really poor decisions. All of my critical nets had huge delays
> because parts that should have been placed together were inexplicably placed
> on opposite sides of the part. Place and route seem to be completely
> independent processes. Isn't there a way to place, route, and then place
> again based on the route information, and then route again?
> This seems like a good idea, because on the second placement round, the
> placer could look at the critical nets from the previous route and say,
> "Whoa, it takes 4ns to get from flop A to flop B; maybe I should place
> those closer together and try routing again." But from what I can tell,
> placement is done first, and then locked down, with no regard to the
> post-route net delays.

You've pretty much described "physically knowledgeable synthesis"...a
concept Synopsys, Cadence, and everyone else are pushing toward. Those
software tools don't come cheap...

It's just a matter of time before FPGA vendors incorporate the same ideas
into their development tool suite.

######

From: Allan Herriman
Newsgroups: comp.arch.fpga
Subject: Re: BlockRAM outputs and the Placer
Date: Sat, 28 Apr 2001 23:34:30 +1000
Organization: Agilent Technologies
Message-ID: <3AEAC6E5.3924F651@agilent.com>

Kevin Neilson wrote:
>
> I had a problem with the BRAM on my last project. The Xilinx BRAM
> clock->out time is about 3ns. This is fine. The problem I had was that the
> nets connecting the BRAM data outputs to other logic seemed to have
> excessively long delays, many over 3ns.
> Even when I handplaced flops right next to the BRAMs the net delays still
> seemed excessive. This was in a part that was only 30% utilized, so there
> were no real routing issues.
>
> Have others had this problem? I registered the BRAM outputs, but in many
> cases it was still over 7ns to get the data out, across the net, and meet
> the flop setup time.
>
> This raises an additional question about the placer. The placer seems to
> make some really poor decisions. All of my critical nets had huge delays
> because parts that should have been placed together were inexplicably placed
> on opposite sides of the part. Place and route seem to be completely
> independent processes. Isn't there a way to place, route, and then place
> again based on the route information, and then route again? This seems like
> a good idea, because on the second placement round, the placer could look at
> the critical nets from the previous route and say, "Whoa, it takes 4ns to get
> from flop A to flop B; maybe I should place those closer together and try
> routing again." But from what I can tell, placement is done first, and then
> locked down, with no regard to the post-route net delays.

Hi Kevin,

I've had the same problem. Even with a mostly empty chip, it's still
possible to have route congestion around the block rams.

My fix was to hand place the flip flops close to the block rams.
(Actually, I wrote a Perl script to generate the UCF.)

This helped a lot, but I was still getting congestion. The block ram
signals were so congested, PAR was using route-throughs. (A route-through
is when PAR passes the signal through a CLB to get around the lack of
routing resources.) You can check your timing report (or FPGA editor) to
find route-throughs.

Unfortunately, there is no way to tell PAR to not use route-throughs on
some nets.
Xilinx Support suggested setting the timespec priority, but that seems to
only be used to resolve which timespec gets used when there are multiple
timespecs covering the same path. It's also possible to set the priority
in the PCF, but that made the tools crash, so I wasn't able to find out
whether it would have helped.

In the end, I made it work by

1. Using a timespec that was faster than it needed to be on the paths
   that were giving problems. Try a few different values.

2. Hand placing everything. (That Perl script again...) Automatic
   blockram placement is poor. Don't rely on it. You may be able to get
   away with automatic placement (perhaps aided by area constraints) for
   the flip flops.

3. MPPR. Occasionally, I would get a cost table that caused the tools to
   actually give acceptable results. The best cost table would change
   every time the design changed.

4. Making the other parts of the design easier to route, mostly through
   source code changes (pipelining, etc.). It seems that PAR only expends
   so much effort before it gives up, so don't waste any of that effort
   on something that isn't so important. Similarly, don't overconstrain
   parts of the design that don't need it.

5. (The most powerful technique of them all :) Make small changes to
   unrelated parts of the design, then run it through the tools again.
   This may perturb the placement enough to make a big difference to the
   critical path.

6. Move other, unrelated parts of the design further away, so that they
   don't interfere with the critical stuff. Similarly, move the related
   parts of the design closer, so they don't "pull" on your critical
   paths as much.

7. Don't trust Map. It will sometimes put resources that really should
   be on opposite sides of the chip into the same CLB. There is nothing
   PAR can do about this, and there is no feedback from placement to Map,
   so Map doesn't know it's not doing the right thing.
   Work around this by using placement constraints on one or more of the
   problem resources, so that Map won't group them. This effect is really
   easy to see in the floorplanner.

While I'm here... Did anyone else notice how the block ram timing tBCKO
*more than doubled* with the 1.56 speed files (Virtex-E -8 speed grade)?
That was after the parts had been available for almost a year. Was this
change yield related?

Good luck,
Allan.

######

Message-ID: <3AEAD134.A7C36184@home.com>
From: Phil Hays
Organization: "Real email is phil-hays aATt homeDOTcom"
Newsgroups: comp.arch.fpga
Subject: Re: BlockRAM outputs and the Placer
Date: Sat, 28 Apr 2001 14:20:53 GMT

Kevin Neilson wrote:

> This raises an additional question about the placer. The placer seems to
> make some really poor decisions. All of my critical nets had huge delays
> because parts that should have been placed together were inexplicably placed
> on opposite sides of the part. Place and route seem to be completely
> independent processes. Isn't there a way to place, route, and then place
> again based on the route information, and then route again?
> This seems like a good idea, because on the second placement round, the
> placer could look at the critical nets from the previous route and say,
> "Whoa, it takes 4ns to get from flop A to flop B; maybe I should place
> those closer together and try routing again." But from what I can tell,
> placement is done first, and then locked down, with no regard to the
> post-route net delays.

Welcome to FPGA design. The only comment you missed is that synthesis and
mapping are also independent processes. Sooner or later you will get two
flops that need to be in opposite corners mapped into the same CLB. Place
and route then have no chance at all of making timing. And synthesis will
build logic that just can't work, as the really long, slow net ends up
going through the most LUTs.

As another person replied, "physically knowledgeable synthesis" is one
solution that does work. Pricey, yes. Has some important limitations,
yes. I've used Amplify by Synplicity, and it's a good tool.

Another solution is putting placement information on key elements of the
design using VHDL attributes or the .ucf file.

--
Phil Hays
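
A minimal sketch of the kind of constraint generator mentioned in this thread (a script that writes placement constraints for a block RAM and its output flops into the .ucf file), written here in Python rather than Perl. Everything specific is an assumption for illustration: the instance names, the site names, the one-column layout, and the `INST ... LOC = ...;` line format should all be checked against the constraints documentation and the netlist for the actual part and tool version.

```python
"""Sketch: emit UCF-style LOC constraints for a block RAM and its output flops."""


def loc_line(instance: str, site: str) -> str:
    """Format one placement constraint (line format is assumed; verify it)."""
    return f'INST "{instance}" LOC = "{site}";'


def generate_ucf(ram_instance: str, ram_site: str, flop_template: str,
                 width: int, first_row: int, column: int,
                 flops_per_site: int = 2) -> str:
    """Pin the RAM to ram_site and stack its output flops in one column.

    flop_template must contain "{bit}", e.g. "u_ram_q_reg<{bit}>".
    Site names of the form CLB_R<row>C<col> are a hypothetical naming
    scheme; substitute whatever the target device actually uses.
    """
    if width <= 0 or flops_per_site <= 0:
        raise ValueError("width and flops_per_site must be positive")
    lines = [loc_line(ram_instance, ram_site)]
    for bit in range(width):
        # Fill each site with flops_per_site flops before moving down a row.
        row = first_row + bit // flops_per_site
        site = f"CLB_R{row}C{column}"
        lines.append(loc_line(flop_template.format(bit=bit), site))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 8-bit RAM output bus placed in the column next to the RAM.
    print(generate_ucf("u_ram", "RAMB4_R0C0", "u_ram_q_reg<{bit}>",
                       width=8, first_row=1, column=1), end="")
```

Generating the file from a script keeps the flop locations in step with the RAM location, so moving a RAM means changing one argument and regenerating rather than editing every constraint by hand.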