From: "Carl Brannen"
Newsgroups: comp.arch.fpga
Subject: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Tue, 18 Dec 2001 19:20:37 +0000 (UTC)
Organization: Mailgate.ORG Server - http://www.Mailgate.ORG

This came as a result of thinking about how to make more efficient barrel shifters. Most readers are familiar with fall-through barrel shifters and how they're usually implemented (with columns of 2-to-1 muxes, each column dedicated to shifting the data by a different power of 2).

I got to thinking about how to use MUXF5s for barrel shifters. It's clear that if you could bring out the terms that feed the MUXF5, you could get three results out of a slice instead of just two. That would essentially give me three 2-to-1 muxes in one slice instead of two, and it would really improve barrel shifter packing (and maybe be good for random logic use).

The problem with doing it is that it's hard to get the output of the "F" LUT out of the slice. But it can be done by bringing it out the CARRY-OUT. You use a MUXCY, and apply '0' and '1' to the DI and CI inputs respectively, and the "F" LUT output to the S input of the MUXCY. That programs the MUXCY to be a buffer of the "F" LUT, and you get the (otherwise hidden) "F" LUT output as the carry-out (which can easily route to the outside of the slice).
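A minimal Python sketch of the buffer idea above, assuming the usual MUXCY behaviour (the output follows CI when S is high and DI when S is low). The function names are illustrative only and are not Xilinx library calls.

```python
def muxcy(s, di, ci):
    """Carry mux: output follows CI when S is 1, DI when S is 0."""
    return ci if s else di


def lut_via_carry(f_lut_out):
    """Tie DI to '0' and CI to '1' so the carry-out simply repeats the LUT output."""
    return muxcy(s=f_lut_out, di=0, ci=1)
```

With those constants the carry-out equals the "F" LUT output for both input values, which is all that "buffer" means here.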
The only problem with that is that it's very hard to program the CI of the MUXCY to '1' and still use the MUXF5. In fact, since the BX input is going to have to be used by the 'S' input of the MUXF5, you have to use the carry input of the slice. And that puts the problem of generating a '1' into the next-door neighbor of the slice, where you'll have to use a LUT to generate it, thereby wasting the LUT you were trying to save.

That was where my analysis ended, but the other night I realized that for a barrel shifter, I don't have to control the value of that carry-out at all times. I only need to control it when I'm actually going to select it in the next stage of logic. And since the selector for the next stage of logic will be the same selector as is coming in on the 'S' pin of the MUXF5, that suggests that there might be a solution.

In fact there is. You program the "DI" input of the MUXCY to '1', and connect both the MUXCY.CI input and the MUXF5.S input to the same BX input. That's a natural use for the carry input, but it's usually not done because arithmetic logic is normally not used at the same time as the MUXF5. But you can do it.

The result is that when the MUXF5 selects the "F" LUT (i.e. when the BX input is '1'), the MUXF5 operates normally, but the carry-out is forced to '1'. But that's the condition under which you would normally ignore the carry-out anyway, if you were using the circuit as a barrel shifter (with positive shift amount).

On the other hand, when the SHIFT input (BX) is low, the "G" LUT is selected by the MUXF5, and the DI and CI inputs of the MUXCY end up as '1' and '0' respectively. That causes the CARRY-OUT to follow the complement of the "F" LUT output. But we all know how easy it is to invert logic in a Xilinx part, so I just put an inverter on the CARRY-OUT and the logic takes care of getting rid of the inversion for me.

The router isn't too good at combining MUXCYs and MUXF5s, so to get this into a single slice I have to RLOC it.
But the good news is that this works.

Generalizing, this means that I can get certain collections of 3 logic functions in a single slice. The general rule is:

The three logic functions are {F,G',F5}, and the 9 input variables are {F1,F2,F3,F4,G1,G2,G3,G4, and BX}

    F  <= LUT(F1,F2,F3,F4);

    G' <= LUT(G1,G2,G3,G4) nand BX;

    with BX select
      F5 <=
        G when '0',
        F when others;

I thought this was cool. It allows a 16-bit wide, 0 to 7 bit barrel shifter in just 36 LUTs, which is 12 fewer than the number needed to build the barrel shifter the usual way (48, i.e. 16 bits x 3 stages).

The logic for the above was implemented in schematics, but I could easily convert this to VHDL if anyone is interested.

I also figured out a way to program a column of slices to perform a vector of 3-to-1 muxes instead of just 2-to-1 muxes. This can be used to create very efficient barrel shifters where the number of shift positions is a power of 3. An example would be a barrel shifter that shifts between 0 and 8 bits. With the usual barrel shift technique, such a barrel shifter would require 4 stages, but using 3-to-1 muxes it requires only 2 stages. There are some fixed costs associated with computing the controls for the stages and, since it uses arithmetic functions, with driving the CARRY-IN for each stage. (It's actually more complex than I'm implying here.) I'll post code for it if anyone is interested.
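The shared-BX configuration described in this post can be checked with a small behavioural model. This is only a sketch under stated assumptions: it follows the prose wiring (the "F" LUT drives the MUXCY select, DI is tied to '1', BX feeds both MUXCY.CI and MUXF5.S, and BX = '1' selects "F" in the MUXF5), it assumes the "G" LUT output leaves the slice directly, and all function names are hypothetical.

```python
from itertools import product


def muxcy(s, di, ci):
    """Carry mux: output follows CI when S is 1, DI when S is 0."""
    return ci if s else di


def muxf5(f, g, bx):
    """F5 mux with the polarity given in the post: BX = 1 picks F, BX = 0 picks G."""
    return f if bx else g


def slice_outputs(f, g, bx):
    """Model one slice: DI tied to '1', BX shared by MUXCY.CI and MUXF5.S.

    Returns (g, f5, f_recovered), where f_recovered is the inverted carry-out.
    """
    carry_out = muxcy(s=f, di=1, ci=bx)
    return g, muxf5(f, g, bx), 1 - carry_out


def truth_table():
    """List (f, g, bx, g_out, f5, f_recovered) for all eight input combinations."""
    return [(f, g, bx, *slice_outputs(f, g, bx))
            for f, g, bx in product((0, 1), repeat=3)]
```

In this model the inverted carry-out equals the "F" LUT output whenever BX is '0', and is a constant '0' when BX is '1' (the carry-out being forced to '1'), which is the case the next stage is expected to ignore.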
Carl

-- 
Posted from firewall.terabeam.com [216.137.15.2]
via Mailgate.ORG Server - http://www.Mailgate.ORG

######

From: Steven Derrien
Newsgroups: comp.arch.fpga
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Tue, 18 Dec 2001 20:36:44 +0100
Organization: INRIA - RENNES
Message-ID: <3C1F9ACC.AFDA823@irisa.fr>

Hello,

This could be very useful for optimizing floating-point adders or subtracters, since they use large barrel shifters for normalization and denormalization. If you have some VHDL for this I'd be eager to use it to see how it affects area and speed!

Steven

Carl Brannen wrote:

> This came as a result of thinking about how to make more efficient barrel
> shifters. Most readers are familiar with fall through barrel shifters and how
> they're usually implemented.
> [...]

######

From: Peter Alfke
Newsgroups: comp.arch.fpga
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Tue, 18 Dec 2001 12:12:25 -0800
Organization: Xilinx
Message-ID: <3C1FA329.76322373@xilinx.com>
References: <3C1F9ACC.AFDA823@irisa.fr>
Reply-To: peter.alfke@xilinx.com

I agree that it looks very clever and interesting. But, just as an aside, floating point needs arithmetic shifters for normalization, not barrel shifters. Also, remember that Virtex-II has lots of multipliers, many of them begging to be used as "free" shifters (multiply by a power of 2).

Peter Alfke

====================================

Steven Derrien wrote:

> Hello,
>
> This could be very useful for optimizing floating point adders or subtracters,
> since they use large barrel shifters for normalization and denormalization.
> If you have some VHDL for this I'd be eager to use it to see how it affects
> area and speed !!
>
> Steven
>
> [...]

######

From: Ray Andraka
Newsgroups: comp.arch.fpga
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Tue, 18 Dec 2001 21:09:15 GMT
Organization: Andraka Consulting Group, Inc
Message-ID: <3C1FB0C3.37E32F2D@andraka.com>
References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com>

Here's my two cents worth (maybe not even that much).

1) Peter, the term barrel shift is commonly (although technically incorrectly) applied to shifters which have a variable shift distance.
The Virtex-II multipliers can in fact be used this way, but it can be done considerably faster (with more pipelining) in the fabric for very little additional cost, especially when you consider the resources taken by the added pipeline registers you need in front of and behind the multiplier to get anywhere close to the data sheet speeds. It all comes down to how best to use the resources available to me.

2) The carry chain can also be used for a free doubler circuit. However, watch the timing. There exist false paths (that are also quite slow, comparatively speaking) introduced by the non-standard use of the carry chain (the chain connections are only used to the next neighbor, not all the way up the chain). Timing-wise, the conventional approach seems to yield better propagation delays in combinatorial-only shifters, and considerably better times in fully pipelined shifters. This is a good trick to put in your back pocket for those times when the need for density outweighs the needs of the clock cycle.

3) I'd be interested in seeing your layout solution. The layout is not trivial if this is to perform well.

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

######

From: "Carl Brannen"
Newsgroups: comp.arch.fpga
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Wed, 19 Dec 2001 04:05:32 +0000 (UTC)
Organization: Mailgate.ORG Server - http://www.Mailgate.ORG
Message-ID: <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org>
References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com>

Ray:

> 2) The carry chain can also be used for a free doubler circuit.
> However, watch the timing. There exist false paths (that are
> also quite slow comparatively speaking) introduced by the
> non-standard use of the carry chain (the chain connections
> are only used to the next neighbor, not all the way up the
> chain). Timingwise, the conventional approach seems to yield
> better propagation delays in combinatorial only shifters, and
> considerably better times in fully pipelined shifters. This
> is a good trick to put in your back pocket for those times
> where the need for density outweighs the needs of the clock
> cycle.

This is true. I have another barrel shift design that's based on the use of the carry chain. I'll post it later, but under a new topic heading. But the way I used the carry chain, the false path was a true one. That is, under certain (rare?) circumstances, the carry chain may have to propagate a signal from one end to the other. I'll post an explanation when I put up the VHDL for that barrel shifter (later tonight, maybe). I think this is enough cool circuits for one thread.

> 3) I'd be interested in seeing your layout solution. The layout
> is not trivial to making this perform well.

No floor planning involved. Here are the statistics when it's implemented by itself with flip-flops on all inputs and outputs. The design is a 16-input barrel shifter, with 3 select inputs giving shifts from 0 to 7 bits. The design is a fall-through, as is probably most efficient for this type of barrel shifter. It's placed and routed into a small VirtexE-8, which is a speedy little part. I put a clock period constraint of 5 ns on it, but it only got to 165 MHz. Still, this isn't bad for a fall-through 16-wide barrel shifter with no floor planning and no buffering on the control lines. If I get around to it, I'll convert the source from schematic to (readable) VHDL. The reason it's in schematic form is that I hate to deal with RLOCs in VHDL:

<<<
Design Information
------------------
Command Line   : map -p xcv50e-8-cs144 -o map.ncd arith.ngd arith.pcf
Target Device  : xv50e
Target Package : cs144
Target Speed   : -8
Mapper Version : virtexe -- D.27
Mapped Date    : Tue Dec 18 19:57:47 2001

Design Summary
--------------
   Number of errors:      0
   Number of warnings:    1
   Number of Slices:                  36 out of    768    4%
   Number of Slices containing
      unrelated logic:                 0 out of     36    0%
   Number of Slice Flip Flops:        51 out of  1,536    3%
   Number of 4 input LUTs:            36 out of  1,536    2%
   Number of bonded IOBs:             35 out of     94   37%
   Number of GCLKs:                    1 out of      4   25%
   Number of GCLKIOBs:                 1 out of      4   25%
Total equivalent gate count for design:  696
Additional JTAG gate count for IOBs:  1,728
>>>

<<<
The Number of signals not completely routed for this design is: 0

   The Average Connection Delay for this design is:        0.885 ns
   The Maximum Pin Delay is:                               2.310 ns
   The Average Connection Delay on the 10 Worst Nets is:   1.645 ns
...
--------------------------------------------------------------------------------
  Constraint                                | Requested  | Actual     | Logic
                                            |            |            | Levels
--------------------------------------------------------------------------------
* NET "CLK" PERIOD =  5 nS   LOW 50.000 %   | 5.000ns    | 6.036ns    | 4
--------------------------------------------------------------------------------
>>>

<<<
Constraints cover 276 paths, 0 nets, and 184 connections (92.0% coverage)

Design statistics:
   Minimum period:   6.036ns (Maximum frequency: 165.673MHz)

Analysis completed Tue Dec 18 19:58:13 2001
--------------------------------------------------------------------------------
>>>

Carl

######

From: Ray Andraka
Newsgroups: comp.arch.fpga
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Wed, 19 Dec 2001 05:37:11 GMT
Organization: Andraka Consulting Group, Inc
Message-ID: <3C2027D6.F445EA75@andraka.com>
References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com> <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org>

We don't use them in a fall through configuration very often.
As a point of reference, we have one in a 160 MHz design in a VirtexE-6 that does a 0 to 15 position shift with a 2-clock latency, including rounding at the output. It is 19 bits wide at the output. It is not the critical path in the design. IIRC, there are 3 levels of conventional shifter, a register, then the last layer along with a carry chain for the round.

That one is VHDL with the RLOCs in the code. We prefer VHDL for placed designs now because of the capability of the generate statement (the design I was describing is a parameterized generate, so it can take arbitrary input and output widths as well as number of clocks of latency).

Off-hand, I think with careful layout you could get a 3-layer fall-through design using the conventional approach well above 200 MHz in an E-8.

Carl Brannen wrote:

> [...]
> No floor planning involved. Here's the statistics when it's implemented by
> itself with flip-flops on all inputs and outputs. The design is a 16-input
> barrel shifter, with 3 select inputs giving shifts from 0 to 7 bits. The
> design is a fall through, as is probably most efficient for this type of
> barrel shifter. It's placed and routed into a small VirtexE-8, which is a
> speedy little part. I put a clock period constraint on it of 5ns, but it only
> got to 165MHz.
> [...]

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.

######

From: "Carl Brannen"
Newsgroups: comp.arch.fpga
Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx
Date: Wed, 19 Dec 2001 08:31:45 +0000 (UTC)
Organization: Mailgate.ORG Server - http://www.Mailgate.ORG
Message-ID: <865212a8446d17759b69d0dd8852d212.51709@mygate.mailgate.org>
References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com> <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org> <3C2027D6.F445EA75@andraka.com>

Hi Ray,

No question that messing around with the carries is going to slow down a barrel shifter. It's been a very long time since I used one (okay, it was actually a "funnel shifter", but I hate that term, so I call them all barrel shifters).

I just got the one that uses the carries to effectively fit a 3-to-1 mux into a LUT to place and route correctly. It uses 38 LUTs to do a two-stage shift of 0 to 8 bits (i.e. a 9-position barrel shift) on a 16-bit input. It's very efficient when you need a power-of-3 shift size. You get a 9-position shift with only 2 stages of logic instead of the usual 4. I'll post it to the thread in a minute.

The reason I made it two stage was simply to prevent the synthesizer from messing with my logic.
It's slow enough (due to the full length carry) that it would make more engineering sense as a space saving circuit rather than a highly pipelined design. But I'm doing this for fun, so what the heck. The last rough spot was getting the synthesizer to recognize that a LUT4 wasn't interfering with a MULT_AND. I had to instantiate the LUT4s. I haven't figured out how to apply an attribute to a generated component (i.e. like your RLOC usage on generated components). I'm assuming here that "generated" means the use of the "generate" command in VHDL. The basic problem is that I haven't figured out how to properly address the components. So I went ahead and instantiated them individually. If you have a way around that, I'd appreciate the secret. Carl -- Posted from firewall.terabeam.com [216.137.15.2] via Mailgate.ORG Server - http://www.Mailgate.ORG ###### From: "Stephen Melnikoff" Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx Date: Wed, 19 Dec 2001 14:18:24 -0000 Organization: The University of Birmingham news server Lines: 16 Message-ID: <9vq7ts$ggv$1@usenet.bham.ac.uk> References: NNTP-Posting-Host: eee553.bham.ac.uk X-Trace: usenet.bham.ac.uk 1008771836 16927 147.188.145.247 (19 Dec 2001 14:23:56 GMT) X-Complaints-To: usenet@usenet.bham.ac.uk NNTP-Posting-Date: 19 Dec 2001 14:23:56 GMT X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!newscore.univie.ac.at!194.25.134.126.MISMATCH!newsfeed01.sul.t-online.de!newsfeed00.sul.t-online.de!t-online.de!news-lei1.dfn.de!news-fra1.dfn.de!news-koe1.dfn.de!lnewspeer00.lnd.ops.eu.uu.net!emea.uu.net!server1.netnews.ja.net!usenet.bham.ac.uk!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12618 > The problem with doing it is that it's hard to get the output of the "F" LUT > out of the slice. 
But it can be done by bringing it out the CARRY-OUT. Perhaps I'm missing something, but why can't you send the output of the F-LUT to output X, and the output of the F5-MUX to output F5? Stephen Melnikoff. -- Stephen Melnikoff - s.j.melnikoff@iee.org Electronic, Electrical and Computer Engineering University of Birmingham, Birmingham, UK ###### From: gah@ugcs.caltech.edu (glen herrmannsfeldt) Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx Date: 19 Dec 2001 23:27:08 GMT Organization: California Institute of Technology, Pasadena Lines: 28 Message-ID: <9vr7oc$rrn@gap.cco.caltech.edu> References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com> <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org> <3C2027D6.F445EA75@andraka.com> <865212a8446d17759b69d0dd8852d212.51709@mygate.mailgate.org> NNTP-Posting-Host: zloty.ugcs.caltech.edu X-Newsreader: NN version 6.5.0 #1 (NOV) Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news.mailgate.org!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.stanford.edu!logbridge.uoregon.edu!nntp-server.caltech.edu!gah Xref: chonsp.franklin.ch comp.arch.fpga:12620 "Carl Brannen" writes: >No question that messing around with the carries is going to slow >down a barrel shifter. It's been a very long time since I used one > (okay, it was actually a "funnel shifter", but I hate that term, >so I call them all barrel shifters). I just got the one that uses >the carries to effectively fit a 3 to 1 mux into a LUT to place and >route correctly. It uses 38 LUTs to do a two stage >shift of 0 to 8 bits (i.e. a 9-bit barrel shift) on a 16-bit input. >It's very efficient when you need a power of 3 shift size. >You get a 9-bit shift with only 2 stages of logic instead >of the usual 4. I'll post it to the thread in a minute. It would seem that you could use it for an 8 bit shift, as part of a larger binary barrel shifter. 
Two would do 64, for example. Otherwise, one use for barrel shifters is floating point normalization. IBM used a base 16, instead of base 2, floating point in S/360, S/370, through the beginning of ESA/390. I believe one reason for base 16 floating point is the reduced need for shift logic. I think that for floating point in FPGA's, where barrel shifting is relatively more expensive, base 16 should be considered. -- glen ###### From: Frederic Rivoallon Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx Date: Thu, 20 Dec 2001 11:29:15 -0800 Organization: Xilinx, Inc. Lines: 18 Message-ID: <3C223C0B.C9E220CF@xilinx.com> References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com> <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org> <3C2027D6.F445EA75@andraka.com> <865212a8446d17759b69d0dd8852d212.51709@mygate.mailgate.org> NNTP-Posting-Host: 149.199.9.135 Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mozilla 4.74 [en]C-CCK-MCD (WinNT; U) X-Accept-Language: en To: Carl Brannen Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news.mailgate.org!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!cpk-news-hub1.bbnplanet.com!cambridge1-snf1.gtei.net!news.gtei.net!bos-service1.ext.raytheon.com!attla1!ip.att.net!newsgate.xilinx.com!cliff.xsj.xilinx.com!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12575 Carl Brannen wrote: > .... I haven't figured out how to apply an attribute to a generated component > (i.e. like your > RLOC usage on generated components). I'm assuming here that "generated" means > the use of the "generate" command in VHDL. The basic problem is that I haven't > figured out how to properly address the components. So I went ahead and > instantiated them individually. > > If you have a way around that, I'd appreciate the secret. 
> Check this link (How to Attach Attributes Inside of Generate?): http://tech-www.informatik.uni-hamburg.de/vhdl/doc/faq/FAQ1.html#attributes Frédéric ###### From: "Carl Brannen" Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx Date: Thu, 20 Dec 2001 23:11:24 +0000 (UTC) Organization: Mailgate.ORG Server - http://www.Mailgate.ORG Lines: 17 Message-ID: References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com> <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org> <3C2027D6.F445EA75@andraka.com> <865212a8446d17759b69d0dd8852d212.51709@mygate.mailgate.org> <3C223C0B.C9E220CF@xilinx.com> NNTP-Posting-Host: firewall.terabeam.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Trace: news.mailgate.org 1008884794 14787 216.137.15.2 (Fri Dec 21 00:11:24 2001) X-Complaints-To: abuse@mailgate.org NNTP-Posting-Date: Thu, 20 Dec 2001 23:11:24 +0000 (UTC) Injector-Info: news.mailgate.org; posting-host=firewall.terabeam.com; posting-account=51709; posting-date=1008884794 User-Agent: Mailgate Web Server X-URL: http://www.Mailgate.ORG Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news.ifi.unizh.ch!3330945!news.imp.ch!fr.clara.net!heighliner.fr.clara.net!proxad.net!news-hub.cableinet.net!blueyonder!btnet-peer!btnet-peer0!btnet!news.mailgate.org!web2news!firewall.terabeam.com!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12615 Frédéric, that link is much appreciated, and though I haven't tried it yet, I believe it will do the trick. 
Carl > > > Check this link (How to Attach Attributes Inside of Generate?): > http://tech-www.informatik.uni-hamburg.de/vhdl/doc/faq/FAQ1.html#attributes > > Frédéric -- Posted from firewall.terabeam.com [216.137.15.2] via Mailgate.ORG Server - http://www.Mailgate.ORG ###### Message-ID: <3C227282.44E19DC4@andraka.com> From: Ray Andraka Organization: Andraka Consulting Group, Inc X-Mailer: Mozilla 4.77 [en] (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx References: <3C1F9ACC.AFDA823@irisa.fr> <3C1FA329.76322373@xilinx.com> <3C1FB0C3.37E32F2D@andraka.com> <04af47cf81c9461ae53b109bdf127ac1.51709@mygate.mailgate.org> <3C2027D6.F445EA75@andraka.com> <865212a8446d17759b69d0dd8852d212.51709@mygate.mailgate.org> <3C223C0B.C9E220CF@xilinx.com> Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Lines: 56 Date: Thu, 20 Dec 2001 23:20:15 GMT NNTP-Posting-Host: 24.13.238.93 X-Complaints-To: abuse@home.net X-Trace: news1.wwck1.ri.home.com 1008890415 24.13.238.93 (Thu, 20 Dec 2001 15:20:15 PST) NNTP-Posting-Date: Thu, 20 Dec 2001 15:20:15 PST Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news.ifi.unizh.ch!126361!news.imp.ch!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!netnews.com!newshub2.rdc1.sfba.home.com!news.home.com!news1.wwck1.ri.home.com.POSTED!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12578 Since this seems to come up fairly frequently, here is a simple example of RLOCs applied as an attribute inside the generate. The itoa function is a homebrew function that converts an integer into an ASCII string. If your synthesizer recognizes 'image, you could use that instead. Because of visibility, the attributes have to be assigned at the same level of code as the labels they are being assigned to. 
In other words, components inside a generate are not visible outside the generate, so the attributes have to be in the generate's declaration section.

LEN: for i in 0 to bits-1 generate
    constant row      : natural := ((width-1)/2) - (i/2);
    constant column   : natural := 0;
    constant slice    : natural := 0;
    -- RLOC string for this iteration, of the form "R<row>C<column>.S<slice>"
    constant rloc_str : string  := "R" & itoa(row) & "C" & itoa(column) & ".S" & itoa(slice);
    -- specified inside the generate, where the label U1 is visible
    attribute RLOC of U1 : label is rloc_str;
begin
    U1: FDE port map (
        Q  => dd(j),
        D  => ff_d,
        C  => clk,
        CE => lcl_en(en_idx));
end generate LEN;

Frederic Rivoallon wrote:

> Carl Brannen wrote:
>
> > .... I haven't figured out how to apply an attribute to a generated component
> > (i.e. like your
> > RLOC usage on generated components). I'm assuming here that "generated" means
> > the use of the "generate" command in VHDL. The basic problem is that I haven't
> > figured out how to properly address the components. So I went ahead and
> > instantiated them individually.
> >
> > If you have a way around that, I'd appreciate the secret.
> >
>
> Check this link (How to Attach Attributes Inside of Generate?):
> http://tech-www.informatik.uni-hamburg.de/vhdl/doc/faq/FAQ1.html#attributes
>
> Frédéric

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759 ###### From: "Carl Brannen" Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx Date: Fri, 21 Dec 2001 23:38:44 +0000 (UTC) Organization: Mailgate.ORG Server - http://www.Mailgate.ORG Lines: 17 Message-ID: References: <9vq7ts$ggv$1@usenet.bham.ac.uk> NNTP-Posting-Host: firewall.terabeam.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Trace: news.mailgate.org 1008971919 29799 216.137.15.2 (Sat Dec 22 00:38:44 2001) X-Complaints-To: abuse@mailgate.org NNTP-Posting-Date: Fri, 21 Dec 2001 23:38:44 +0000 (UTC) Injector-Info: news.mailgate.org; posting-host=firewall.terabeam.com; posting-account=51709; posting-date=1008971919 User-Agent: Mailgate Web Server X-URL: http://www.Mailgate.ORG Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news.mailgate.org!web2news!firewall.terabeam.com!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12710 Hi Stephen, > Perhaps I'm missing something, but why can't you send the output of the > F-LUT to output X, and the output of the F5-MUX to output F5? As far as I can see, you can do that, and then use the same algorithm I gave in order to get the G-LUT output out of the slice with the F6-MUX. Unfortunately, my brain is just a few neurons short of quickly figuring out how efficient it would be. My guess is that it is an improvement... 
Carl -- Posted from firewall.terabeam.com [216.137.15.2] via Mailgate.ORG Server - http://www.Mailgate.ORG ###### Message-ID: <3C2411A5.28EB3EE5@andraka.com> From: Ray Andraka Organization: Andraka Consulting Group, Inc X-Mailer: Mozilla 4.77 [en] (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 Newsgroups: comp.arch.fpga Subject: Re: Barrel shifter puts three 2->1 muxes / slice in Xilinx References: <9vq7ts$ggv$1@usenet.bham.ac.uk> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Lines: 33 Date: Sat, 22 Dec 2001 04:51:22 GMT NNTP-Posting-Host: 24.13.238.93 X-Complaints-To: abuse@home.net X-Trace: news1.wwck1.ri.home.com 1008996682 24.13.238.93 (Fri, 21 Dec 2001 20:51:22 PST) NNTP-Posting-Date: Fri, 21 Dec 2001 20:51:22 PST Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news-ge.switch.ch!newsfeed00.sul.t-online.de!t-online.de!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!wn4feed!worldnet.att.net!24.0.0.38!newshub2.rdc1.sfba.home.com!news.home.com!news1.wwck1.ri.home.com.POSTED!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12707 The F5 output only goes to the F6 mux in the neighboring slice, nowhere else. At the F6 you run into a similar problem, because it has to get out somewhere, and its other input has to be sourced by the F5 in that slice, which in turn is sourced by the LUTs. Stephen Melnikoff wrote: > > The problem with doing it is that it's hard to get the output of the "F" > LUT > > out of the slice. But it can be done by bringing it out the CARRY-OUT. > > Perhaps I'm missing something, but why can't you send the output of the > F-LUT to output X, and the output of the F5-MUX to output F5? > > Stephen Melnikoff. > > -- > Stephen Melnikoff - s.j.melnikoff@iee.org > Electronic, Electrical and Computer Engineering > University of Birmingham, Birmingham, UK -- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 
401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759
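Following up on the RLOC-inside-generate example earlier in the thread: below is a minimal sketch of the same idea using the predefined integer'image attribute in place of the homebrew itoa, which was mentioned there as an alternative when the synthesizer supports it. The entity name rloc_regs, its ports, and the generic bits are made up for illustration and are not from the original design. FDE is taken from the Xilinx unisim library, and the "R<row>C0.S0" string follows the format used in that example. Whether a given synthesis tool accepts 'image in a constant expression is tool dependent, so treat this as an untested sketch rather than a verified design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;  -- provides the FDE flip-flop primitive

-- Hypothetical wrapper (name and ports are assumptions): a column of
-- 'bits' clock-enabled flip-flops, two per slice, one slice per row.
entity rloc_regs is
  generic (bits : natural := 8);
  port (
    clk : in  std_logic;
    ce  : in  std_logic;
    d   : in  std_logic_vector(bits-1 downto 0);
    q   : out std_logic_vector(bits-1 downto 0));
end entity rloc_regs;

architecture structural of rloc_regs is
  attribute RLOC : string;  -- declare the user attribute once
begin
  LEN: for i in 0 to bits-1 generate
    -- Bits 2k and 2k+1 share a row; bit 0 gets the highest row number.
    constant row      : natural := ((bits-1)/2) - (i/2);
    -- integer'image stands in for itoa; column and slice are fixed at 0.
    constant rloc_str : string  := "R" & integer'image(row) & "C0.S0";
    -- The attribute specification sits in the generate's own declarative
    -- part, because label U1 is only visible inside the generate.
    attribute RLOC of U1 : label is rloc_str;
  begin
    U1: FDE port map (Q => q(i), D => d(i), C => clk, CE => ce);
  end generate LEN;
end architecture structural;
```

With bits = 8 this would produce the strings "R3C0.S0" for bits 0 and 1 down to "R0C0.S0" for bits 6 and 7. The declarative part inside a generate requires VHDL-93 or later.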