From: "Carl Brannen" Newsgroups: comp.arch.fpga Subject: Efficient new multiplier for Spartan2, Virtex &c. Date: Wed, 19 Dec 2001 18:20:06 +0000 (UTC) Organization: Mailgate.ORG Server - http://www.Mailgate.ORG Lines: 174 Message-ID: <4172938cdfe1a23d2d483ee9c9d0460e.51709@mygate.mailgate.org> NNTP-Posting-Host: firewall.terabeam.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Trace: news.mailgate.org 1008777918 24653 216.137.15.2 (Wed Dec 19 19:20:06 2001) X-Complaints-To: abuse@mailgate.org NNTP-Posting-Date: Wed, 19 Dec 2001 18:20:06 +0000 (UTC) Injector-Info: news.mailgate.org; posting-host=firewall.terabeam.com; posting-account=51709; posting-date=1008777918 User-Agent: Mailgate Web Server X-URL: http://www.Mailgate.ORG Path: chonsp.franklin.ch!pfaff.ethz.ch!news-zh.switch.ch!news.mailgate.org!web2news!firewall.terabeam.com!not-for-mail Xref: chonsp.franklin.ch comp.arch.fpga:12510 I haven't seen this algorithm published. If it is original, I'm throwing it into the public domain. -- Efficient fall through multiplier in Xilinx Spartan2, Virtex, -- VirtexE. (Will work in Virtex2, but they have dedicated -- multipliers so they probably don't need this.) -- -- This fall through multiplier gets 3 bits per initial adder -- rather than the usual 2 bits. This is accomplished by taking -- advantage of under utilized adders in a standard multiplier. -- -- This is an efficient multiplier when the multiplier has -- 3n - 1 bits, with n>1. Where it really rocks is when the -- the multiplier has (3n-1)*2^m bits. -- -- This is in contrast to the usual Xilinx Virtex multiplier -- which is relatively efficient when the multiplier has -- 2*2^m bits. -- -- For large enough multiplies, this algorithm gets more and -- more efficient, compared to the usual Xilinx multiplier. -- -- While a standard multiplier uses the MULT_AND logic well, the -- stages that add up the partial product results are simple adders -- built from LUT2s. Any time you see a big array of LUTs used with -- less than all four inputs needed you have to wonder if there's -- a more efficient way of packing the logic in. -- -- A note on the usual Xilinx multiplier. (Those skilled in the art -- should skip this section.) -- -- The usual multiplier uses two bits per partial product. The least -- significant partial product produces one of {0M, 1M, 2M, 3M}, while the -- next least significant produces one of {0M, 4M, 8M, 12M}. Adding -- together these two results produces any multiple of M from 0M to 15M: -- -- 0M + 0M = 0M -- 0M + 1M = 1M -- 0M + 2M = 2M -- 0M + 3M = 3M -- 4M + 0M = 4M -- 4M + 1M = 5M -- 4M + 2M = 6M -- 4M + 3M = 7M -- ... -- 12M + 2M = 14M -- 12M + 3M = 15M -- -- -- Instead of keeping all my numbers as positive (or 2's complement -- negative) numbers, I save bits if I allow negative numbers and -- keep a single bit that indicates that the result is to be interpreted -- as a negative number. This is how Seymour Cray did his arithmetic, -- but I don't know if he used this internally to his multipliers. -- But if I assume that my single column of slices is only going to -- give me one of four possible multiples of M, I have a problem. If -- I choose the four values to be {0M, 1M, 2M, 3M}, then I only get -- seven values when I negate them as -0M = 0M. But for a 3-bit -- coded multiplier I'm going to need 2^3 = 8 values from each slice. -- If I use {1M, 2M, 3M, 4M}, then I miss zero. -- -- The solution is to use two sign bits, one for positive numbers, the -- other for negative. Let M = 5, so the nine possible values are as -- follows: (Note that I only need 8 of these nine values.) -- -- P N Mult | Value -- - - ---- + ----- -- 0 1 4 | -20 (i.e. -4*M) -- 0 1 3 | -15 -- 0 1 2 | -10 -- 0 1 1 | - 5 -- 0 0 X | 0 -- 1 0 1 | 5 -- 1 0 2 | 10 -- 1 0 3 | 15 -- 1 0 4 | 20 (i.e. 4*M) -- -- This would be exactly what the doctor ordered if the multiplier -- were in a base slightly different from the octal. With octal, the -- eight values that a digit can take are the familiar {0,1, ... 7}. -- With this unusual base, the 8 values that a digit can take are -- instead {-4,-3,-2,-1,0,1,2,3}. (I choose to keep these rather than -- -3 through 4 because it saves a LUT somewhere.) -- -- So I need a base conversion between base 8 and base, well, I'll -- call it base 8-4. Here's how numbers are interpreted in the -- two bases: -- -- Base 8: A = (A0 ) + (A1 )*8 + (A2 )*8^2 + ... -- Base 8-4: B = (B0-4) + (B1-4)*8 + (B2-4)*8^2 + ... -- -- From this it is obvious that to convert the number "A" in base 8 -- to base 8-4, I need merely add the octal constant o444444... to "A". -- This perfectly converts it to the corresponding (i.e. carrying the -- same numerical value) number in base 8-4. -- -- This conversion is very convenient when the multiplier has a lot of -- bits, but it isn't needed for relatively short multipliers. In -- particular, a multiplier of n x 5 would do well to avoid performing -- the base conversion explicitly. -- -- -- After performing the base conversion, I take each digit from B -- (where B = A + o44...44. = A + "100100100...100100") and use -- it to create a single partial product. For an n x (3m-1) multiply, -- I'll end up with m partial products. -- -- I can't just add up the partial products because they're not in -- the usual format for 2's complement arithmetic. I'll have to add -- extra logic to the adder stages in order to handle the signed -- values being added. It turns out that there is exactly enough -- freedom in a Xilinx slice in order to handle these explicitly -- signed numbers. -- -- With two mode bits, I can choose 4 functions in a Xilinx arithmetic -- slice. The table to add two signed numbers (each with "N" and "P" -- bits, and each with a partial sum in the set 1M to 4M) is fairly -- complicated. Let "S" be the higher precision partial product, and -- "T" be the lower precision number. -- -- Rather than be confusing, I'll denote the "N" bit of "S" by "S.N", -- same with the "P" bit, and I'll denote the unsigned vector part -- of "S" by "S.V". Same with "T". Then the function to add "S" and -- "T" is as follows: (Note that since "S" is higher in signficance -- than "T", it follows that "S.V" is larger, as an unsigned number, -- than "T.V", except if "S" is zero, in which case "S.V" is a don't -- care. For this reason, all the subtractions in the following -- table result in unsigned integers.) -- -- "S" "T" "S+T" -- --------- --------- ------------------ -- S P N V T P N V S+T P N V -- -- - - - -- - - - ------ - - --- -- -A 0 1 A -B 0 1 B -(A+B) 0 1 A+B -- -A 0 1 A 0 0 0 X -(A ) 0 1 A -- -A 0 1 A B 1 0 B -(A-B) 0 1 A-B -- 0 0 0 A -B 0 1 B -( B) 0 1 B -- 0 0 0 A 0 0 0 X 0 0 0 X -- 0 0 0 A B 1 0 B ( B) 1 0 B -- A 1 0 A -B 0 1 B (A-B) 1 0 A-B -- A 1 0 A 0 0 0 X (A ) 1 0 A -- A 1 0 A B 1 0 B (A_B) 1 0 A+B -- -- From this, it is clear that "(S+T).N" and "(S+T).P" are -- simple (4-LUT) functions of "S.N", "S.P", "T.N" and "T.P". -- It's also clear that "(S+T).V" is computed by one of the -- four functions {A+B, A-B, A, B}. It turns out that these -- four arithmetic functions exactly fit in a single slice. -- -- It's all very well that I keep the internal partial sums -- in this odd notation, but it won't do to deliver that to -- the customer. So how do I compute the final result? -- -- First of all, the final result can't have the "N" bit set, -- because this is an unsigned multiply. If it has the "P" bit -- set, then the correct product shows up on "(S+T).V" and I'm -- done. The only hairy case is when the "P" bit is low. In -- that case, the final result will be supposed to be zero, but -- "(S+T).V" will be an "X". For this reason, I have to connect -- up the final "P" result to the synchronous reset pin of the -- final register in such a way that when "P" is zero, the -- final result register is held to zero. -- -- This comment has gone on long enough that I'm including it -- as a separate note rather than with the VHDL. I'll add VHDL -- code for a sample multipliers as replies to this note, but -- these aren't easy to build, so give me some time. Carl -- Posted from firewall.terabeam.com [216.137.15.2] via Mailgate.ORG Server - http://www.Mailgate.ORG