SCIENTIFIC PAPERS

Essay: Physical Methods of Speed-Independent Module Design

The value of $r_b$ must be chosen with regard to the maximal voltage drop $V_{max}$.

If $V > 750$ mV, the diode D1 is in active mode, and while $r_b \ll R_1$ the condition $I_r \ll I_d$ holds. So in the large-current region $I \approx I_d$, and Equation (2) gives an almost linear dependence between $I$ and $V$. For instance, if the maximal voltage drop is $V_{max} = 900$ mV and the maximal input current is $I_{max} = 2$ mA, then according to Equation (2) $r_b \approx 100\ \Omega$. Typical element values for the OVD circuit with $V_{th} = 400$ mV are given in Table 1.

The turn-on delay $t_{on}$ and the turn-off delay $t_{off}$ of the OVD circuit depend on both the OVD itself and the CMOS CL. (Switching the OVD output from low to high voltage is called "turning on"; the reverse switching is called "turning off".)

Consider a piece of CMOS CL and its interaction with the OVD circuit (Fig.11). The piece is an SPP comprising $N$ logic gates. Each gate is shown symbolically as a connection of PMOS and NMOS networks. All the capacitances affecting $t_{on}$ and $t_{off}$ can be reduced to three components:

(i) $C_{L,i}$ is the load capacitance of the i-th gate;

(ii) $C_{ps,i}$ is the power supply bus capacitance associated with the i-th gate;

(iii) $C_{in}$ is the input capacitance of the OVD circuit.

Let $p_i$ be the probability that the i-th gate is in the state of high output potential. In this state the capacitance $C_{L,i}$ is connected to the power supply bus through the low channel resistance of the turned-on transistors in the PMOS network of the i-th gate. Then the equivalent capacitance $C_{eq}$ connected to the OVD circuit input equals

$$C_{eq} = C_{in} + \sum_{i=1}^{N} \left( C_{ps,i} + p_i C_{L,i} \right) \qquad (7)$$

where $N$ is the number of gates in the considered SPP. Here the resistance of the conducting PMOS network is assumed to be negligible.

Equation (7) also holds for a CL that includes several SPPs. In that case the summation is carried out over all the gates belonging to the CL.

Simulation shows that $t_{on}$ and $t_{off}$ are proportional to the OVD time constant $\tau = R_1 C_{eq}$. It was also found that when $N > 20$, the summation term in Equation (7) can be much larger than the component $C_{in}$. Because of the voltage drop $\Delta V$, the effective power supply voltage is reduced and the CL performance decreases by about 35 percent [7].
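As a rough illustration, the sketch below evaluates the equivalent capacitance and the resulting time constant. It assumes that Equation (7) has the form Ceq = Cin + sum over i of (Cps,i + pi·CL,i), which is inferred from the three capacitance components listed above. All numeric values and function names are hypothetical and chosen only for illustration.

```python
def equivalent_capacitance(c_in, c_ps, c_load, p_high):
    """Return Ceq = Cin + sum_i (Cps_i + p_i * CL_i), in farads.

    Assumed form of Equation (7); c_ps, c_load and p_high are
    per-gate sequences of equal length.
    """
    if not (len(c_ps) == len(c_load) == len(p_high)):
        raise ValueError("per-gate sequences must have the same length")
    return c_in + sum(cps + p * cl for cps, cl, p in zip(c_ps, c_load, p_high))


def time_constant(r1, c_eq):
    """Return the OVD time constant tau = R1 * Ceq, in seconds."""
    return r1 * c_eq


# Hypothetical example: 20 identical gates, each high half of the time.
n_gates = 20
c_eq = equivalent_capacitance(
    c_in=0.1e-12,                # assumed OVD input capacitance, F
    c_ps=[0.02e-12] * n_gates,   # assumed supply-bus capacitance per gate, F
    c_load=[0.1e-12] * n_gates,  # assumed load capacitance per gate, F
    p_high=[0.5] * n_gates,      # assumed probability of high output
)
tau = time_constant(r1=1.0e3, c_eq=c_eq)
print(f"Ceq = {c_eq:.3e} F, tau = {tau:.3e} s")
```

With these assumed values the summation term is much larger than Cin, which is the situation the text describes for N > 20.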

To make a SIM operate faster, special attention must be paid to reducing the capacitance introduced by the CL.

4.3 Speed-independent address bus

The simplest case of a CL is a circuit that degenerates into a set of wires, called a multi-bit bus. Let us develop the OVD circuit for such a CL.

A multi-bit bus consists of several lines. Each line can be considered a medium for signal propagation from one end of the chip to the other. The delay of signal propagation through a line depends on several factors:

(a) output impedance and symmetry of the driver circuit;

(b) initial state of the line: if the driver is symmetrical, switching the line from high to low voltage takes less time than the reverse switching;

(c) electrical properties of the line as a signal propagation medium (resistance of the conducting layer and capacitances between the line and neighboring wires);

(d) length of the line;

(e) input impedance and sensitivity of the receiving circuit.

Since different lines of the bus operate under different conditions (a)-(e), their signal propagation delays differ too. From the standpoint of the environment, the bus behaves like any other, more complicated CL.

Asynchronous RAM designers have used bus transition detectors since the 1980s [13-15]. Such a detector is usually based on double-rail address coding and two series-connected transistors for each address bit [15]. One of the transistors receives the true address signal and the other receives the complementary address signal of that address bit. In any steady state one transistor is turned on and the other is turned off. Because the address bit has finite rise and fall times, there is a short time during a transition when both transistors conduct. The establishment of this conductive path signals the address transition. In the first asynchronous RAMs the output signal of the transition detector was used for bit line precharging and for enabling/disabling sense amplifiers and peripheral circuitry.

The self-timed RAM announced in 1983 [14] used transition detectors not only for address transitions but also for detecting read/write completion and address/bit line precharge completion.

The CMOS transition detector was invented in 1986 [15]. This circuit is also based on double-rail coding and uses a pair of series-connected NMOS transistors (Fig.12). The scheme for n-bit bus control contains n line transition detectors (LTDs) and n AND-gates. The outputs of the AND-gates are joined at node M, forming a wired OR. The output inverter serves as a pulse shaper. Capacitors C1 and C2 prolong the rise time of the LTD output signals (true and complementary), which is necessary for reliable detection.

The main drawback of the circuit is its speed dependence. If the true and complementary address bit signals have different propagation delays, the conducting path via the NMOS transistors will never be formed.

Using the OVD circuit proposed in Section 4.2 as an LTD, we can avoid this drawback.

Note that address transmission through the address bus is unidirectional. So to detect completion of a bus transition it is enough to recognize the bus state at the destination end. For this purpose we modify the CL, which consists of n lines. The modification introduces n LTDs, each of which is actually a CMOS inverter chain. Each chain contains two inverters loaded with a capacitance (Fig.13). The input of each LTD is connected to the corresponding bus line at the destination end. The power supply pads of all LTDs are connected to the current input of the same OVD circuit.

The parameters of the input current signal for the OVD circuit are set by

(i) the values of capacitances C1 and C2;

(ii) the dimensions of MOS transistors M1-M4.

Since all transitions in the CL have the same duration and can be lengthened to outlast the OVD turn-on time, we simplify the interface circuitry by dispensing with the asymmetrical delay.

Because a normal transition in this CL is short, we must take into account the integral nature of the OVD circuit's sensitivity: it depends on both the amplitude and the width of the input current pulse. The simulated operation region of the OVD circuit for current pulses shorter than 30 ns is shown in Fig.14. In this case the threshold of the OVD circuit is evidently determined by the threshold charge $Q_{th}$. The OVD input charge $Q$ equals

$$Q = \int_{t}^{t+w} I(\theta)\, d\theta$$

where $I$ is the OVD input current, $t$ is the moment when the transition occurs, and $w$ is the width of the input current pulse. The turn-on condition for the OVD circuit is $Q = Q_{th}$.

When the LTD circuit shown in Fig.13 is used, the charge $Q$ is determined by either $C_1$ or $C_2$. Namely, if the line goes from low to high voltage, $Q = V C_2$. If the line goes in the reverse direction, then $Q = V C_1$, where $V$ is the charging/discharging voltage, approximately equal to the effective power supply voltage: $V \approx V_{dd} - \Delta V$. Here $V_{dd}$ is the OVD power supply voltage and $\Delta V$ is the CVC voltage drop.

The OVD circuit with typical parameters (see Table 1) has a threshold charge $Q_{th} = 4.0 \times 10^{-12}$ C. When $C_1 = C_2 = C_L$, the minimal value of $C_L$ at which the OVD can still operate is about $1.0 \times 10^{-12}$ F.
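The charge criterion can be checked numerically. The sketch below uses Q = V·CL with the effective supply voltage of 5.0 V quoted in this section and the threshold charge of 4.0 pC. Treating "turn-on" as Q being at least Qth is an assumption made here, and the function names are hypothetical.

```python
Q_TH = 4.0e-12  # threshold charge from the text, coulombs


def line_charge(v_eff, c_line):
    """Charge delivered by one LTD transition, Q = V * C (coulombs)."""
    return v_eff * c_line


def ovd_turns_on(q, q_th=Q_TH):
    """Assumed turn-on criterion: the input charge reaches the threshold."""
    return q >= q_th


def minimal_capacitance(q_th, v_eff):
    """Smallest C for which V * C reaches q_th (farads)."""
    return q_th / v_eff


v_eff = 5.0  # effective supply voltage Vdd minus the voltage drop, volts
q = line_charge(v_eff, 1.0e-12)
print("Q =", q, "C; OVD turns on:", ovd_turns_on(q))
print("minimal C =", minimal_capacitance(Q_TH, v_eff), "F")
```

Under these assumptions the bound comes out slightly below 1 pF, which agrees with the "about 1.0 pF" figure given above.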

The influence of the dimensions of transistors M1-M4 on the LTD delay $d$ is given by the approximation [17]:

[...]

where ~ denotes proportionality, $G_n$ and $G_p$ are the conductances of the NMOS and PMOS transistors respectively, and $C_L = C_1 = C_2$.

Since $G_n \sim W_n/L_n$ and $G_p \sim W_p/L_p$, where $W$ and $L$ are the width and length of the transistor channels of the corresponding conduction type, the LTD delay $d$ is proportional to [...].

It has been found that for [...], [...], $C_L = 1.0$ pF and $V_{dd} - \Delta V = 5.0$ V the LTD delay is $d = 7.6$ ns.

When the LTD works jointly with the OVD in the speed-independent bus, the actual LTD delay increases by 30-40 percent because of the effect of the OVD's $R_1$ on the effective power supply voltage.

To determine the appropriate value of $R_1$ in the OVD circuit we must know the threshold input current $I_{th}$ corresponding to the threshold voltage drop $V_{th}$, which is recommended to be 400 mV.

The average input current $I_{av}$ in the transient state of one line is given by $I_{av} = C_L v$, where $v$ is the average rate of rise of the output signal of an inverter in the LTD. For the typical values $v = 1.0 \times 10^{9}$ V/s and $C_L = 1.0$ pF, $I_{av} = 1.0$ mA. Taking $I_{th} = 0.4$ mA and $I_{max} = 2.0$ mA, we obtain $R_1 = 1\ \mathrm{k\Omega}$ and $r_b = 100\ \Omega$.
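The arithmetic in this paragraph can be reproduced with a few lines. The sketch assumes that below the diode turn-on voltage the whole threshold drop appears across R1, so that R1 = Vth / Ith. That relation is an inference from the quoted numbers rather than a formula stated in the text, and the function names are hypothetical.

```python
def average_line_current(c_load, slew_rate):
    """Iav = CL * v, with CL in farads and v in volts per second."""
    return c_load * slew_rate


def resistor_for_threshold(v_th, i_th):
    """Assumed relation R1 = Vth / Ith (diode branch not conducting)."""
    return v_th / i_th


i_av = average_line_current(c_load=1.0e-12, slew_rate=1.0e9)
r1 = resistor_for_threshold(v_th=0.4, i_th=0.4e-3)
print(f"Iav = {i_av * 1e3:.2f} mA, R1 = {r1:.0f} ohm")
```

The chosen threshold current of 0.4 mA lies below the 1.0 mA average line current, so under this model a single switching line is enough to exceed the threshold.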

Simulation has shown that in this case the OVD turn-on delay can be approximated by the empirical expression

$$t_{on}\,[\mathrm{ns}] = 8.1 + 0.1n$$

where $n$ is the address bus width in bits. The total delay of recognizing an address transition is $t_{tot} = d\,g + t_{on}$, where $g$ is the coefficient of LTD delay increase caused by the reduced power supply voltage. As shown above, $g \approx 1.35$. For $n = 32$, $t_{tot} = 21.6$ ns.
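The total detection delay follows directly from the quantities given in this section. A minimal sketch, using the empirical turn-on expression, the LTD delay d = 7.6 ns and g = 1.35 as quoted (the function names are hypothetical):

```python
def ovd_turn_on_delay_ns(n_bits):
    """Empirical OVD turn-on delay from the text: 8.1 + 0.1 * n, in ns."""
    return 8.1 + 0.1 * n_bits


def total_detection_delay_ns(n_bits, d_ns=7.6, g=1.35):
    """t_tot = d * g + t_on, all times in nanoseconds."""
    return d_ns * g + ovd_turn_on_delay_ns(n_bits)


for n in (8, 16, 32):
    print(n, round(total_detection_delay_ns(n), 2))
```

For n = 32 this gives 7.6 × 1.35 + 11.3 = 21.56 ns, which rounds to the 21.6 ns stated above.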

4.4 Speed-independent adder

The circuit we use as a CL in this Section has been a touchstone for speed-independent circuit designers for about four decades: the ripple carry adder (RCA), which is actually a chain of one-bit full adders (Fig.14).

Each full adder calculates two Boolean functions: the sum $s_i = a_i \oplus b_i \oplus c_i$ and the output carry $c_{i+1} = a_i b_i + b_i c_i + a_i c_i$, where $a_i$ and $b_i$ are the summands, $c_i$ is the input carry, and $\oplus$ stands for the XOR operation.
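The two Boolean functions can be written out directly. The following sketch models a full adder at the logic level and chains n of them into a ripple carry adder. It says nothing about timing or the OVD circuitry, and the helper names are hypothetical.

```python
def full_adder(a, b, c):
    """Return (sum, carry_out) for one-bit inputs a, b and carry-in c."""
    s = a ^ b ^ c                        # s_i = a_i XOR b_i XOR c_i
    c_out = (a & b) | (b & c) | (a & c)  # c_{i+1} = a_i b_i + b_i c_i + a_i c_i
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Return the n low-order bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits for a least-significant-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


bits, carry_out = ripple_carry_add(to_bits(27, 6), to_bits(45, 6))
print(from_bits(bits), carry_out)
```

The carry variable is passed from stage to stage, which is the chain whose longest activated segment determines the transition duration.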

In 1955 Gilchrist et al. proposed a speed-independent RCA with a carry completion signal [18]. In the 1960s that circuit was carefully analyzed and improved [19-21]. In 1980 Seitz used the RCA to illustrate his concept of the equipotential region and his approach to self-timed system design [4].

Now we use the RCA as a CL to illustrate our approach to SIM design.

As shown in Section 4.2, the turn-on and turn-off delays of the OVD circuit are proportional to the equivalent capacitance $C_{eq}$ associated with the OVD circuit input. The capacitance $C_{eq}$ depends linearly on the number of gates $N$ in the CMOS CL. To speed up a SIM it is necessary to reduce $N$. This can be achieved by structurally decomposing the CMOS CL into subcircuits CL1, CL2, etc. Each subcircuit CLi is connected to its own detecting circuit OVDi, or directly to the power supply if the transition of this subcircuit does not affect the transition duration of the CL as a whole. Each detecting circuit OVDi generates its own OV signal, which is combined with the output signals of the other OVDs via a multi-input OR (NOR) element. The output signal of that element serves as the OV signal of the CMOS CL.

The computation time of a multi-bit RCA is determined by the length of the longest activated carry chain. Many papers have been devoted to the analysis of carry generation and carry propagation in RCAs [19-21], and many of them offer their own methods for estimating or calculating the average length of the longest activated carry chain. We do not intend to add another one.

Let us look inside the RCA. As mentioned above, the RCA consists of one-bit full adders, and each full adder consists of two parts: the part forming the sum $s_i$ and the part forming the carry $c_{i+1}$ (Fig.16).

In a multi-bit RCA the sum-forming parts do not interact with each other and do not affect the transition duration of the RCA. Each carry-forming part receives the signal $c_i$ from the preceding carry-forming part and sends the signal $c_{i+1}$ to the following one.

To decompose the RCA we use three heuristic tricks:

(i) We connect all sum-forming parts directly to the power supply.

(ii) We divide each carry-forming part into three subcircuits, denoted in Fig.16 by the numbers 1, 2 and 3. We connect all subcircuits 1 directly to the power supply because they do not have the input $c_i$ and so do not contain a carry propagation path.

(iii) We connect all subcircuits 2 to OVD1 and all subcircuits 3 to OVD2. The outputs of OVD1 and OVD2 are connected to a two-input NOR-gate that forms the RCA OV signal in positive logic (Fig.17).

The curves of the OVD1 and OVD2 input currents $I_1$ and $I_2$ for a 6-bit RCA and the longest transition duration are shown in Fig.18.

Taking $V_{th1,2} = 400$ mV, we calculated the parameters of the OVD circuits and obtained $R_{11} = 5\ \mathrm{k\Omega}$, $I_{th1} = 0.08$ mA, $R_{12} = 3\ \mathrm{k\Omega}$, $I_{th2} = 0.13$ mA. The dependence of the OVD1 and OVD2 delays on the number of bits in the RCA is shown in Fig.19.

4.5 Comparison of SIMs with synchronous counterparts

The transition duration in a CL is a random variable. The probability of a transition with duration $D$ is determined by the implemented Boolean function and the distribution of input logical combinations. The possible values of $D$ lie in the interval $[0; D_{max}]$, where $D_{max}$ is the length of the critical path in the CL.

Let $D_{me} = \sum_i p_i D_i$ be the mathematical expectation of the transition duration in the CL, where $D_i$ is the length of the i-th SPP in the CL and $p_i$ is the probability that the i-th path is the longest activated SPP.

When the CL works in synchronous mode, the cycle duration $T_s$ is chosen with regard to the maximal transition duration $D_{max}$. A certain margin must be added to $D_{max}$ to provide reliable operation under variations of the CL parameters: $T_s = k D_{max}$, where $k$ is a margin coefficient.

In a SIM the cycle duration is a random variable with expectation $T_{si} = g D_{me} + t_{off} + t_{if}$, where $g$ is the coefficient of CL delay increase caused by the reduced power supply voltage, $t_{off}$ is the turn-off delay of the OVD circuit, and $t_{if}$ is the interface circuitry delay.

We define the efficiency $E$ of the speed-independent mode of CL operation as the relative increase of SIM performance in comparison to its synchronous counterpart: $E = (T_s - T_{si})/T_{si}$.

Generally, the speed-independent mode is more efficient than the synchronous one if $T_s > T_{si}$ or, in other words, $k D_{max} > g D_{me} + t_{off} + t_{if}$.

In the case of the RCA, $D_{max} = n t_c$, where $t_c$ is the delay of the carry-forming part and $n$ is the number of full adders in the RCA.

It has been shown [19] that in an n-bit RCA $D_{me} \approx t_c \log_2(5n/4)$. Then, in the case of speed-independent operation, $T_{si} = g t_c \log_2(5n/4) + t_{off} + t_{if}$.

We have obtained the dependencies of $T_s$ and $T_{si}$ on the number of bits in the RCA; they are shown in Fig.20. As can be seen, speed-independent operation of the RCA is more efficient for $n > 8$.
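The comparison can be explored numerically. The sketch below evaluates Ts = k·Dmax and Tsi = g·tc·log2(5n/4) + toff + tif, assuming Dmax = n·tc for the RCA and taking the efficiency as E = (Ts − Tsi)/Tsi. The values of tc, k, toff and tif are hypothetical placeholders, not the parameters behind Fig.20, so the crossover point printed here need not match the one reported in the text.

```python
import math


def sync_cycle(n, t_c, k):
    """Synchronous cycle Ts = k * Dmax, assuming Dmax = n * t_c."""
    return k * n * t_c


def si_cycle(n, t_c, g, t_off, t_if):
    """Expected speed-independent cycle Tsi = g*t_c*log2(5n/4) + t_off + t_if."""
    return g * t_c * math.log2(5 * n / 4) + t_off + t_if


def efficiency(t_s, t_si):
    """Assumed definition: relative performance gain (Ts - Tsi) / Tsi."""
    return (t_s - t_si) / t_si


# Hypothetical parameters (nanoseconds), for illustration only.
T_C, K, G, T_OFF, T_IF = 2.0, 1.2, 1.35, 8.0, 2.0

for n in (4, 8, 16, 32, 64):
    t_s = sync_cycle(n, T_C, K)
    t_si = si_cycle(n, T_C, G, T_OFF, T_IF)
    print(f"n={n:2d}  Ts={t_s:6.1f}  Tsi={t_si:6.1f}  E={efficiency(t_s, t_si):+.2f}")
```

Because Ts grows linearly with n while Tsi grows only logarithmically, the speed-independent mode wins for sufficiently wide adders regardless of the particular constants chosen.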

5. Conclusion

6. Acknowledgement

I would like to thank Igor Shagurin and Vlad Tsylyov of the Moscow Physical Engineering Institute for helpful discussions of this work. I am also grateful to Chris Jesshope of University of Surrey and Mark Josephs of Oxford University who kindly provided the latest material on their research in the area of delay-insensitive circuit design.

References

[1] Miller, R.E., Switching theory (Wiley, New York, 1965), vol.2, Chapter 10.

[2] Unger, S.H., Asynchronous Sequential Switching Circuits (Wiley, New York, 1969).

[3] Armstrong, D.B., A.D. Friedman, and P.R. Menon, Design of Asynchronous Circuits Assuming Unbounded Gate Delays, IEEE Trans. on Computers C-18 (12) (1969) 1110-1120.

[4] Seitz, C.L., System timing, in: C.A. Mead and L.A. Conway, eds., Introduction to VLSI Systems (Addison-Wesley, New York, 1980), Chapter 7.

[5] Izosimov, O.A., I.I. Shagurin, and V.V. Tsylyov, Physical approach to CMOS module self-timing, Electronics Letters 26 (22) (1990) 1835-1836.

[6] Veendrick, H.J.M., Short-circuit dissipation of static CMOS circuit and its impact on the design of buffer circuits, IEEE J. Solid-State Circuits SC-19 (4) (1984) 468-473.

[7] Chappell, B.A., T.I. Chappell, S.E. Schuster, H.M. Segmuller, J.W. Allan, R.L. Franch, and P.J. Restle, Fast CMOS ECL receivers with 100-mV worst-case sensitivity, IEEE J. Solid-State Circuits SC-23 (1) (1988) 59-67.

[8] Chu, S.T., J. Dikken, C.D. Hartgring, F.J. List, J.G. Raemaekers, S.A. Bell, B. Walsh, and R.H.W. Salters, A 25-ns Low-Power Full-CMOS 1-Mbit (128K×8) SRAM, IEEE J. Solid-State Circuits SC-23 (5) (1988) 1078-1084.

[9] Frank, E.H., and R.F. Sproull, A Self-Timed Static RAM, in: Proc. Third Caltech VLSI Conference (Springer-Verlag, Berlin, 1983) pp.275-285.

[10] Donoghue, W.J., and G.E. Noufer, Circuit for address transition detection, US Patent 4563599, 1986.

[11] Huang, J.S.T., and J.W. Schrankler, Switching characteristics of scaled CMOS circuits at 77K, IEEE Trans. on Electron Devices ED-34 (1) (1987) 101-106.

[12] Gilchrist, B., J.H. Pomerene, and S.Y. Wong, Fast Carry Logic for Digital Computers, IRE Trans. on Electronic Computers EC-4 (4) (1955) 133-136.

[13] Hendrickson, H.C., Fast High-Accuracy Binary Parallel Addition, IRE Trans. on Electronic Computers EC-9 (4) (1960) 465-469.

[14] Majerski, S., and M. Wiweger, NOR-Gate Binary Adder with Carry Completion Detection, IEEE Trans. on Electronic Computers EC-16 (1) (1967) 90-92.

[15] Reitwiesner, G.W., The determination of carry propagation length for binary addition, IRE Trans. on Electronic Computers EC-9 (1) (1960) 35-38.

Appendix

SPICE2G.6: MOSFET model parameters

No.	Name	Parameter	Units	PMOS value	NMOS value
1	LEVEL	MODEL INDEX	-	3	3
2	VTO	ZERO-BIAS THRESHOLD VOLTAGE	V	-1.337	1.161
3	KP	TRANSCONDUCTANCE PARAMETER	A/V^2	2.3×10^-5	4.6×10^-5
4	GAMMA	BULK THRESHOLD PARAMETER	V^1/2	0.501	0.354
5	PHI	SURFACE POTENTIAL	V	0.695	0.660
6	RD	DRAIN OHMIC RESISTANCE	OHM	333	85
7	RS	SOURCE OHMIC RESISTANCE	OHM	333	85
8	CBD	ZERO-BIAS B-D JUNCTION CAPACITANCE	F	1.98×10^-14	6.9×10^-15
9	CBS	ZERO-BIAS B-S JUNCTION CAPACITANCE	F	1.98×10^-14	6.9×10^-15
10	IS	BULK JUNCTION SATURATION CURRENT	A	3.47×10^-15	9.22×10^-15
11	PB	BULK JUNCTION POTENTIAL	V	0.8	0.8
12	CGSO	GATE-SOURCE OVERLAP CAPACITANCE PER METER CHANNEL WIDTH	F/M	6.70×10^-10	3.30×10^-10
13	CGDO	GATE-DRAIN OVERLAP CAPACITANCE PER METER CHANNEL WIDTH	F/M	6.70×10^-10	3.30×10^-10
14	CGBO	GATE-BULK OVERLAP CAPACITANCE PER METER CHANNEL LENGTH	F/M	1.90×10^-9	2.60×10^-9
15	RSH	DRAIN AND SOURCE DIFFUSION SHEET RESISTANCE	OHM/SQ	55	30
16	CJ	ZERO-BIAS BULK JUNCTION BOTTOM CAPACITANCE PER SQ METER OF JUNCTION AREA	F/M^2	3.53×10^-4	1.24×10^-4
17	MJ	BULK JUNCTION BOTTOM GRADING COEFFICIENT	-	0.5	0.5
18	CJSW	ZERO-BIAS BULK JUNCTION SIDEWALL CAPACITANCE PER METER OF JUNCTION PERIMETER	F/M	1.71×10^-10	3.20×10^-11
