Thursday, April 30, 2009

Optical Interconnect

I. INTRODUCTION

Future high-speed computing is marching toward the tera-FLOPS scale. To maintain one byte of bandwidth per FLOP, Tb/s-class I/O bandwidth is required to support future high-speed computing demand. By 2017, a high-speed CPU needs to deliver 256 cores, which enables 10 TFLOPS and requires 20 TFlops of bandwidth to support a flat programming model. To support such a huge amount of bandwidth, more than 16,000 coupled electrical wires are required for core-to-core communication. This complicates routing both on-chip and off-chip. Interconnect innovations such as additional metal routing layers or 3D stacking can relieve metal congestion, but they increase the routing length and eventually hit the aspect-ratio limit of the electrical channel.

Moore's law suggests that as process technology continues to scale, the overall interconnect RC time constant remains constant. When more bits are sent through the electrical channel in a given period, this time constant limits the overall performance. Electrical channel loss at high frequency (GHz range) due to the skin effect is critical in any high-speed I/O design. Signal dispersion, which causes inter-symbol interference (ISI), is another high-speed I/O design issue that needs to be handled carefully. When more wires are packed together, crosstalk becomes unavoidable and further degrades signal quality. Signal termination mismatch is another common issue in electrical channel signal integrity engineering. A termination mismatch causes the signal to reflect from the receiver end back to the driver end and to keep bouncing back and forth until the energy is dissipated in the channel. All of the channel limitations discussed above require additional circuitry such as equalizers, termination compensation and pre-emphasis, which increases die size and dissipates more power. To resolve the electrical channel limitations, a new interconnect medium needs to be considered to replace the conventional electrical interconnect.

Optical interconnect is considered a solution to the channel bandwidth limitation, because its physical characteristics differ from those of electrical interconnect. Optical interconnect has negligible signal latency for short-distance communication, very high channel bandwidth with light as the carrier, and none of the electrical transmission-line phenomena (impedance matching, crosstalk and inductance effects). With these characteristics, optical interconnect can be used to eliminate the shortcomings of electrical interconnect. The optical transceiver is expected to be much smaller and simpler than an electrical link transceiver, since some complicated circuits can be removed. With Wavelength Division Multiplexing (WDM), multiple wavelengths can be transmitted over a single optical waveguide, which improves the bandwidth-to-latency efficiency. Optical interconnect can therefore be used for core-to-core connection or for a global clock distribution network.
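The WDM argument above is simple arithmetic, and a minimal sketch makes it concrete. The function names and the example numbers (8 wavelengths at 10 Gb/s each, a 1 Tb/s target) are illustrative assumptions and are not figures from the cited papers.

```python
import math


def wdm_aggregate_gbps(num_wavelengths: int, rate_per_wavelength_gbps: float) -> float:
    """Aggregate data rate (Gb/s) of one waveguide carrying several wavelengths."""
    if num_wavelengths < 1 or rate_per_wavelength_gbps <= 0:
        raise ValueError("need at least one wavelength and a positive data rate")
    return num_wavelengths * rate_per_wavelength_gbps


def waveguides_needed(target_gbps: float, num_wavelengths: int,
                      rate_per_wavelength_gbps: float) -> int:
    """Number of waveguides required to reach a target aggregate bandwidth."""
    per_waveguide = wdm_aggregate_gbps(num_wavelengths, rate_per_wavelength_gbps)
    return math.ceil(target_gbps / per_waveguide)
```

Under these assumed numbers, `waveguides_needed(1000.0, 8, 10.0)` gives 13 waveguides, against 100 when each waveguide carries a single wavelength.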

II. POTENTIAL BENEFITS OF OPTICAL INTERCONNECT

A) Scaling of Interconnects

Scaling of interconnects in three dimensions is required to accommodate bandwidth expansion as the IC world moves toward system-on-chip (SoC) design. The total RC of an electrical interconnect is given by $RC_{total} = R_l C_l l^2$, where $R_l$ is the resistance per unit length, $C_l$ is the capacitance per unit length and $l$ is the interconnect length. When all dimensions of the interconnect are scaled by a factor $s$, $RC_{total} = (R_l/s^2)\,C_l\,(s l)^2 = R_l C_l l^2$, which is the same as before scaling.
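The invariance of the total RC under ideal scaling can be checked numerically. The sketch below assumes the same model as the text (resistance per unit length divided by s squared, capacitance per unit length unchanged, length multiplied by s). The example wire values are hypothetical.

```python
def rc_total(r_per_m: float, c_per_m: float, length_m: float) -> float:
    """Total RC product of a wire: R_l * C_l * l^2, in seconds."""
    return r_per_m * c_per_m * length_m ** 2


def rc_total_scaled(r_per_m: float, c_per_m: float, length_m: float, s: float) -> float:
    """Same wire after ideal scaling of every dimension by the factor s."""
    return rc_total(r_per_m / s ** 2, c_per_m, s * length_m)


# Hypothetical wire: 0.2 ohm/um, 0.2 fF/um, 1 mm long.
R_PER_M = 2.0e5      # ohm per metre
C_PER_M = 2.0e-10    # farad per metre
LENGTH_M = 1.0e-3    # metre

before = rc_total(R_PER_M, C_PER_M, LENGTH_M)
after = rc_total_scaled(R_PER_M, C_PER_M, LENGTH_M, s=0.7)
```

Both `before` and `after` come out at the same value up to floating-point rounding, which is the ideal-case result stated above.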

The above is the ideal case for electrical interconnect latency. In practice, the interconnect resistance increases with scaling. This is because the barrier layer has a much higher resistivity than the Cu conductor. Scaling the interconnect dimensions while the barrier thickness remains constant increases the effective resistivity, which degrades the interconnect delay. In addition, as the interconnect is scaled, surface scattering becomes significant and further increases the interconnect resistivity. When transistors are scaled, they switch faster while the RC delay at best remains the same. The interconnect therefore becomes the bottleneck, and overall system performance suffers.
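The barrier-layer effect can be illustrated with a simple geometric model. This is a sketch under stated assumptions: a rectangular trench, a barrier of constant thickness on both sidewalls and the bottom, and a barrier that carries no current. Surface scattering is not modelled, and the dimensions are hypothetical.

```python
RHO_CU = 1.7e-8  # approximate bulk copper resistivity at room temperature, ohm*m


def effective_resistivity(width_m: float, height_m: float, barrier_m: float,
                          rho_cu: float = RHO_CU) -> float:
    """Effective resistivity of a trench whose barrier liner conducts no current.

    The liner covers both sidewalls and the bottom, so only the remaining
    copper cross-section carries current.
    """
    cu_width = width_m - 2.0 * barrier_m
    cu_height = height_m - barrier_m
    if cu_width <= 0 or cu_height <= 0:
        raise ValueError("barrier consumes the whole trench")
    return rho_cu * (width_m * height_m) / (cu_width * cu_height)


# Same 10 nm barrier, trench dimensions halved from one case to the next.
wide = effective_resistivity(200e-9, 400e-9, 10e-9)
narrow = effective_resistivity(100e-9, 200e-9, 10e-9)
```

With the barrier held at 10 nm, `narrow` is larger than `wide`, and both exceed the bulk value, which is the trend described above.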

Therefore, to maintain overall system performance, high-speed signals are routed on the global interconnect, which is wider. As a result, more metal layers need to be developed to accommodate the interconnect bandwidth demand while maintaining system performance. To analyze the advantage of optical interconnect over Cu interconnect, interconnect functions are categorized into signaling and clock distribution.

B) Signaling

Microprocessor die size remains constant (1 cm²) regardless of the process node. It is true that optical interconnect has negligible signal propagation latency compared with Cu interconnect. However, there is latency inherited from the switching of the optical receiver (TIA) and from the photo-detector responsivity. The TIA speed scales with the process node, but the photo-detector is by nature independent of the transistor scaling factor. It is therefore valid to assume that the photo-detector responsivity remains constant across process nodes. Because of the latency introduced by the transceiver and photo-detector, optical interconnect shows an advantage over its competitors only in certain scenarios. Figure 1(a) shows that optical interconnect has a great advantage over scaled Cu interconnect beyond a critical interconnect length of 200 µm. However, Figure 1(b) shows that optical interconnect has no advantage over non-scaled Cu interconnect below the 45 nm process node, since the length of a microprocessor die is within 1 cm.
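The notion of a critical length can be sketched with a toy delay model. Everything here is an assumption made for illustration and is not the model used in [2]. The Cu wire is treated as an unrepeated distributed RC line with 50% delay 0.38·R_l·C_l·l², the optical link as a fixed transceiver latency plus time of flight, and all numeric values are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def cu_delay(r_per_m: float, c_per_m: float, length_m: float) -> float:
    """50% delay of an unrepeated distributed RC line."""
    return 0.38 * r_per_m * c_per_m * length_m ** 2


def optical_delay(fixed_latency_s: float, group_index: float, length_m: float) -> float:
    """Fixed transmitter + detector + TIA latency plus time of flight in the waveguide."""
    return fixed_latency_s + group_index * length_m / C0


def critical_length(r_per_m: float, c_per_m: float,
                    fixed_latency_s: float, group_index: float) -> float:
    """Length at which both delays are equal (positive root of a*l^2 - b*l - t = 0)."""
    a = 0.38 * r_per_m * c_per_m
    b = group_index / C0
    return (b + math.sqrt(b * b + 4.0 * a * fixed_latency_s)) / (2.0 * a)
```

Below the returned length the fixed transceiver latency dominates and Cu is faster. Above it the quadratic growth of the RC delay makes the optical link faster.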

Figure 2 shows the bandwidth-to-latency ratio of the various interconnect types. Due to the complexity of the nitride waveguide, optical interconnect shows the worst bandwidth-to-latency efficiency compared with scaled and non-scaled Cu interconnect. However, with a wavelength division multiplexing (WDM) scheme, which injects multiple wavelengths into the same optical waveguide, optical interconnect shows the best bandwidth-to-latency efficiency of all.












Figure 1. Critical length for optical interconnects: (a) optical interconnects compared against scaled Cu interconnects, (b) optical interconnects compared against non-scaled Cu interconnects [2]











Figure 2. Bandwidth/latency ratio as a function of technology node for optical, optical with WDM, scaled and non-scaled Cu interconnects [2]

C) Clock Distribution

For clock distribution analysis, clock skew and jitter are the main issues that need to be considered. Optical interconnect offers good signal integrity and is immune to the inductive and capacitive crosstalk that affects Cu interconnect. Figure 3 shows that optical interconnect achieves a good improvement over scaled Cu in reducing clock skew and jitter. However, it shows little improvement in skew and jitter over non-scaled Cu interconnect. There is no great motivation to push optical interconnect for clock distribution: it may introduce risk, and there is already a solution for the signal integrity issues of non-scaled Cu interconnect.











Figure 3. Comparison of the global skew and jitter as a function of technology node for clock distribution using optical, scaled, and non-scaled Cu interconnects [2]

D) Design Simplification

Optical interconnect is free of the electrical transmission-line phenomena. This eliminates the common signal integrity issues seen in electrical interconnect, such as impedance matching, crosstalk and inductance effects. Matching the transceiver impedance to the channel characteristic impedance is a difficult engineering problem because on-die impedance depends on process, voltage and temperature (PVT). Without good termination, the signal bounces between the driver and receiver ends and causes ringing. Compensating the termination for PVT variation requires additional circuitry, which costs die area and power. Optical interconnect simplifies high-speed I/O design because it is immune to impedance mismatch.

For core-to-core interconnect, long high-speed routes are unavoidable, and inductive and skin effects have a major impact on signal performance. The usual solution is to insert repeaters along the signal path. With optical interconnect the repeaters can be removed, which further improves die size and power dissipation. Electrical channels also suffer from bandwidth limitations such as ISI, which reduces the eye-diagram margin. Recovering that margin requires a de-emphasis driver and delay string circuit, which add complexity to current high-speed I/O designs. These circuits can be removed with optical interconnect.
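The termination problem described above can be quantified with the standard reflection coefficient, Γ = (Z_term − Z0) / (Z_term + Z0). The sketch below assumes a lossless line and purely resistive terminations. The 50 Ω channel and the mismatch values in the usage note are hypothetical.

```python
from typing import List


def reflection_coefficient(z_term_ohm: float, z0_ohm: float) -> float:
    """Voltage reflection coefficient at a termination on a line of impedance z0."""
    return (z_term_ohm - z0_ohm) / (z_term_ohm + z0_ohm)


def bounce_amplitudes(gamma_rx: float, gamma_tx: float, round_trips: int) -> List[float]:
    """Relative amplitude of the echo arriving at the receiver after each round trip.

    Each round trip reflects once at the receiver and once at the driver,
    so the echo shrinks by gamma_rx * gamma_tx per trip (lossless line assumed).
    """
    per_trip = gamma_rx * gamma_tx
    return [per_trip ** n for n in range(1, round_trips + 1)]
```

For example, a receiver termination that drifts to 60 Ω on a 50 Ω channel gives `reflection_coefficient(60.0, 50.0)` of about 0.09, so roughly 9% of the incident voltage is reflected toward the driver.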

III. OPTICAL INTERCONNECT SYSTEM

Figure 4 shows the general on-chip optical system for signaling. A complete on-chip optical system requires a laser device, optical modulator, transmitter circuit, optical waveguide, photo-detector and TIA. Intel has developed an optical transceiver that runs at a maximum data rate of 18 Gb/s. The prototype is a hybrid implementation of optical interconnect: the waveguides, GaAs VCSELs and detectors are implemented off-chip. The VCSELs and detectors are flip-chip bonded to the package substrate, and the waveguides are embedded in the package substrate.











Figure 4. On-chip optical system for signaling [2]

Figure 5 shows the cross-section and layout view of the MSM Ge detector. The Ge material is grown on a SiO2 interlayer dielectric, which is compatible with the CMOS process. The device is 3 µm long and 1 µm wide. The measured responsivity is about 0.9 A/W at 1 V bias, with a bandwidth of 35 GHz.

Figure 6 shows the cross-section of the ring resonator, which runs at 10 GHz modulation with a 2.7 Vp-p drive. The ring resonator can be inserted into the back end of the CMOS process. Nitride is grown on silicon as the optical waveguide. A fully monolithic optical system can thus be realized in a CMOS process, offering high bandwidth, high bandwidth density and energy efficiency for an optical I/O link.











Figure 5. CMOS logic compatible waveguide coupled MSM Ge photodetector [3]











Figure 6. Ring-resonator electro-optic polymer modulator device structure and performance [3]

IV. OPTICAL TRANSCEIVER

Figure 7 shows the optical transceiver topology published by Intel at ISSCC 2009. The transmitter topology is similar to a conventional high-speed I/O design and contains a PLL that generates frequencies up to 9 GHz. A pre-emphasis driver and equalizer are still incorporated in the topology, because the hybrid implementation requires a microstrip trace from the output pad to the VCSEL.

For the receiver, the input capacitance introduced by the photo-detector is critical to the overall system bandwidth. The current GaAs photo-detector has an input capacitance of 250 fF, which creates the dominant pole at the receiver input. The TIA design is therefore critical in reducing the input resistance seen by the photo-detector. The prototype shown here uses a cross-coupled differential pair as the input stage to improve the loop gain and hence reduce the input resistance. However, the TIA requires an additional voltage gain amplifier, which takes more die space. In [3], a double-oversampling topology is demonstrated for the receiver input stage that can eliminate the voltage gain amplifier.
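The link between input capacitance, input resistance and bandwidth can be sketched with a single-pole model, f = 1 / (2π·R_in·C_in). The 250 fF value comes from the text. The single-pole approximation and the example resistance are assumptions, and a real TIA has additional poles.

```python
import math

C_DETECTOR_F = 250e-15  # photo-detector input capacitance quoted in the text


def input_pole_hz(r_in_ohm: float, c_in_f: float = C_DETECTOR_F) -> float:
    """Frequency of the pole formed by the TIA input resistance and detector capacitance."""
    return 1.0 / (2.0 * math.pi * r_in_ohm * c_in_f)


def max_input_resistance_ohm(target_pole_hz: float, c_in_f: float = C_DETECTOR_F) -> float:
    """Largest input resistance that still places the input pole at the target frequency."""
    return 1.0 / (2.0 * math.pi * target_pole_hz * c_in_f)
```

With a hypothetical 50 Ω input resistance, `input_pole_hz(50.0)` is about 12.7 GHz. Halving the detector capacitance would double that figure for the same input resistance, which is why the capacitance matters so much for receiver bandwidth.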











Figure 7. Optical transceiver cell [3]

V. CONCLUSION

The physical characteristics of the optical channel were explained in detail; they give it an overwhelming advantage over electrical interconnect in terms of signal integrity. Optical interconnect was then compared with Cu interconnect across different process nodes. Optical interconnect shows a much greater improvement in bandwidth/latency efficiency for global signal routes than for local routes. For clock distribution, however, optical interconnect offers little advantage unless photo-detector responsivity can be further improved. By eliminating the electrical transmission-line phenomena, optical interconnect has tremendous potential to simplify I/O transceiver design. This is critical for implementing arrays of thousands of optical I/Os on a die, provided the transceiver is small enough. At ISSCC 2009, Intel published a complete optical transceiver solution that can be further leveraged to close the gap on the I/O bandwidth required to support multi-core demand. However, there is still room to improve the photo-detector input capacitance, which is critical to relieving the bandwidth limitation introduced by the receiver itself.

References

[1] David A. B. Miller, “Rationale and Challenges for Optical Interconnects to Electronic Chips”, Proc. of the IEEE, vol. 88, no. 6, June 2000
[2] Mauro J. Kobrinsky, Bruce A. Block, Jun-Fei Zheng, Brandon C. Barnet, Edris Mohammed, Miriam Reshotko, Frank Robertson, Scott List, Ian Young, Kenneth Cadien, “On-chip Optical Interconnects”, Intel Technology Journal, vol. 8, issue 2, May 10, 2004
[3] Ian Young, Edris Mohammed, Jason Liao, Alexandra Kern, Samuel Palermo, Bruce Block, Miriam Reshotko, Peter Chang, “Optical I/O Technology for Tera-Scale Computing”, IEEE International Solid-State Circuits Conference, 2009
[4] Alexandra Kern, Anantha Chandrakasan, Ian Young, “18Gb/s Optical IO: VCSEL Driver and TIA in 90nm CMOS”, Symposium on VLSI Circuits Digest of Technical Papers, 2007
[5] David A. B. Miller, “Optical Interconnects to Silicon”, IEEE Journal on Selected Topics in Quantum Electronics, vol. 6, no. 6, Nov 2000

Saturday, April 25, 2009

CMOS Technology

Following Moore's law, the CMOS channel length keeps shrinking to improve speed and to fit more features on a single die. The transistor fT is inversely proportional to channel length, which has pushed devices into the GHz range. The continued push for transistor speed hit the power wall in 2003. That ultimately ended the Intel processor speed race, with a maximum processor speed of 4 GHz on the market. Intel then made a 180° turn to multi-core processor architecture and has pushed more features into the processor design for power saving purposes. This includes the North Bridge (memory control hub). Power can be reduced by half with two or more processor cores. Many-core architecture will be the future trend for processors, which again emphasizes the importance of a smaller channel length: reducing the transistor dimensions is what enables this technology.

However, scaling of the channel length is accompanied by scaling of the gate oxide thickness and gate voltage to maintain an allowable electric field. Furthermore, the supply voltage of the transistor is gradually decreased in order to save power. As Ids is proportional to Cox, gate oxide thickness scaling is critical to increasing the drive strength of the device from one generation to the next. To increase Cox, the gate oxide thickness Tox needs to be decreased. With Tox of about 2 nm (equivalent to a thickness of about 20 atoms) in the 65 nm process technology, a large gate leakage current is induced by gate tunneling. Ioff is estimated at ~200 nA/μm with a minimum channel length, Lmin, of 15 nm. This contributes to huge static power in billion-transistor System-on-Chip (SoC) products.

To solve this problem, a high-K material is introduced to increase the dielectric constant of the gate oxide. The equation $C_{ox} = \varepsilon_r \varepsilon_0 A / T_{ox}$ shows how the dielectric constant of the material, $\varepsilon_r$, can increase Cox while leaving more room for gate oxide thickness scaling. Currently, Intel uses hafnium oxide to replace the conventional silicon oxide, which increases $\varepsilon_r$ from 3.9 (silicon oxide) to 25 (hafnium oxide).
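The benefit of a higher dielectric constant can be sketched with the parallel-plate formula from the text, expressed per unit area. The dielectric constants (3.9 and 25) are the values quoted above. The 3 nm hafnium oxide thickness is a hypothetical example and is not a process figure.

```python
EPS0 = 8.854e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9       # relative permittivity of silicon oxide (from the text)
K_HFO2 = 25.0      # relative permittivity of hafnium oxide (from the text)


def cox_per_area(eps_r: float, t_ox_m: float) -> float:
    """Gate capacitance per unit area, eps_r * eps0 / t_ox, in F/m^2."""
    return eps_r * EPS0 / t_ox_m


def equivalent_oxide_thickness(eps_r: float, t_phys_m: float) -> float:
    """SiO2 thickness that would give the same capacitance as the high-K film."""
    return t_phys_m * K_SIO2 / eps_r


# Hypothetical 3 nm hafnium oxide film.
eot = equivalent_oxide_thickness(K_HFO2, 3e-9)
```

Here `eot` is about 0.47 nm: a physically 3 nm thick hafnium oxide film provides the capacitance of a silicon oxide layer under half a nanometre thick, while the thicker film suppresses tunneling leakage.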

The disadvantages of using a high-K material with the conventional poly-Si gate are a higher threshold voltage and poorer mobility. A higher threshold voltage leaves less voltage headroom for circuit design and slows the switching response. This is a no-no for high-speed logic and analog design. The higher threshold voltage is due to Fermi level pinning at the interface between the high-K material and the poly-Si. The poorer mobility arises because the phonon dipoles of the high-K material resonate with plasma oscillations and couple strongly into the silicon channel, degrading the electron mobility. Using a metal such as TiN as the work-function metal on hafnium oxide therefore improves the channel electron mobility. The metal gate has an opposing plasma oscillation that cancels or weakens the electric field in the silicon channel and recovers the mobility.

High-K + metal gate devices have achieved a 50% improvement in Isat/Ioff ratio for PMOS and a 12% improvement for NMOS compared with the 65 nm process. For gate leakage, PMOS is 1000X better than the 65 nm process PMOS. For threshold voltage roll-off, 45 nm high-K + metal gate devices scatter around 0.15-0.35 V, while 65 nm SiO2 + poly-Si devices scatter around 0.35-0.55 V. With a lower threshold voltage, the device switches faster and has stronger drive strength. It is generally believed that high-K metal gate CMOS will be the trend up to the 22 nm process technology, until the large parasitic issues of tri-gate devices can be resolved.

Tuesday, April 14, 2009

diode band diagram

Saturday, April 11, 2009

american woman know how to use mouth... you don't know how to use mouth!!