Saturday, May 2, 2009

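Crossing the Lingding Sea (过零丁洋) - Wen Tianxiang (文天祥)

My hardships all began with mastering one classic; through four lean years of war the fighting has dwindled on.
The land lies shattered like catkins scattered by the wind; my life drifts and sinks like duckweed beaten by the rain.
At Fearful Shoal (惶恐滩) I spoke of my fear; on the Lonely Sea (零丁洋) I sigh at my loneliness.
Since ancient times, what man has escaped death? Let me leave a loyal heart to shine in the pages of history.

A story from more than seven hundred years ago....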

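In the second month of 1279, the remnant Southern Song forces and the Yuan army fought a great naval battle in the waters off Yamen, Xinhui (now part of Jiangmen, Guangdong Province). It lasted more than 20 days; the two sides committed several hundred thousand troops and more than 2,000 warships. In the end the Song force was annihilated, Chancellor Lu Xiufu leapt into the sea carrying the emperor Zhao Bing, only 9 years old, on his back, and the Zhao-Song dynasty came to an end.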

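At the beginning of 1276 the Mongol cavalry swept south, Lin'an fell, and the Southern Song court collapsed. The young Prince of Yi, Zhao Shi, and Prince of Guang, Zhao Bing, led by their mother Empress Dowager Yang, fled the capital and reached Wenzhou. The minister Lu Xiufu sent for Chen Yizhong, who was in hiding there, and General Zhang Shijie brought his troops from Dinghai to join them. On the first day of the fifth month Zhao Shi took the throne in Fuzhou as Emperor Duanzong and changed the era name to Jingyan. Chen Yizhong was appointed Left Chancellor and Military Commissioner, Zhang Shijie Deputy Military Commissioner, and Lu Xiufu Notary of the Bureau of Military Affairs.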

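Although the Southern Song had surrendered to the Yuan, large parts of Fujian, Guangdong and Guangxi were still under the control of the exiled court, and Li Tingzhi was still putting up stubborn resistance in Huaidong and Huaixi. But those regions fell one after another, and Li Tingzhi was killed. In the eleventh month of the first year of Jingyan (1276) the Yuan army closed in on Fuzhou. The city then held 170,000 regular troops, 300,000 militia and some ten thousand Huai soldiers, enough to give battle, but Chen Yizhong, who ran the court, was timid, so the little court fled again before it had found its footing. After leaving Fuzhou it could only drift from place to place, through Quanzhou, Chaozhou, Huizhou and elsewhere. In the spring of the third year of Jingyan (1278) it arrived near Leizhou. During the flight Chancellor Chen Yizhong left on the pretext of making contact with Champa and never returned. Duanzong fell ill on the way and died on the fifteenth day of the fourth month, aged only 11. With him gone the court was leaderless and on the verge of breaking up, until Lu Xiufu rallied it with an impassioned speech: "Gentlemen, why disperse? A son of Emperor Duzong is still here; what is to become of him? The ancients restored their states with a single city and a single brigade. We still have upwards of ten thousand officers and men. If Heaven has not forsaken the house of Zhao, can we not rebuild a nation from this?" The ministers then enthroned Zhao Bing, just 7 years old, and changed the era name to Xiangxing.

The Yuan army advanced step by step. Before long Leizhou fell and the situation became critical. Zhang Shijie led several counterattacks on Leizhou, all without success, and so moved the exiled regime to Yashan. History seemed fated to open one of its most tragic pages there.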

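When the Song army reached Yashan it still had 200,000 regulars and militia, while the attacking Yuan force numbered only some tens of thousands. In numbers alone the gap was large, and the Yuan troops were poor at naval warfare, so the Song undoubtedly held the advantage. But Zhang Shijie now made a grave error of command. Judging that the Mongols' strength was cavalry and that they were weak on water, he decided the fight had to be carried by the fleet, and so gave up control of the Yamen estuary. He drew up more than a thousand warships with their backs to the hills and facing the sea, lashed them together with heavy cables, ringed them with towers and palisades to form a square floating fortress, and covered the sides of the wooden hulls with padding against Yuan fire arrows, catapults and crossbows. Zhao Bing's imperial ship lay at the centre of the square, and Zhang Shijie meant to hold out there to the death. The plan had two flaws: first, abandoning the estuary handed the initiative to the enemy; second, chaining a thousand ships into a fortress concentrated his strength but cost him all mobility. Zhang Hongfan then arrived with a large Yuan force, seized the estuary south of Yashan, and cut every Song line of retreat from both the northern and southern flanks. The Song army was isolated and without relief. For more than ten days its men had only dry rations to eat and sea water to drink; those who drank the sea water vomited without stop, and fighting strength was badly weakened.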

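At that time Zhang Shijie had a nephew in the Yuan army, and Zhang Hongfan sent him to the Song camp three times to urge surrender. Zhang Shijie said: "I know that surrender would bring me life and riches, but my duty cannot be moved!" Zhang Hongfan then told the imprisoned Wen Tianxiang to write a letter calling on Zhang Shijie to surrender. Wen Tianxiang said: "I could not defend my own parents; how can I teach another man to betray his?" and instead wrote out the poem that has been recited ever since, "Crossing the Lingding Sea". Zhang Hongfan read it and laughed it off. Out of options, he sent word to the soldiers and people on Yashan: "Your Chancellor Chen (Yizhong) has gone (to Champa), your Chancellor Wen (Tianxiang) has been captured. What more do you hope for?" Still no one defected.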

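On the morning of the sixth day of the second month the Yuan army launched its general assault. The Yuan general Li Heng used the morning ebb, when the water flowed south, to take his fleet across shallows that warships normally could not pass, and struck the Song from the north by surprise. By noon the Song forces on the north side had been broken. The Yuan forces in the south, under Zhang Hongfan, then used the midday flood tide, when the water flowed north, to launch a second attack. Caught between north and south, with soldiers exhausted in body and spirit and unable to fight, the Song line collapsed. The fighting ran from dawn to dusk. With the Song formation in chaos, Zhang Shijie ordered the cables cut and broke out with some ten warships escorting Empress Dowager Yang. Having fought his flagship to the outer edge, he saw that Zhao Bing's imperial ship was too large to break out, and sent a small boat to fetch him. It was already late, wind and rain were lashing the sea, and no one could make out a face. Lu Xiufu, fearing the boat was a Yuan ruse, flatly refused to hand Zhao Bing over. Zhang Shijie had no choice but to fight his way out of Yamen with his warships and the Empress Dowager.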

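With defeat certain and no hope of escape, Lu Xiufu drove his own wife and children into the sea and then said to Zhao Bing: "Matters have come to this. Your Majesty should die for the country. Emperor Deyou has already suffered deep humiliation; Your Majesty must not be humiliated again!" Zhao Bing, in his dragon robe with the imperial seal hung on his chest, leapt into the sea with Lu Xiufu. A few days later Lu Xiufu's body floated to the surface and was buried by local villagers at the risk of their lives. While clearing the battlefield, Yuan soldiers found the body of a small child in yellow clothing carrying a gold seal inscribed with the four characters "诏书之宝" ("Treasure of Imperial Edicts"). It was sent to Zhang Hongfan and confirmed to be the seal Zhao Bing had carried. When Zhang Hongfan sent men back to look for the body, it could no longer be found.

Zhang Shijie got Empress Dowager Yang through the encirclement. On hearing of Emperor Bing's death she clutched her breast and wept: "I braved death and travelled ten thousand li to come here only to preserve the bloodline of the house of Zhao. Now there is no hope!" She then threw herself into the sea. The Yuan kept sending heavy forces after Zhang Shijie, and the weaker Song force fought as it retreated. Zhang Shijie planned to regroup in Champa and then attempt a restoration. On the fourth day of the fifth month his ships reached Pingzhang Harbour in Nan'enzhou (now Hailing Island, Yangjiang) and were caught in a typhoon. His men urged him to go ashore. He refused, burned incense and prayed to Heaven: "I have given my all for the house of Zhao. One sovereign died and I set up another, and now he too is dead. I have stayed alive in order to preserve the ancestral sacrifices of Zhao. If Heaven will not let me restore the house of Zhao, let the storm overturn my ship!" The wind and waves rose higher, the ship capsized and he was drowned. The surviving soldiers cremated and buried him, and his tomb still stands on Hailing Island. With that, the last remnant of Southern Song resistance was wiped out.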

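The Battle of Yashan was the final battle in the destruction of the Southern Song. Tactically, the poor dispositions of Zhang Shijie, Lu Xiufu and others bear undeniable responsibility for the defeat, but the national integrity they showed in a hopeless position, and their courage in attempting what they knew could not be done, command admiration. The Yashan Shrine, on the south side of Yashan, is an ancient building. It houses statues of Lu Xiufu and Zhang Shijie in memory of their loyalty. These two men, one a civil official and one a soldier, were the pillars of the wandering court. Wen Tianxiang, Lu Xiufu and Zhang Shijie are known to later generations as the "Three Heroes of the Late Song".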

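The naval battle of Yashan was a slaughter without precedent, and the horror lay above all in what happened after the outcome was settled. With all lost, the Song troops scuttled several hundred of their own warships to keep them out of enemy hands. Then more than a hundred thousand Southern Song soldiers and civilians, officials, soldiers, women and commoners, unwilling to be enslaved by the brutal Mongol regime, threw themselves into the sea one after another... The History of Song, compiled under the Yuan, records it plainly: seven days later, the corpses floating on the sea numbered in the hundred thousands... A hundred thousand suicides left only a single line in the vast river of Chinese history, yet the weight behind that line still leaves later generations sighing. At the turning point of the nation's fate, everyone from the emperor to ministers, soldiers and ordinary people declared themselves by their actions.

Beside a small village called Yan'an near Yashan there is a tomb with no inscription. Unusually, it is entirely ringed with oyster shells. Tradition says this is the tomb of Empress Dowager Yang. After her death, Zhang Shijie buried her here in haste. Under Yuan oppression the people dared not raise a stele for her and could only build her this unusual tomb of oyster shells. In memory of the steadfast Empress Dowager, people from the villages all around come to pay their respects every year on her birthday, the second day of the fourth month, and over time this has become a custom.

There were also loyal men who, at enormous risk, hid the descendants of the Zhao imperial clan who had escaped. They lived under assumed names until the Yuan fell, then took back their own surname, founded a Zhao imperial clan village near Yamen, and enshrined the spirit tablets of the eighteen Song emperors permanently in the village ancestral hall.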

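The Southern Song fell, but it fell with tragic heroism and unbending integrity. Facing foreign invasion and oppression, its warriors resisted to the death and gave their lives without hesitation for the survival, dignity and self-defence of their people. This is the patriotic "Yashan spirit", which is the spirit of the Chinese nation. The Yashan spirit and its sense of righteousness have inspired later generations. "Yashan is full of loyal souls, shining one after another through the ages." The loyal ministers and righteous men represented by the "Three Loyal Ones", Wen Tianxiang, Lu Xiufu and Zhang Shijie, have been honoured by every later dynasty. However the world changes and the eras turn, history and the people will remember them forever. As the closing couplet of Cai Dongfan's Popular Romance of the History of Song (宋史通俗演义) puts it: "The vicissitudes of an age cannot be washed away; fortunately three martyrs remain, their fame still fragrant." Because their spirit lives on, Yashan is not only the place of regret where the Southern Song finally perished but has always been a place where people voice their patriotism. Statesmen, men of letters and ordinary people of every era have come to the cliffs to mourn, sigh, reflect and remember.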

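The old Chinese spirit was always this tragic and heroic. Even the comparatively weak Southern Song had a hundred thousand soldiers and civilians who chose of their own accord to die in the sea for their country. When will we have such integrity again? The revival of Chinese civilisation needs the effort of people today. The Battle of Yashan should be written into textbooks, so that later generations know that China ought to have that kind of integrity.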

Thursday, April 30, 2009

Optical Interconnect

I. INTRODUCTION

Future high-speed computing is marching toward the tera-FLOPS scale. To maintain one byte of bandwidth per FLOP, Tb/s-class I/O bandwidth is required to support that demand. By 2017 a high-speed CPU needs to deliver 256 cores, which enable 10 TFlops and require 20 TFlops bandwidth to support a flat programming model. To support such a huge amount of bandwidth, more than 16,000 coupled electrical wires are required for core-to-core communication. This makes routing complicated both on-chip and off-chip. Interconnect innovations such as more metal routing layers or 3D stacking can ease metal congestion, but they increase the routing length and eventually run into the aspect-ratio limit of the electrical channel. Moore's law scaling suggests that, as process technology continues to scale, the overall interconnect RC time constant remains constant. When more bits are sent through the electrical channel in a given period, this time constant limits the overall performance. Electrical channel loss at high frequency (GHz range) due to the skin effect is critical in any high-speed I/O design. Signal dispersion, which causes inter-symbol interference (ISI), is another high-speed I/O design issue that needs to be handled carefully. When more wires are packed together, crosstalk becomes unavoidable and further degrades signal quality. Signal termination mismatch is another common issue in electrical channel signal integrity engineering. A termination mismatch reflects the signal from the receiver end back to the driver end, and the signal keeps bouncing back and forth until its energy is dissipated in the channel. All the channel limitations discussed above require additional circuitry such as equalizers, termination compensation and pre-emphasis, which increases die size and dissipates more power. To overcome these limitations, a new interconnect medium needs to be considered to replace the conventional electrical interconnect.

Optical interconnect is considered a solution to the channel bandwidth limit, because its physical characteristics differ from those of electrical interconnect. Optical interconnect has negligible signal latency for short-distance communication, very high channel bandwidth with light as the carrier, and none of the electromagnetic wave phenomena of wires (impedance matching, crosstalk and inductance effects). With these characteristics, optical interconnect can be used to eliminate the shortcomings of electrical interconnect. The optical transceiver is expected to be much smaller and simpler than an electrical link transceiver, as some complicated circuits can be removed. With wavelength-division multiplexing (WDM), multiple wavelengths can be transmitted over a single optical waveguide, which improves the bandwidth-to-latency ratio. Optical interconnect can therefore be used for core-to-core connections or for the global clock distribution network.
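As a rough illustration of the WDM point, the sketch below multiplies the number of wavelengths by the per-wavelength data rate to get the aggregate rate of one waveguide, then counts the waveguides needed for a target bandwidth. The function names and the example numbers (8 wavelengths at 10 Gb/s) are assumptions for illustration, not figures taken from the references.

```python
import math


def wdm_aggregate_gbps(num_wavelengths: int, rate_per_wavelength_gbps: float) -> float:
    """Aggregate data rate of one waveguide carrying several wavelengths."""
    return num_wavelengths * rate_per_wavelength_gbps


def waveguides_needed(target_gbps: float, num_wavelengths: int,
                      rate_per_wavelength_gbps: float) -> int:
    """Number of waveguides required to reach a target aggregate bandwidth."""
    per_guide = wdm_aggregate_gbps(num_wavelengths, rate_per_wavelength_gbps)
    return math.ceil(target_gbps / per_guide)


if __name__ == "__main__":
    # Hypothetical example: 1 Tb/s target, 10 Gb/s per wavelength.
    print(waveguides_needed(1000.0, 1, 10.0))   # single wavelength per guide
    print(waveguides_needed(1000.0, 8, 10.0))   # 8-wavelength WDM per guide
```

The only point of the comparison is that the waveguide count falls in proportion to the number of wavelengths each waveguide carries.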

II. POTENTIAL BENEFIT OF OPTICAL

A) Scaling of Interconnects

Interconnects must be scaled in all three dimensions to accommodate bandwidth growth as the IC world moves toward system-on-chip (SOC) design. The total RC of an electrical interconnect of length $l$ is $RC_{total} = R_l C_l l^2$, where $R_l$ is the resistance per unit length and $C_l$ is the capacitance per unit length. When the interconnect is scaled by a factor $s$, its cross-sectional area scales by $s^2$ and its length by $s$, so $RC_{total} = (R_l/s^2)\,C_l\,(s l)^2 = R_l C_l l^2$, which is the same as before scaling.
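The scaling argument can be checked numerically. The sketch below evaluates the total RC before and after scaling by a factor s, using the same assumptions as the text (resistance per unit length grows as 1/s², capacitance per unit length is unchanged, length shrinks by s). The function names and the numerical values are illustrative assumptions only.

```python
import math


def rc_total(r_per_len: float, c_per_len: float, length: float) -> float:
    """Total RC of a wire: R_l * C_l * l^2."""
    return r_per_len * c_per_len * length ** 2


def rc_total_scaled(r_per_len: float, c_per_len: float, length: float, s: float) -> float:
    """Total RC after ideal scaling by s: R_l -> R_l / s^2, l -> s * l."""
    return rc_total(r_per_len / s ** 2, c_per_len, s * length)


if __name__ == "__main__":
    # Hypothetical wire: 2e5 ohm/m, 2e-10 F/m, 1 mm long, scaled by 0.7.
    before = rc_total(2.0e5, 2.0e-10, 1.0e-3)
    after = rc_total_scaled(2.0e5, 2.0e-10, 1.0e-3, 0.7)
    print(before, after, math.isclose(before, after, rel_tol=1e-12))
```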

The above is the ideal case for electrical interconnect latency. In practice, interconnect resistance increases with scaling, because the barrier layer has a much higher resistance than the interconnect metal. Scaling the interconnect dimensions while the barrier thickness stays constant increases the effective resistivity, which degrades the interconnect delay. In addition, as the interconnect is scaled, surface scattering becomes significant and further increases the resistivity. When transistors are scaled they switch faster, while the RC delay at best stays the same. The result is a decrease in overall system performance.

Therefore, to preserve overall system performance, high-speed signals are routed on the global interconnect, which is wider. As a result, more metal layers must be developed to meet interconnect bandwidth demand while maintaining system performance. To analyze the advantage of optical interconnect over Cu interconnect, interconnect functions are divided into signaling and clock distribution.
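The barrier-layer effect can be sketched with a simple geometric model. The model below is an assumption made for illustration: a rectangular trench of given width and height, a barrier of constant thickness on both sidewalls and the bottom, and a barrier that carries negligible current. Surface scattering is not modelled, and the dimensions are hypothetical.

```python
def effective_resistivity(rho_cu: float, width: float, height: float,
                          barrier: float) -> float:
    """Effective resistivity of a Cu line whose trench is partly filled by barrier.

    Assumes the barrier lines both sidewalls and the bottom of the trench and
    conducts no current, so only the remaining Cu cross-section carries it.
    """
    cu_width = width - 2.0 * barrier
    cu_height = height - barrier
    if cu_width <= 0.0 or cu_height <= 0.0:
        raise ValueError("barrier fills the whole trench")
    return rho_cu * (width * height) / (cu_width * cu_height)


if __name__ == "__main__":
    rho = 1.7e-8  # bulk Cu resistivity in ohm*m, approximate
    # Same 10 nm barrier on a 100x200 nm line and on a 50x100 nm line.
    print(effective_resistivity(rho, 100e-9, 200e-9, 10e-9))
    print(effective_resistivity(rho, 50e-9, 100e-9, 10e-9))
```

With the barrier thickness held fixed, the narrower line loses a larger fraction of its cross-section, so its effective resistivity is higher.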

B) Signaling

Microprocessor die size remains constant (1 cm²) regardless of the process node. It is true that optical interconnect has negligible signal latency compared with Cu interconnect. However, latency is added by the switching of the optical receiver (TIA) and by the photodetector responsivity factor. The TIA can be made faster with each process node, but the photodetector is by nature independent of the transistor scaling factor, so it is valid to assume its responsivity factor remains constant across process nodes. Because of the latency added by the transceiver and photodetector, optical interconnect shows an advantage over its competitors only in certain scenarios. Figure 1(a) shows that optical has a clear advantage over scaled Cu interconnect beyond a critical length of 200 µm. However, Figure 1(b) shows that optical interconnect has no advantage over non-scaled Cu interconnect below the 45 nm process node, since the length of a microprocessor die is within 1 cm.

Figure 2 shows the bandwidth-to-latency ratio of the various interconnect types. Due to the complexity of the nitride waveguide, optical interconnect shows the worst bandwidth-to-latency ratio compared with scaled and non-scaled Cu interconnect. However, with wavelength-division multiplexing (WDM), which injects multiple wavelengths into the same optical waveguide, optical shows the best bandwidth-to-latency ratio of all.












Figure 1. Critical length for optical interconnects: (a) optical interconnects compared against scaled Cu interconnects, (b) optical interconnects compared against non-scaled Cu interconnects [2]











Figure 2. Bandwidth/latency ratio as a function of technology node for optical, optical with WDM, scaled and non-scaled Cu interconnects [2]

C) Clock Distribution

For clock distribution, clock skew and jitter are the main issues to consider. Optical interconnect offers good signal integrity because it is immune to inductive and capacitive crosstalk, which Cu interconnect cannot offer. Figure 3 shows that optical interconnect achieves a good reduction in clock skew and jitter over scaled Cu. However, it shows little improvement in skew and jitter over non-scaled Cu interconnect. There is no strong motivation to push optical interconnect into clock distribution, since it may introduce risk and non-scaled Cu interconnect already provides a way to handle the signal integrity issue.











Figure 3. Comparison of the global skew and jitter as a function of technology node for clock distribution using optical, scaled, and non-scaled Cu interconnects [2]

D) Design Simplification

Optical interconnect is independent of electromagnetic wave phenomena. This eliminates the common signal integrity issues seen in electrical interconnect, such as impedance matching, crosstalk and inductance. Matching the transceiver to the channel characteristic impedance is a difficult engineering problem because on-die impedance is process-voltage-temperature (PVT) dependent. Without good termination, the signal bounces between the driver and receiver ends and causes ringing. Compensating for the PVT variation of the termination requires additional circuitry, at a cost in die area and power dissipation. Optical interconnect simplifies high-speed I/O design since it is immune to impedance mismatch. For core-to-core interconnect, long high-speed routes are unavoidable, and inductive and skin effects degrade the signal. The usual solution is to place repeaters along the signal path. With optical interconnect the repeaters can be removed, further improving die size and power dissipation. Electrical channels also need circuits to counter bandwidth limits such as ISI, which reduces the eye-diagram margin; to recover that margin, a de-emphasis driver and a delay string circuit are needed, adding complexity to current high-speed I/O design. These circuits can be removed with optical interconnect.
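To make the termination-mismatch problem concrete, the sketch below uses the standard transmission-line reflection coefficient, Γ = (Z_L − Z_0)/(Z_L + Z_0), and tracks how a reflected wave shrinks as it bounces between receiver and driver. The 50 Ω line and the ±25% termination errors are assumed values standing in for PVT variation; line loss is ignored.

```python
def reflection_coefficient(z_load: float, z0: float) -> float:
    """Voltage reflection coefficient at a termination: (Z_L - Z_0) / (Z_L + Z_0)."""
    return (z_load - z0) / (z_load + z0)


def amplitude_after_round_trips(gamma_rx: float, gamma_tx: float, trips: int) -> float:
    """Relative amplitude left after a number of receiver/driver round trips.

    Each round trip multiplies the wave by gamma_rx * gamma_tx (lossless line).
    """
    return abs(gamma_rx * gamma_tx) ** trips


if __name__ == "__main__":
    z0 = 50.0  # assumed characteristic impedance in ohms
    gamma_rx = reflection_coefficient(62.5, z0)  # termination 25% high
    gamma_tx = reflection_coefficient(37.5, z0)  # driver 25% low
    for n in range(1, 4):
        print(n, amplitude_after_round_trips(gamma_rx, gamma_tx, n))
```

A perfectly matched end gives Γ = 0 and no ringing, which is the condition the compensation circuitry tries to hold across PVT.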

III. OPTICAL INTERCONNECT SYSTEM

Figure 4 shows the general on-chip optical system for signaling. A complete on-chip optical system requires a laser device, optical modulator, transmitter circuit, optical waveguide, photodetector and TIA. INTEL has developed an optical transceiver that runs at a maximum data rate of 18 Gb/s. The prototype is a hybrid implementation of optical interconnect: the waveguides, GaAs VCSELs and detectors are implemented off-chip. The VCSELs and detectors are flip-chip bonded to the package substrate, and the waveguides are embedded in the package substrate.











Figure 4. On-chip optical system for signaling [2]

Figure 5 shows the cross-section and layout view of the MSM Ge detector. The Ge is grown on an SiO2 interlayer dielectric, which is compatible with the CMOS process. The device is 3 µm long and 1 µm wide. The measured responsivity is about 0.9 A/W at 1 V bias, with a bandwidth of 35 GHz.

Figure 6 shows the cross-section of the ring resonator, which runs at 10 GHz modulation with a 2.7 Vp-p drive. The ring resonator can be inserted into the back end of the CMOS process. Nitride is grown on silicon as the optical waveguide. A fully monolithic optical system can thus be realized in a CMOS process, giving an optical I/O link with high bandwidth, high bandwidth density and good energy efficiency.











Figure 5. CMOS logic compatible waveguide coupled MSM Ge photodetector [3]











Figure 6. Ring-resonator electro-optic polymer modulator device structure and performance [3]

IV. OPTICAL TRANSCEIVER

Figure 7 shows the optical transceiver topology published by INTEL at ISSCC 2009. The transmitter topology is similar to a conventional high-speed I/O design and contains a PLL that generates frequencies up to 9 GHz. A pre-emphasis driver and equalizer are still included because the hybrid implementation requires a microstrip trace from the output pad to the VCSEL.

For the receiver, the input capacitance introduced by the photodetector is critical to the overall system bandwidth. The current GaAs photodetector has a capacitance of 250 fF, which creates the dominant pole at the receiver input. The TIA design is therefore critical: it must reduce the input resistance seen by the photodetector. The prototype shown here uses a cross-coupled differential pair as the input stage to raise the loop gain and hence reduce the input resistance. However, the TIA requires a further voltage-gain amplifier, which takes more die area. In [3], a double-oversampling topology is demonstrated for the receiver input stage that can eliminate the voltage-gain amplifier.
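The link between input resistance and receiver bandwidth can be sketched with a single-pole RC model, f = 1/(2πRC). Treating the receiver input as one pole is a simplifying assumption, and the 50 Ω input resistance is a hypothetical value; only the 250 fF capacitance comes from the text.

```python
import math


def input_pole_hz(r_in_ohm: float, c_in_farad: float) -> float:
    """Pole frequency of a single RC node: 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_in_ohm * c_in_farad)


def max_input_resistance_ohm(target_pole_hz: float, c_in_farad: float) -> float:
    """Largest input resistance that keeps the RC pole at or above a target."""
    return 1.0 / (2.0 * math.pi * target_pole_hz * c_in_farad)


if __name__ == "__main__":
    c_pd = 250e-15  # photodetector capacitance from the text
    print(input_pole_hz(50.0, c_pd) / 1e9, "GHz at 50 ohm (assumed)")
    print(max_input_resistance_ohm(20e9, c_pd), "ohm for a 20 GHz pole (assumed target)")
```

Halving the photodetector capacitance would double the pole frequency for the same input resistance, which is why the capacitance matters as much as the TIA design.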











Figure 7. Optical transceiver cell [3]

V. CONCLUSION

The physical characteristics of the optical channel were explained in detail, showing strong advantages over electrical interconnect in signal integrity. Optical interconnect was then compared with Cu interconnect across different process nodes. Optical interconnect shows a greater improvement in bandwidth-to-latency ratio for global signal routes than for local routes. For clock distribution, however, it offers little advantage unless photodetector responsivity can be further improved. By eliminating electromagnetic wave phenomena, optical interconnect has great potential to simplify I/O transceiver design. This is critical for implementing thousands of optical I/Os in an on-die array, which is possible only if the transceiver is small enough. At ISSCC 2009, INTEL published a complete optical transceiver solution that can be leveraged further to close the gap on the I/O bandwidth needed to support multi-core demand. There is still room to improve the photodetector input capacitance, which is critical to relieving the bandwidth limit imposed by the receiver itself.

References

[1] David A. B. Miller, "Rationale and Challenges for Optical Interconnects to Electronic Chips", Proc. of the IEEE, vol. 88, no. 6, June 2000
[2] Mauro J. Kobrinsky, Bruce A. Block, Jun-Fei Zheng, Brandon C. Barnet, Edris Mohammed, Miriam Reshotko, Frank Robertson, Scott List, Ian Young, Kenneth Cadien, "On-chip Optical Interconnects", Intel Technology Journal, vol. 8, issue 2, May 10, 2004
[3] Ian Young, Edris Mohammed, Jason Liao, Alexandra Kern, Samuel Palermo, Bruce Block, Miriam Reshotko, Peter Chang, "Optical I/O Technology for Tera-Scale Computing", IEEE International Solid-State Circuits Conference, 2009
[4] Alexandra Kern, Anantha Chandrakasan, Ian Young, "18Gb/s Optical IO: VCSEL Driver and TIA in 90nm CMOS", Symposium on VLSI Circuits Digest of Technical Papers, 2007
[5] David A. B. Miller, "Optical Interconnects to Silicon", IEEE Journal on Selected Topics in Quantum Electronics, vol. 6, no. 6, Nov 2000

Saturday, April 25, 2009

CMOS Technology

Following Moore's law, the CMOS channel length keeps shrinking to improve speed and fit more features on a single die. The transistor's fT is inversely proportional to channel length, and this has pushed devices into the GHz range. The continued push for transistor speed hit the power wall in 2003, which ultimately ended INTEL's processor speed race at a maximum of about 4 GHz in the market. INTEL then made a 180° turn to multi-core processor architecture and has pulled more features into the processor for power saving, including the North Bridge (memory controller hub). Power can be cut roughly in half with two or more cores. Many-core architecture will be the future trend for processors, which again underlines the importance of a smaller channel length: shrinking the transistor is what makes it possible.
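One common way to reason about the power saving from multiple cores is the dynamic power relation P = a·C·V²·f. Both this relation and the numbers below are assumptions brought in for illustration; the post does not say how the halving is obtained. The sketch compares one core at full frequency with two cores at half frequency and a supply scaled to 0.7×.

```python
def dynamic_power(cap_farad: float, vdd: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """Dynamic switching power: activity * C * Vdd^2 * f."""
    return activity * cap_farad * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical core: 1 nF switched capacitance, 1.2 V, 4 GHz.
    single = dynamic_power(1e-9, 1.2, 4e9)
    # Two cores, each at half the frequency, supply assumed to scale to 0.7x.
    dual = 2 * dynamic_power(1e-9, 1.2 * 0.7, 2e9)
    print(single, dual, dual / single)
```

Under these assumptions the same aggregate clock throughput costs about half the dynamic power, because the supply reduction enters as a square.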

However, channel-length scaling must be accompanied by scaling of the gate oxide thickness and the gate voltage to keep the electric field within allowable limits. In addition, the transistor supply rail is gradually lowered to save power. Because Ids is proportional to Cox, gate oxide scaling is critical to increasing device drive strength from one generation to the next: to increase Cox, the gate oxide thickness, Tox, needs to be decreased. With Tox of about 2 nm (roughly 20 atoms thick) in 65 nm process technology, a large gate leakage current flows because of gate tunneling. Ioff is estimated at ~200 nA/μm at a minimum channel length, Lmin, of 15 nm. This contributes a large static power in a billion-transistor System-on-Chip (SOC) product.

To solve this problem, a high-K material is introduced to raise the dielectric constant of the gate oxide. The relation $C_{ox} = \varepsilon_r \varepsilon_0 A / T_{ox}$ shows how a higher relative dielectric constant, $\varepsilon_r$, increases Cox while leaving more room for gate oxide thickness scaling. Currently, INTEL uses hafnium oxide in place of conventional silicon oxide, which raises $\varepsilon_r$ from 3.9 (silicon oxide) to 25 (hafnium oxide).
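The Cox relation can be turned into a small calculation. The sketch below computes capacitance per unit area for the two dielectric constants quoted in the text and the equivalent oxide thickness (EOT) of a high-K film, meaning the SiO2 thickness that would give the same capacitance. The 3 nm film thickness is a hypothetical value chosen for illustration.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m
ER_SIO2 = 3.9            # relative dielectric constant of silicon oxide
ER_HFO2 = 25.0           # value quoted in the text for hafnium oxide


def cox_per_area(er: float, tox_m: float) -> float:
    """Gate capacitance per unit area: er * eps0 / Tox (F/m^2)."""
    return er * EPS0 / tox_m


def equivalent_oxide_thickness(t_highk_m: float, er_highk: float) -> float:
    """SiO2 thickness that gives the same capacitance as the high-K film."""
    return t_highk_m * ER_SIO2 / er_highk


if __name__ == "__main__":
    t_film = 3e-9  # hypothetical 3 nm hafnium oxide film
    print(equivalent_oxide_thickness(t_film, ER_HFO2))
    print(cox_per_area(ER_HFO2, t_film) / cox_per_area(ER_SIO2, t_film))
```

A physically thicker high-K film can therefore match the capacitance of a much thinner SiO2 layer, which is what reduces tunneling leakage.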

The drawbacks of pairing a high-K dielectric with a conventional poly-Si gate are a higher threshold voltage and poorer mobility. The higher threshold voltage leaves less voltage headroom for circuit design and slows switching, which is unacceptable for high-speed logic and analog design. It is caused by Fermi-level pinning at the high-K/poly-Si interface. The poorer mobility arises because the phonon dipoles of the high-K material resonate with the plasma oscillation of the poly-Si gate, couple strongly into the silicon channel and degrade the electron mobility. A metal gate such as TiN, chosen for its work function and used with hafnium oxide, improves the channel electron mobility: the plasma oscillation of the metal gate opposes the dipole field, cancelling or weakening it in the silicon channel and recovering the mobility.

The high-K + metal gate device achieves a 50% Isat/Ioff improvement on PMOS and a 12% improvement on NMOS compared with the 65 nm process. For gate leakage, PMOS is 1000X better than 65 nm PMOS. For threshold-voltage roll-off, 45 nm high-K + metal gate devices scatter around 0.15-0.35 V, while 65 nm SiO2 + poly-Si devices scatter around 0.35-0.55 V. With a lower threshold voltage, the device switches faster and has stronger drive. It is generally believed that high-K metal gate CMOS will remain the trend down to the 22 nm node, until the large parasitics of tri-gate devices can be resolved.

Tuesday, April 14, 2009

diode band diagram

Saturday, April 11, 2009

american woman know how to use mouth... you don't know how to use mouth!!

Wednesday, March 18, 2009

when you say nothing at all.......