# SIKS of Unified Computer Science TR Index

## Entries matching at least 3 keywords found: 17

### Matching 4 keywords:

U of California Berkeley, CS csd-86-291(dir)
Hamachi, Gordon Taro. An Obstacle-Avoiding Router for Custom VLSI.'' UCB//CSD-86-291, 130 pages.
ABSTRACT
Magic's automatic routing system combines the flexibility of hand-routing with the speed and quality of automatic channel routers, by allowing chip designers to prewire selected nets by hand. The router then works around this previously-placed layout, called obstacles, to automatically wire the remaining nets. This ability to partially hand-route an integrated circuit gives designers complete control over critical paths, power and ground routing, and other special netw. At the same time the router provides a fast way to make the remaining connections in the design.

The system's novel features include a fast channel decomposer, an obstacle-avoiding global router, and an obstacle-avoiding switching router. The router's channel decomposition algorithm relies on a corner-stitched data structure to efficiently produce a small number of large channels. The global router considers obstacles during path generation, trading-off net length and channel complexity to simplify the subsequent channel routing task. While able to cope with obstacles, Magic's switchbox router is still comparable to the best traditional (non-obstacle-avoiding) channel routers.

The router's obstacle-avoidance features rely on two underlying concepts: (1) a preferred direction for crossing an obstacle, and (2) hazards, or areas the routing should avoid. Crossing obstacles in the preferred direction minimizes the creation of blocked areas, which can not be crossed by other routing; this minimizes obstacles' impact on the automatic routing. Crossings in preferred directions are controlled by strategically-placed hazards adjacent to obstacles.

Measurements show that obstacle-avoiding routing is both useful and practical: hand-routing improves the electrical characteristics of the selected nets, while the hand-routing obstacles have only minor effects upon the routing quality of a design as a whole. The improvement in electrical characteristics is due to the decreased net length and increased attention to layer selection possible when nets are prewired by hand.

ENTRY
June 25, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-130}.tif)

U of California Berkeley, ERL erl-90-27(dir)
Jackson, Michael A.B.; Srinivasan, Arvind; Kuh, Ernest S. A Novel Approach to IC Performance Optimization by Clock Routing.'' UCB//ERL-90-27, April, 1990. 34 pages.
ABSTRACT
In today's highly competitive IC marketplace, company survival necessitates product differentiability. Product differentiability may be engendered in many ways, several of which include: increased performance (e.g. lower power, faster timing, etc...), lower cost, more features, or faster time to market. Thus, the motivation for design techniques that enhance chip timing performance are of fundamental importance to the IC community.

The clock is the essence of a synchronous digital system. Physically, the clock is distributed from an external pad to all similarly clocked synchronizing elements through a distribution network that encompasses combinational logic and interconnects. It serves to unify the physical and temporal design abstractions by determining the precise instants in time at which the digital machine changes state. Because the clock is important, optimization of the clock signal can have a significant impact on the chip's cycle time, especially in high-performance designs. Non-optimal clock behavior is caused by two phenomena: the routing to the chip's synchronizing elements, and the asymmetric behavior of the clock distribution logic.

Previous work in clock optimization has been contributed by several authors. H-trees have been recognized for years as a technique to help reduce the skew in synchronous systems [FK82] [KGE82] [DFW84] [BWM86]. For regular structures such as systolic arrays the H-tree works well to reduce skew because the synchronizing elements are distributed in a uniform pattern throughout the chip. However, for general design styles, nonuniform distributions of clock pins are common and the H-tree becomes ineffective as a technique for clock routing. The large size of the clock net has led some researchers [DFW84] [Mij87] to perform buffer optimization within the clock distribution tree. [BWM86] have provided an analysis of the clock tree that considers the transmission line properties of the interconnects. [BBB +89] have presented an approach for ASIC clock distribution that integrates buffer optimization into place and route algorithms. However, in all previous work the routing of the clock net is performed using ordinary global routing tools based on minimum spanning or approximate minimum Steiner tree net models and with detailed routers that have little understanding of clock routing problems. This causes non-optimal clock behavior and as region size or the number of pins in the clock net increases, the detrimental behavior is exacerbated. In this paper, we focus on routing techniques for optimizing clock signals in VLSI circuits. We demonstrate the superiority of our algorithm over standard routing techniques for widely varying region size, clock pin distributions, numbers of clock pins and technology feature size.

In section two the preliminaries necessary for understanding the paper are presented. Following this, in section three the problem is defined. Section four illustrates the algorithm for clock routing and section five discusses theoretical results. Next, in section six the experimental results are presented, and in section seven possible avenues for future work and conclusions regarding the approach are discussed.

ENTRY
Sep 14, 1994
RETRIEVAL
ocr (in all.ocr); tiff (in {001-034}.tif)

U of California Berkeley, CS csd-83-137(dir)
Fujimoto, Richard M. VLSI Communication Components for Multicomputer Networks.'' UCB//CSD-83-137, 234 pages.
ABSTRACT
Advances in microporecessor technology will soon make general purpose computing systems composed of thousands of VLSI processors economically feasible. A high-performance communication system to interconnect these processors is of crucial importance to exploit the parallelism inherent in application such as circuit simulation and signal processing. This thesis discusses issues in the design of universal VLSI communication components to be used as the building blocks for constructing robust, high-bandwidth, point-to-point networks. The component provide enough flexibility to serve a wide variety of multicomputer configurations and applications. They feature special purpose hardware to implement communication functions traditionally implemented with network software.

A communication network constructed from the proposed components is modeled as a set of nodes (components) connected by bidirectional communication links. Because of technological constraints, the total I/O bandwidth of each node is limited to some fixed value, and assumed to be equally divided among the attached links. Increasing the number of links per component leads to a reduction in the average number of hops between nodes, but at the cost of reduced link bandwidth. This "hop count/ link bandwidth" tradeoff is examined in great detail through M/M/1 queueing models and simulations using traffic loads generated by parallel application programs. These results indicate that a small number of links should be used. It is also found that a significant improvement in performance is obtained if a component is allowed to immediately begin forwarding a message when the selected output link becomes idle, regardless of whether or not the end of the message has arrived. Finally, mechanisms which efficiently transmit a single message to multiple destinations are seen to have a significant impact on performance in programs relying on global information.

The complexity of the circuitry required to implement a communication component is examined. Schemes for providing hardware support for communication functions--routing, buffer management, and flow control--are presented. Estimates of the number of buffers and the degree of multiplexing on each communication link are determined. The amount of circuitry to implement a communication component is computed, and it is seen that the proposed communication component could be complemented with technology available today. Design recommendations for the implementation of such a component are made.

ENTRY
July 9, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-234}.tif)

Inst for CS-FORTH 1991.TR25.Fast_packet_switches.ps.Z(254K)
Stefanos Sidiropoylos. Fast Packet Switches for Asynchronous Transfer Mode.'' Technical Report No. 25, Masters thesis, CSD-UCH. ICS-FORTH, Heraklion, Crete, Greece, August 1991.
Fast Packet Switches for Asynchronous Transfer Mode will form the basis for implementing the Broadband Intergated Services Digital Network of the future. This thesis presents the preliminary design of a switch chip that implements a novel flow control algorithm guaranteeing fair allocation and full utilization of bandwidth even under congestion. The switch chip is intented to be connected via its bit parallel links to other identical switch chip and various interface chips, so that it can be used as a building block for constructing composite switches of arbitrary topology and size. Its architecture is scalable up to a dozen multi-Gbit/sec links, although the first prototype is designed for only 4 bidirectional links of 400 Mbits/sec per link in each direction. The organization of the on-chip buffer memory, along with the back-pressure flow control, and the weighted round robin cell scheduling mechanisms that the chip implements in hardware, provide the network manager a set of powerful tools for tuning traffic flows and guaranteeing service performance. Full bandwidth utilization is achieved by providing dedicated buffer space to some classes of Virtual Circuits, and communication latency is reduced by using virtual cut-through''.

The realization of this architecture under a 1\ \(*mm CMOS technology has been studied. We present the circuits that are crucial for the size and speed of the chip. These crucial circuits are primarily: the buffer memory, the input-output buffers, the content addressable memory used to select the next VC to be serviced, the circuit that schedules the accesses to the shared buffer, and the chip routing tables. We present the design and layout of the above circuits and the results of their simulation as a proof of the fact that the proposed architecture is realizable using the modern VLSI technology

1991.TR25.Fast_packet_switches.ps.Z

### Matching 3 keywords:

U of California Berkeley, ERL erl-93-42(dir)
Bhaqt, Narasimha B.; Chaudary, Kamal; Kuh, Ernest S. Performance-Oriented Fully Routable Dynamic Architecture for a Field Programmable Logic Device.'' UCB//ERL-93-42, 42 pages.
ABSTRACT
In this report we present a new architecture for a Field Programmable Logic Device. The architecture is geared towards routing completion and predictable timing performance. The central principle of the new architecture is based on the concept of efficient use of silicon resources. It is performance-oriented, with predictable interconnect and logic delays, and has a guaranteed routability. Latency in the original circuit is exploited in such a manner as to make efficient reuse of interconnect resources. Specifically, the given circuit is topologically levelized and implemented in a folded-pipeline manner. The new architecture is CAD friendly, thereby eliminating the need for complex tiem-consuming place and route tools.
ENTRY
November 30, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-042}.tif)

U of California Berkeley, ERL erl-90-17(dir)
Lee, Brian D.N. Experiments in Hierarchical Routing of General Areas.'' UCB//ERL-90-17, March, 1990. 132 pages.
ABSTRACT
The complexity and size of integrated circuit designs has increased dramatically in recent years. Accordingly, there is a need for design automation tools that can handle ever larger and more complex routing problems. The problems of interest have evolved from simple channels to switchboxes to general areas. Currently, there is great interest in solving general area routing problems because of the rising popularity of new design methodologies and the emergence of new manufacturing technologies. For example, the Sea-of-Gates design style with many layers of high-quality interconnect requires over-the-cell routing to achieve high circuit densities. The structure of such routing problems on the channelless gate array is clearly in the domain of general area routers. Similarly, the next generation PC board technology using silicon-on-silicon modules will also require a general area router. Because these complex routing problems are a manifestation of an overall increase in design complexity, routers must not only handle these general area problems but also interact with other design tools in a more sophisticated manner. In particular, the feedback between routing programs and placement programs must become more explicit and any approach developed for general area problems must take this interaction into consideration.

In this report, hierarchical decomposition is examined as a way to attack the general area routing problem. This divide and conquer method is a useful technique for managing complexity and in recent years has been applied to various routing problems. In particular, there are examples of this concept in channel routers [BP82, BP83a, HM85, HM89], switchbox routers [BP83b], and global routers [M584, LTW86, M586, Lau87]. This work describes a set of experiments that outlines the extension of these applications to general area routing problems.

ENTRY
Aug 31, 1994
RETRIEVAL
ocr (in all.ocr); tiff (in {001-132}.tif)

U of California Berkeley, CS csd-88-412(dir)
Srini, Vason P.; Tam, Jerric V.; Nguyen, Tam M.; Holmer, Bruce K.; Patt, Yale N.; Despain, Alvin M. Design and Implementation of A CMOS Chip for Prolog.'' UCB//CSD-88-412, 132 pages.
ABSTRACT
We have designed and fabricated a high performance VLSI chip for executing Prolog programs using a 1.4 micron CMOS technology with two layers of metal. This chip implements a tagged architecture with hardware support for five stacks.The 32-bit data path of the chip contains a fast ALU, 64 registers in four groups, five counters, and six non-master/slave registers. The control is microprogrammed and uses a 512 X 160 bit ROM with four pages for fast microbranching. The chip operates at a cycle time of 100 ns (worst case) and has a size of 10 mm X 9 mm. A semicustom design methodology employing Mentor and NCR tools has been used in design. The challenges involved in the design, verification, routing, and fabrication of the chip are described.
ENTRY
October 11, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-132}.tif)

U of California Berkeley, CS csd-87-312(dir)
Raghavan, Prabhakar. Randomized Rounding And Discrete Ham-Sandwich Theorems: Provably Good Algorithms For Routing And Packing Problems.'' UCB//CSD-87-312, July 1986. 73 pages.
ABSTRACT
This thesis deals wiht the approximate solution of a class of zer0-one integer programs arising in the design of integrated circuits, in operations research, and in some combinatorial problems. Our approach consists of first relaxing the integer program to a linear program, which can be solved by efficient algorithms. The linear program solution may assign fractional values to some of the variables, and these values are 'rounded' to obtain a provably good approximation to the original integer program.

We first consider the problem of global routing in gate-arrays. This problem has important applications in the design of integrated circuits, and can be formulated as a zero-one integer program. We introduce a technique we call randomized rounding for producing a provably good approximation to this integer program from the solution to its relaxation. In order to prove the quality of this approximation, we make use of some new bounds on the tail of the binomial distribution. We present the results of experiments conducted on industrial gate=arrays using our methods; these are encouraging and call for further work.

We then show that our randomized rounding technique can be applied to some problems in combinatorial optimization and operations research. We also describe the relation between the problems we study and a class of combinatorialresults known as "discrete ham-sandwich theorems". This leads to the problem of rounding linear program solutions deterministically mimic the randomized algorithm in a certain precise sense. This leads us to the development of a deterministic polynomial time rounding algorithm that yields the same performance guarantees as the random ized method.

ENTRY
March 25, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-073}.tif)

U of California Berkeley, CS csd-92-689(dir)
Tzeng, Ping-San. Integrated Placement and Routing for VLSI Layout Synthesis and Optimization.'' UCB//CSD-92-689, 106 pages.
ABSTRACT
This dissertation investigates ways to integrate various VLSI layout algorithms via carefully designed integrated data structures. Such an integrated approach can achieve better overall results by iterating non-sequentially among the various algorithms in a demand-driven manner. The shared data strucure which is modified incrementally by all the different algorithms serves as an efficient communication medium between them This approach has resulted in several new prototype tools, including a new placement program that combines wire-length optimization with a new 2-D compaction algorithm, a new area-routing approach that employs hierarchical rip-up and reroute techniques in an integrated global and detailed routing environment, and also a system that integrates the area router with a placement adjustment algorithm. This integrated system can iterate automatically between area routing and placement adjustment phases to generate optimized results for macro-cell problems with over-the-cell routing.
ENTRY
August 6, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-106}.tif)

U of California Berkeley, CS csd-95-868(dir)
Adams, Glenn David. Non-Sequential Tool Interaction Strategies for Sea-of-Gates Layout Synthesis.'' UCB//CSD-95-868, December 1994. 142 pages.
ABSTRACT
This research focuses on strategies for managing the interaction among tools for the automated VLSI layout synthesis of regular macro-modules, such as bit-slice datapaths. Compact layouts for such macro-modules may be constructed by including the module's external wiring in the leaf cell layout. However, in most automated layout systems, module and leaf cell layouts are performed by separate non-interacting tools. Leaf cell layouts are synthesized in advance, stored in a library, and then customized to fit the placement and wiring of a particular module via stretching and/or wiring personalization. This approach limits the possible customizations for a cell and the possible layouts for a module, particularly when the use of pre-fabricated transistor arrays (as in Sea-of-Gates) precludes the stretching of cells.

To overcome this limitation, I have developed a model for macro-module layout that employs a high degree of tool interaction. The module floor planner decides the relative placement and shape of function blocks that minimizes external wiring while maximizing the amount of over the cell routing. Cell layouts are provided on demand by the SoGOLaR automatic cell generator; their quality and feasibility are, in turn, dependent on the constraints imposed by the external wiring. This model can easily accommodate modules with non-uniform function blocks.

Inter-tool communication is handled by maintaining a database of layouts based on a data structure that contains both the layout state and associated metrics. The database handles all layout requests, and maintains a history of both successful and unsuccessful synthesis attempts, all while imposing few restrictions on tool interaction.

A separate failure handler is employed to remedy situations where the normal tool interaction sequence produces a floor plan that cannot be routed or no longer meets specified constraints. The handler will undo portions of the layout and re-invoke the synthesis tools, altering their control parameters and constraints to guide them toward a more promising solution. Repetition is prevented by restricting constraint alterations and by checking the layout database for previous failed layout attempts. Remedies are controlled by a greedy strategy that is significantly less complex than general backtracking methods.

ENTRY
April 7, 1995
RETRIEVAL
postscript (in all.ps)

Simon Fraser U, SCS A(dir) A-(dir) AI(dir) AIM(dir) B(dir) CITI(dir) CMPT-TR(dir) CMPT94-11.ps.Z(107K) CMU-CMT(dir) CMU-CS(dir) CMU-HCII(dir) CMU-RI(dir) COGSCI(dir) CS(dir) CS-(dir) CS-TR(dir) CSD(dir) CSE-TR(dir) CSRI(dir) DOCIDS(0K) ERL(dir) GIT-CC(dir) GIT-GVU(dir) IR(dir) LIST(21K) LISTING(2K) MCS-P(dir) MEDG(dir) MIT-LCS(dir) PCS-TR(dir) RAMP(dir) SEI(dir) TNS(dir) TR(dir) UIUCDCS-R(dir) sfu.ca(dir) tr(dir)

Simon Fraser U, SCS A(dir) A-(dir) AI(dir) AIM(dir) B(dir) CITI(dir) CMPT-TR(dir) CMPT93-04.ps.Z(121K) CMU-CMT(dir) CMU-CS(dir) CMU-HCII(dir) CMU-RI(dir) COGSCI(dir) CS(dir) CS-(dir) CS-TR(dir) CSD(dir) CSE-TR(dir) CSRI(dir) DOCIDS(0K) ERL(dir) GIT-CC(dir) GIT-GVU(dir) IR(dir) LIST(21K) LISTING(2K) MCS-P(dir) MEDG(dir) MIT-LCS(dir) PCS-TR(dir) RAMP(dir) SEI(dir) TNS(dir) TR(dir) UIUCDCS-R(dir) tr(dir)
TechReport{U-SFraser-CMPT-TR:1993-04, key = "PetSys93", author = "Joseph G. Peters and Michel Syska", title = "Circuit-Switched Broadcasting in Torus Networks", org = "SFU-CMPT", number = "93-04", year = "1993", month = may, abstract = "In this paper we present two broadcasting algorithms for 2-dimensional torus networks (wrap-around meshes) that use synchronous circuit-switched routing. The first algorithm is based on a recursive tiling of a torus and requires $\lceil log_5 (n^2) \rceil$ phases and $2 \lfloor \frac{n}{2} \rfloor$ intermediate switch settings to broadcast in an $n \times n$ torus. Both of these quantities match the lower bounds. The second algorithm combines circuit-switching with the pipelining and arc-disjoint spanning trees techniques that are commonly used to speed up store-and-forward routing. For long messages of length $L$, the total transmission time is $\frac{L \tau}{2}$ where $1 / \tau$ is the bandwidth of the communication links. This is within a factor of 2 of the lower bound.", institution = SFU_CS_School, }

U of California Berkeley, CS csd-83-154(dir)
Ousterhout, John K.; Hamachi, Gordon T.; Mayo, Robert Nelson; Scott, Walter Stewart; Taylor, George S. A Collection of Papers on Magic.'' UCB//CSD-83-154, December 2, 1983. 104 pages.
ABSTRACT
Magic is a "smart" layout system for integrated circuits. It incorporates expertise about design rules and connectivity directly into the layout system in order to implement powerful new operations, including: a continuous design-rule checker that operates in background to maintain an up-to-date picture of violations; an operation called plowing that permits interactive stretching and compaction; and routing tools that can work under and around existing connections in the channels. Magic uses a new data structure called corner stitching to achieve an efficient implementation of these operations.
ENTRY
July 3, 1993
RETRIEVAL
ocr (in all.ocr); tiff (in {001-104}.tif)

U of California Berkeley, ERL erl-93-80(dir)
Bhat, Narasimha B. Novel Techniques for High Performance Field Programmable Logic Devices.'' UCB//ERL-93-80, November 1993. 232 pages.
ABSTRACT
Field programmable logic devices (FPLDs) are fast emerging as viable alternatives to mask programmed parts because of their rapid time-to-market and low costs. Their application has, however, been limited to implementing random logic, with non-critical timing specifications. This work attempts to advance FPLD usage to high-performance applications as well. A two-pronged approach is adopted. In the first part, CAD algorithms aimed at improving routability and performance of designs mapped to existing look-up table (LUT) based field programmable gate arrays (FPGAs) are developed. The concept of a two-input LUT primitive cell is introduced. This reduces the number of library patterns that require to be stored for an LUT library, and makes it feasible to extend performance-driven library based technology mapping techniques to LUT FPGAs. The performance- driven mapping algorithm accounts for interconnect delay and provides area-delay trade offs. Experiments on benchmark designs show the effectiveness of the new algorithms. In the second part, a new FPLD architecture is introduced. The architecture is based on the concept of time-sharing of logic and routing resources in an effort to have a fully routable, CAD friendly FPLD with predictable timing performance and efficient silicon usage. Real-time reconfiguration of logic and routing resources implements a given circuit in a folded pipe-line fashion, pipe lining at the gate level. Several possible variations of the basic architecture are discussed. A simple synthesis scheme is developed and experimental results are reported. Area and timing analyses demonstrates advantages over existing FPGAs.
ENTRY
June 9, 1994
RETRIEVAL
ocr (in all.ocr); tiff (in {001-232}.tif)

U of California Berkeley, ERL erl-93-81(dir)
Lee, Brian Douglas Ngai. COMBINED HIERARCHICAL APPROACHES TO INTEGRATED CIRCUIT LAYOUT BASEd ON A COMMON DATA MODEL.'' UCB//ERL-93-81, November 1993. 110 pages.
ABSTRACT
The integrated circuit layout problem is the specification of fabrication patterns that implement a given circuit description subject to manufacturing and performance constraints. The high complexity of integrated circuits has made it necessary to decompose the layout problem into a set of subproblems, and the typical layout process usually solves a sequence of subproblems corresponding to the steps of floor planning, placement, channel definition, global routing, detailed routing, and compaction. Each subproblem depends on the solutions of the previous subproblems and iteration over the sequence of problems may be required to find a feasible layout solution. Unfortunately, iterative improvement of layout solutions is very difficult because a side effect of the partitioning is that feedback between subproblems is hard and not well understood.

A merged hierarichal approach to layout presented that provides a means of investigating the relationship between placement and routing to improve current layout methods. A common data model is used to integrate layout phases and to simplify and enhance information flow within the layout process. The data model induces a structure to the layout process for experimenting with feedback and information management.

A layout system based on a 2x2 grid-graph abstraction that combines placement and global routing has been implemented based on this paradigm. Examples from sea-of-gates designs are used for benchmark comparisons.

ENTRY
June 10, 1994
RETRIEVAL
ocr (in all.ocr); tiff (in {001-110}.tif)

U of Maryland, College Park CS 3360(dir) 3360.ps.Z(105K)
Cengiz Alaettinoglu, Ibrahim Matta, A. Udaya Shankar. A Scalable Virtual Circuit Routing Scheme for ATM Networks.'' CS-TR-3360, University of Maryland Institute for Advanced Computer Studies

Monash U, Robotics & Digital Tech 93-15.ps.Z(93K)
Title: The Caroline Project: Service and Protocol Description Author: J. W. Breen Technical Report 93-15 Availability: ftp.rdt.monash.edu.au:rdt/techreports/RDT/93-15.ps.Z Date: February 18, 1994 Number of pages: 28 Keywords: packet switch, network, switching fabric, flooding, service, protocol Abstract: The Caroline Network is an experimental packet switch network being developed in the Department of Robotics \& Digital Technology at Monash University to experiment with switching fabrics and packet protocols. Among the features of the Caroline network are: a bus-based switching fabric, with fixed-length packets travelling between switching nodes on Virtual Circuits; a very light-weight, but robust packet protocol; a routing system for packets which uses a series of virtual circuits between switching nodes. A flooding algorithm is used to determine the optimal inter-node path at the time of the establishment of a Virtual Circuit. The Caroline network provides a connection-oriented unreliable packet delivery service, capable of efficiently supporting a wide range of user services, including those of continuous and variable bit-rates. This paper defines the services of the Caroline network and the inital packet protocol. The two sections can stand alone as the respective service and protocol standards" of the Caroline network.