Paper Titles from Three Prestigious
Conferences
(ISCA'96/'97, HPCA'96/'97/'98, COMPCON'96/'97)
-
Processor Microarchitecture and Design (13)
-
HPCA'96: Decoupled Vector Architectures
-
HPCA'97: Advances of the Counterflow Pipeline Microarchitecture
-
HPCA'97: Multithreaded Vector Architectures
-
HPCA'98: Non-Stalling Counter Flow Architecture
-
HPCA'98: The potential for using thread-level data speculation
to facilitate automatic parallelization
-
HPCA'98: Virtual-Physical Registers
-
ISCA'96: Evaluation of design alternatives for a multiprocessor
microprocessor
-
ISCA'96: Evaluation of multithreaded uniprocessors for commercial
application environments
-
ISCA'96: Exploiting choice: instruction fetch and issue on
an implementable simultaneous multithreading processor
-
ISCA'96: Memory bandwidth limitations of future microprocessors
-
ISCA'96: Missing the memory wall: the case for processor/memory
integration
-
ISCA'97: DataScalar architectures
-
ISCA'97: Dynamic instruction reuse
-
Instruction Level Parallelism (7)
-
HPCA'97: The Impact of Instruction-Level Parallelism on Multiprocessor
Performance and Simulation Methodology
-
HPCA'98: Treegion Scheduling for Wide Issue Processors
-
ISCA'96: High-bandwidth address translation for multiple-issue
processors
-
ISCA'96: Increasing cache port efficiency for dynamic superscalar
microprocessors
-
ISCA'97: Complexity-effective superscalar processors
-
ISCA'97: DAISY: dynamic compilation for 100% architectural
compatibility
-
ISCA'97: Improving superscalar instruction dispatch and issue
by exploiting dynamic code sequences
-
Branch Prediction (10)
-
HPCA'97: Architectural Support for Compiler-Synthesized Dynamic
Branch Prediction Strategies: Rationale and Initial Results
-
HPCA'97: Multiple Branch and Block Prediction
-
HPCA'98: Partial sampling with reverse state reconstruction:
A new technique for branch predictor performance estimation
-
ISCA'96: An analysis of dynamic branch prediction schemes
on system workloads
-
ISCA'96: Correlation and aliasing in dynamic branch predictors
-
ISCA'96: Using hybrid branch predictors to improve branch
prediction accuracy in the presence of context switches
-
ISCA'97: A language for describing predictors and its application
to automatic synthesis
-
ISCA'97: Target prediction for indirect jumps
-
ISCA'97: The agree predictor: a mechanism for reducing negative
branch history interference
-
ISCA'97: Trading conflict and capacity aliasing in conditional
branch predictors
-
Instruction Scheduling, Prefetching, and Speculation (11)
-
HPCA'96: Co-Scheduling Hardware and Software Pipelines
-
HPCA'96: Register File Design Considerations in Dynamically
Scheduled Processors
-
HPCA'96: Representative Traces for Processor Models with
Infinite Cache
-
HPCA'97: Control flow speculation in multiscalar processors
-
HPCA'98: Control Speculation in Multithreaded Processors
through Dynamic Loop Detection
-
HPCA'98: Supporting highly-speculative execution via adaptive
branch trees
-
HPCA'98: Temporal-based procedure reordering for improved
instruction cache performance
-
ISCA'96: Instruction prefetching of systems codes with layout
optimized for reduced cache misses
-
ISCA'97: Data prefetching on the HP PA-8000
-
ISCA'97: Dynamic speculation and synchronization of data
dependences
-
ISCA'97: Prefetching using Markov predictors
-
Caches (9)
-
HPCA'96: Distributed Prefetch-buffer/Cache Design for High-Performance
Memory Systems
-
HPCA'96: Predictive Sequential Associative Cache
-
HPCA'98: Speculative Versioning Cache
-
ISCA'96: Don't use the page number, but a pointer to it
-
ISCA'96: The difference-bit cache
-
ISCA'97: Designing high bandwidth on-chip caches
-
ISCA'97: Exploiting instruction level parallelism in processors
by caching scheduled groups
-
ISCA'97: Run-time adaptive cache hierarchy management via
reference analysis
-
ISCA'97: The design and analysis of a cache architecture
for texture mapping
-
Memory Architecture (11)
-
COMPCON'96: Burst and Latency Requirements Drive EDO and
BEDO DRAM Standards
-
COMPCON'96: High Bandwidth RDRAM Technology Reduces System
Cost
-
COMPCON'96: Multi-Gigabyte/sec DRAM with the MicroUnity MediaChannel
Interface
-
COMPCON'96: Synchronous DRAM Evolutionary Changes Bring Cost/Performance
Advantages in Memory Systems
-
HPCA'97: Design Issues and Trade-offs for Write Buffers
-
HPCA'97: Global Address Space, Non-Uniform Bandwidth: A Memory
System Performance Characterization of Parallel Systems
-
HPCA'97: Reducing the Replacement Overhead in Bus-Based COMA
Multiprocessors
-
HPCA'97: Software-Managed Address Translation
-
HPCA'97: Speeding up the memory hierarchy in flat-COMA multiprocessors
-
ISCA'97: Memory-system design considerations for dynamically-scheduled
processors
-
ISCA'97: The energy efficiency of IRAM architectures
-
Input / Output (6)
-
COMPCON'96: Randomized Data Allocation for Real-Time Disk
I/O
-
COMPCON'96: Redundant Arrays of Independent Libraries (RAIL):
A Tertiary Storage System
-
HPCA'98: The Architectural Costs of Streaming I/O: A Comparison
of workstations, clusters, and SMPs
-
ISCA'96: DCD -- disk caching disk: a new approach for boosting
I/O performance
-
ISCA'96: Polling watchdog: combining polling and interrupts
for efficient message handling
-
ISCA'97: Tolerating multiple failures in RAID architectures
with optimal storage and uniform declustering
-
Interconnection Networks (20)
-
COMPCON'96: A 9.6 GigaByte/s Throughput Plesiochronous Routing
Chip
-
COMPCON'96: Overview of Memory Channel Network for PCI
-
HPCA'96: A Topology-Independent Generic Methodology for Deadlock-Free
Wormhole Routing
-
HPCA'96: Fault-Tolerance with Multimodule Routers
-
HPCA'96: Fault-Tolerant Multicast Routing in the Mesh with
No Virtual Channels
-
HPCA'96: On the Multiplexing Degree Required to Embed Permutations
in a Class of Networks with Direct Interconnects
-
HPCA'96: RMB--Reconfigurable Multiple Bus Network
-
HPCA'96: Shuffle-Ring: Overcoming the Increasing Degree of
Hypercube
-
HPCA'97: Distributed path reservation algorithms for multiplexed
all-optical interconnection networks
-
HPCA'97: Multicast on Irregular Switch-based Networks with
Wormhole Routing
-
HPCA'98: A Very Efficient Distributed Deadlock Detection
Mechanism for Wormhole Networks
-
HPCA'98: Architectural implications of a family of irregular
applications
-
HPCA'98: Challenging applications on fast networks
-
HPCA'98: Credit-Flow-Controlled ATM for MP Interconnection:
the ATLAS I Single-Chip ATM Switch
-
HPCA'98: The sensitivity of communication mechanisms to bandwidth
and latency
-
ISCA'96: A router architecture for real-time point-to-point
networks
-
ISCA'96: Rotating combined queueing (RCQ): bandwidth and
latency guarantees in low-cost, high-performance networks
-
ISCA'97: Implementing multidestination worms in switch-based
parallel systems: architectural alternatives and their impact
-
ISCA'97: On deadlocks in interconnection networks
-
ISCA'97: The Mercury Interconnect Architecture: a cost-effective
infrastructure for high-performance servers
-
Network Interfaces (11)
-
COMPCON'96: SSA: A High-Performance Serial Interface for
Unparalleled Connectivity
-
HPCA'96: Protected, User-level DMA for the SHRIMP Network
Interface
-
HPCA'96: Telegraphos: High-Performance Networking for Parallel
Processing on Workstation Clusters
-
HPCA'96: Using Memory-Mapped Network Interfaces to Improve
the Performance of Distributed Shared Memory
-
HPCA'97: A Comparison of ATM and Fast Ethernet Network Interfaces
for User-level Communication
-
HPCA'97: Architectural Support for Reducing Communication
Overhead in Pipelined Networks
-
HPCA'97: User-Level DMA without Operating System Kernel Modification
-
HPCA'98: Address Translation Mechanisms In Network Interfaces
-
HPCA'98: Exploiting Two-Case Delivery for Fast Protected
Messaging
-
HPCA'98: The Impact of Data Transfer and Buffering Alternatives
on Network Interface Design
-
ISCA'96: Coherent network interfaces for fine-grain communication
-
Multiprocessors (29)
-
HPCA'96: A Cache Coherency Protocol for Optically Connected
Parallel Computer Systems
-
HPCA'96: A Shared-bus Control Mechanism and a Cache Coherence
Protocol for a High-Performance On-chip Multiprocessor
-
HPCA'96: Bus-Based COMA--Reducing Traffic in Shared-Bus Multiprocessors
-
HPCA'96: Distance-Adaptive Update Protocols for Scalable
Shared-Memory Multiprocessors
-
HPCA'96: Improving the Data Cache Performance of Multiprocessor
Operating Systems
-
HPCA'96: Multitasking and Multithreading on a Multiprocessor
with Virtual Shared Memory
-
HPCA'96: Parallel Intersecting Compressed Bit Vectors in
a High Speed Query Server for Processing Postal Addresses
-
HPCA'96: The Impact of Shared-Cache Clustering in Small-Scale
Shared-Memory Multiprocessors
-
HPCA'96: Two Adaptive Hybrid Cache Coherency Protocols
-
HPCA'97: An Evaluation of Fine-Grain Producer-Initiated Communication
in Cache-Coherent Multiprocessors
-
HPCA'97: On the Use and Performance of Explicit Communication
Primitives in Cache-coherent Multiprocessor Systems
-
HPCA'97: Reducing Remote Conflict Misses in Shared-Memory
Multiprocessors: NUMA with Remote Cache and COMA
-
HPCA'97: Reducing the Communication Overhead of Dynamic Applications
on Shared Memory Multiprocessors
-
HPCA'97: Software DSM Protocols that Adapt between Single
Writer and Multiple Writer
-
HPCA'97: The memory performance of DSS commercial workloads
in shared-memory multiprocessors
-
HPCA'98: Enhancing Memory Use in Simple Coma: Multiplexed
Simple Coma
-
HPCA'98: Hardware for Speculative Run-Time Parallelization
in Distributed Shared-Memory Multiprocessors
-
HPCA'98: PRISM: An Integrated Architecture for Scalable Shared
Memory
-
ISCA'96: Application and architectural bottlenecks in large
scale distributed shared memory machines
-
ISCA'96: COMA: an opportunity for building fault-tolerant
scalable shared memory multiprocessors
-
ISCA'96: Decoupled hardware support for distributed shared
memory
-
ISCA'96: MGS: a multigrain shared memory system
-
ISCA'96: Understanding application performance on shared
virtual memory systems
-
ISCA'97: Coherence controller architectures for SMP-based
CC-NUMA multiprocessors
-
ISCA'97: Efficient synchronization: let them eat QOLB
-
ISCA'97: Hardware fault containment in scalable shared-memory
multiprocessors
-
ISCA'97: Reactive NUMA: a design for unifying S-COMA and
CC-NUMA
-
ISCA'97: The interaction of software prefetching with ILP
processors in shared-memory systems
-
ISCA'97: VM-based shared memory on low-latency, remote-memory-access
networks
-
Multicomputers, Clusters, and Networks of Workstations (13)
-
HPCA'97: Evaluating MPI collective communication on the SP2,
T3D, and Paragon Multicomputers
-
HPCA'97: Message Proxies for Efficient, Protected Communication
on SMP Clusters
-
HPCA'97: Scheduling Communication on an SMP Node Parallel
Machine
-
ISCA'97: Effects of communication latency, overhead, and
bandwidth in a cluster architecture
-
HPCA'96: A Comparison of Entry Consistency and Lazy Release
Consistency Implementations
-
HPCA'96: Improving Release-Consistent Shared Virtual Memory
Using Automatic Update
-
HPCA'96: Performance Evaluation of a Cluster-Based Multiprocessor
Built from ATM Switches and Bus-Based Multiprocessor Servers
-
HPCA'98: Comparative Evaluation of Latency Tolerance Techniques
for Software Distributed Shared Memory
-
HPCA'98: Efficiently Adapting to Sharing Patterns in Software
DSMs
-
HPCA'98: Fine-grain Software Distributed Shared memory on
SMP clusters
-
HPCA'98: Home-based SVM protocols for SMP clusters: Design
and Performance
-
HPCA'98: The Effectiveness of SRAM Network caches in Clustered
DSMs
-
HPCA'98: Using multicast and multithreading to reduce communication
in software DSM systems
-
Performance Evaluation (12)
-
COMPCON'96: Performance Comparison of MPEG1 and MPEG2 Video
Compression Standards
-
COMPCON'97: Performance Implications of Next Generation PowerPC(tm)
Microprocessor Cache Architectures
-
HPCA'96: Performance Study of a Multithreaded Superscalar
Microprocessor
-
HPCA'97: A Framework for Statistical Modeling of Superscalar
Processor Performance
-
HPCA'97: A Performance Comparison of Hierarchical Ring- and
Mesh-Connected Multiprocessor Networks
-
HPCA'97: Performance Characterization of the Pentium Pro
Processor
-
HPCA'97: Towards a Communication Characterization Methodology
for Parallel Applications
-
HPCA'98: Performance evaluation of tiling for the register
level
-
HPCA'98: Performance study of a Concurrent Multithreaded
Processor
-
ISCA'96: Compiler and hardware support for cache coherence
in large-scale multiprocessors: design considerations and
performance study
-
ISCA'96: Informing memory operations: providing memory performance
feedback in modern processors
-
ISCA'96: Performance comparison of ILP machines with cycle
time evaluation
-
Microprocessors (16)
-
COMPCON'96: 64-bit and Multimedia Extensions in the PA-RISC
2.0 Architecture
-
COMPCON'96: ARM7100 -- A High-Integration, Low-Power Microcontroller
for PDA Applications
-
COMPCON'96: An Overview of the Pentium(R) Pro Processor Bus
-
COMPCON'96: Design of the PowerPC 604e Microprocessor
-
COMPCON'96: Multiprocessor Validation of the Pentium(R) Pro
Microprocessor
-
COMPCON'96: PA7300LC Integrates Cache for Cost/Performance
-
COMPCON'96: StrongARM: A High-Performance ARM Processor
-
COMPCON'96: Thumb: Reducing the Cost of 32-bit RISC Performance
in Portable and Consumer Applications
-
COMPCON'96: UltraSPARC-II: The Advancement of UltraComputing
-
COMPCON'96: UltraSPARC: Compiling for Maximum Floating-Point
Performance
-
COMPCON'97: Compiler Optimizations for the PA-8000
-
COMPCON'97: Functional Verification of the Superscalar SH-4
Microprocessor
-
COMPCON'97: Intel's Multimedia Architecture Extension
-
COMPCON'97: The Alpha 21164PC Microprocessor
-
COMPCON'97: The Alpha 21264: A 500 MHz Out-of-Order Execution
Microprocessor
-
HPCA'96: Performance Characterization of the Alpha 21164
Microprocessor Using TP and SPEC Workloads
-
Media Processors and Digital Signal Processors (8)
-
COMPCON'96: A Scalable Chip Set for MPEG2 Real-Time Encoding
-
COMPCON'96: An Architectural Overview of the Programmable
Multimedia Processor, TM-1
-
COMPCON'96: Architecture of a Broadband MediaProcessor
-
COMPCON'96: Broadband Algorithms with the MicroUnity Mediaprocessor
-
COMPCON'96: Mediaprocessing in the Compressed Domain
-
COMPCON'96: The Mpact Media Processor Redefines the Multimedia
PC
-
HPCA'97: Datapath Design for a VLIW Video Signal Processor
-
HPCA'98: FPGA based custom computing machines for irregular
problems
-
Systems (11)
-
COMPCON'96: Digital's Clusters and Scientific Parallel Applications
-
COMPCON'96: Mid-Range and High-End PA-RISC Computer Systems
-
COMPCON'96: Overview of Digital UNIX Cluster System Architecture
-
COMPCON'96: Pentium(R) Pro Processor Workstation/Server PCI
Chipset
-
COMPCON'96: PowerPC Platform: A System Architecture
-
COMPCON'96: The Performance and PowerPC Platform Specification
Implementation of the MPC106 Chipset
-
COMPCON'97: The Evolution of the HP/Convex Exemplar
-
HPCA'98: Communication Across Fault-Containment Firewalls
on the SGI Origin
-
ISCA'96: Early experience with message-passing on the SHRIMP
multicomputer
-
ISCA'96: STiNG: a CC-NUMA computer system for the commercial
marketplace
-
ISCA'97: The SGI Origin: a ccNUMA highly scalable server