Research at Cleveland State University

If you are a CSU student and are interested in working on one of the projects listed below, please feel free to contact me (office: SH434; tel: 216-523-7480; my email address is listed at the bottom of this page).

NSF Sponsored Research

Dr. Wenbing Zhao is the lead Principal Investigator for the NSF grant CNS-0821319 ($150,000). The project is entitled "MRI: Acquisition of Equipment to Establish a Secure and Dependable Computing Infrastructure for Research and Education at Cleveland State University." As part of the project, Dr. Zhao will be investigating several key issues related to Byzantine fault tolerance for long-running, nondeterministic systems. In particular, we aim to reconcile the seemingly conflicting requirements of strong replica consistency and the independence of each individual replica, and to significantly increase the achievable concurrency of replicated systems by using software transactional memory. We also propose a migration-based proactive recovery scheme that significantly reduces the vulnerability window. The dedicated website for this project is http://dss.csuohio.edu/
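To illustrate why software transactional memory can raise concurrency, here is a minimal sketch of a generic optimistic STM, assuming a simple scheme in which each transaction buffers its writes, records the version of every variable it reads, and validates those versions at commit time, retrying on conflict. This is not the design used in the project; the names `TVar`, `Transaction`, and `atomically` are hypothetical.

```python
import threading


class ConflictError(Exception):
    """Raised at commit time when a value read by the transaction has changed."""


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Serializes only the short snapshot and validate-and-apply steps.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> (version, value) observed on first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:  # read-your-own-writes
            return self.writes[tvar]
        if tvar not in self.reads:
            with _commit_lock:  # snapshot value and version together
                self.reads[tvar] = (tvar.version, tvar.value)
        return self.reads[tvar][1]

    def write(self, tvar, value):
        self.writes[tvar] = value  # buffered until commit

    def commit(self):
        with _commit_lock:
            # Validate: every variable read must still have the version we saw.
            for tvar, (version, _) in self.reads.items():
                if tvar.version != version:
                    raise ConflictError()
            # Apply buffered writes; bumping the version makes any transaction
            # that read the old value fail its own validation.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body, max_retries=100):
    """Run body(tx) as a transaction, re-executing it on conflict."""
    for _ in range(max_retries):
        tx = Transaction()
        result = body(tx)
        try:
            tx.commit()
            return result
        except ConflictError:
            continue  # another transaction committed first; retry
    raise RuntimeError("transaction failed to commit")


# Usage: 50 concurrent transfers of 1 unit from a to b.
a, b = TVar(100), TVar(0)


def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)


threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# No update is lost: a.value == 50 and b.value == 50.
```

Transaction bodies run without holding a lock; only the brief validate-and-apply step is serialized. A production STM would also need to keep running transactions from ever observing inconsistent state and would avoid a single global commit lock.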

Other Current Research Projects

Secure and Dependable Web Services

The Web services platform has been widely adopted by businesses and government organizations as their computing platform of choice because its design emphasizes interoperability, loose coupling, and extensibility. Unlike older generations of middleware, which are based on Application Programming Interfaces (APIs), Web services are message-based (or, more accurately, document-based). This makes it natural to build Web-based systems using a service-oriented architecture: developers can focus on the services the system provides rather than on APIs, and they are free to choose their own APIs, programming languages, tools, and operating systems. The XML-based SOAP messaging protocol is the primary reason why Web services can be extended without breaking existing services and functionality. SOAP was originally developed to enable remote method invocation over HTTP. Its limitations were soon recognized, and it evolved to support document-based communication, so that Web services can interact by exchanging documents (well-structured XML files) instead of making synchronous remote method invocations. This approach enables asynchronous communication and, therefore, loose coupling between Web services.
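As a rough illustration of document-style messaging and why it tolerates extension, the sketch below wraps a small XML document in a SOAP 1.1 envelope and shows a receiver that reads only the elements it knows. The `PurchaseOrder` document, its element names, and the `urn:example:orders` namespace are hypothetical; only the SOAP 1.1 envelope namespace is standard. Nothing is sent over HTTP here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ORDER_NS = "urn:example:orders"  # hypothetical application namespace
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("ord", ORDER_NS)


def build_order_message(order_id, items, extra=None):
    """Wrap a PurchaseOrder document in a SOAP Body (document style)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    order = ET.SubElement(body, f"{{{ORDER_NS}}}PurchaseOrder", {"id": order_id})
    for sku, qty in items:
        item = ET.SubElement(order, f"{{{ORDER_NS}}}Item", {"sku": sku})
        item.text = str(qty)
    # Optional elements that a newer version of the sender might add.
    for tag, text in (extra or {}).items():
        ET.SubElement(order, f"{{{ORDER_NS}}}{tag}").text = text
    return ET.tostring(envelope, encoding="unicode")


def read_order_message(xml_text):
    """Extract the fields this receiver knows; silently ignore any others."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    order = body.find(f"{{{ORDER_NS}}}PurchaseOrder")
    items = [(i.get("sku"), int(i.text))
             for i in order.findall(f"{{{ORDER_NS}}}Item")]
    return order.get("id"), items


old = build_order_message("42", [("A1", 2)])
new = build_order_message("42", [("A1", 2)], extra={"Priority": "high"})
read_order_message(new)  # returns ("42", [("A1", 2)]), same as for `old`
```

Because the receiver navigates the document instead of binding to a fixed method signature, a sender that adds a `Priority` element does not break it.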

This new distributed computing paradigm, however, is not perfect. Additional security and dependability mechanisms must be developed to ensure that Web services are trustworthy and reliable. Traditional fault tolerance methodologies cannot be applied directly to Web services because most of them would compromise the very features that make Web services attractive, namely interoperability, loose coupling, and extensibility. We are working on a number of projects to address these issues. The following topics are suitable for Master's students as MS thesis projects:

Performance Evaluation of Reliable Multicast Strategies in 802.11 Networks 

802.11 networks differ in many ways from wired Ethernet-based networks. Many group communication systems, which provide both reliable and ordered multicast, have been designed and optimized primarily for Ethernet-based networks. We would like to study the performance of these systems in 802.11 networks to determine which strategy works best in this new environment. Furthermore, there exist many different strategies for achieving reliable (but unordered) multicast, and we are also interested in studying how these strategies perform over 802.11 networks. In particular, group communication systems that provide both reliable and ordered multicast are generally believed to incur higher overhead than their reliable but unordered counterparts. Our preliminary results suggest that this may not be the case: some group communication protocols regulate medium access as a by-product of their total-ordering strategy and, consequently, exhibit superior performance under heavy load.
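Reliable-only multicast strategies differ mainly in how they detect and repair losses. As one hedged illustration (not one of the specific systems under study), here is a minimal sketch of the receiver side of a negative-acknowledgment (NAK) scheme, assuming each sender numbers its messages 0, 1, 2, and so on. `NakReceiver` is a hypothetical name.

```python
class NakReceiver:
    """Receiver side of a NAK-based reliable (but unordered) multicast.

    Messages are delivered as soon as they arrive; gaps in each sender's
    sequence numbers are reported so the receiver can request retransmission.
    """

    def __init__(self):
        self.seen = {}     # sender -> set of sequence numbers received
        self.highest = {}  # sender -> highest sequence number received

    def on_message(self, sender, seq, payload):
        """Return payload for immediate delivery, or None for a duplicate."""
        seen = self.seen.setdefault(sender, set())
        if seq in seen:
            return None  # e.g., a retransmission we already have
        seen.add(seq)
        self.highest[sender] = max(self.highest.get(sender, -1), seq)
        return payload

    def missing(self, sender):
        """Sequence numbers to request in a NAK (gaps below the highest seen)."""
        seen = self.seen.get(sender, set())
        top = self.highest.get(sender, -1)
        return [s for s in range(top + 1) if s not in seen]


r = NakReceiver()
for seq in (0, 1, 3, 5):
    r.on_message("p1", seq, f"m{seq}")
print(r.missing("p1"))       # [2, 4]
r.on_message("p1", 2, "m2")  # a retransmission fills one gap
print(r.missing("p1"))       # [4]
```

On its own, a NAK receiver cannot detect the loss of a sender's most recent messages; practical protocols typically add periodic session messages announcing the highest sequence number, and they bound the per-sender state kept here.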

Previous Research Projects

Unification of Replicated Object Systems and Transaction Processing Systems

This research introduced a novel software architecture that provides robust fault tolerance for networked applications within and between enterprises. The architecture resolves many of the problems currently associated with the use of transactions and roll-backward recovery for networked enterprise applications. In particular, the research focused on the transparent replication of applications that are built on Commercial-Off-the-Shelf (COTS) distributed transaction processing middleware and on the Common Object Request Broker Architecture (CORBA) standard for distributed object computing.

Pluggable Fault-Tolerant CORBA Infrastructure

The Pluggable Fault Tolerant CORBA Infrastructure provides fault tolerance for CORBA applications by using the pluggable protocols framework that is available in most CORBA ORBs. Our approach does not require modifications to the CORBA ORB and requires only minimal modifications to the application. Moreover, because the fault tolerance mechanisms are incorporated into the ORB through the pluggable protocols framework, it avoids the difficulty of retrieving and assigning the ORB state. The Pluggable Fault Tolerant CORBA Infrastructure achieves performance similar to, or better than, that of other Fault Tolerant CORBA systems, while providing strong replica consistency.

Performance Evaluation and Performance Engineering of Fault Tolerance Infrastructures

In this project, we carried out an extensive performance analysis and measurement of the Pluggable FT CORBA infrastructure that we designed and implemented. We measured the probability density functions (pdfs) of the end-to-end latency for synchronous remote invocations, and we provided a simple performance analysis in terms of the latency values at the maximum probability densities. Our study shows that the strategies used by the Totem group communication system have direct implications for the latency profiles, because of the logical token-passing ring that is imposed on the nodes (processors) running the Totem instances. A token circulates around the ring, and a node can broadcast a user message only when it holds the token. This strategy introduces a potential delay for each message to be sent. As a consequence, for passive and semi-active replication, the position of the primary server replica relative to the client, together with the replica processing time, affects the end-to-end latency. To achieve the best latency, care must be taken to run the primary server replica at the most favorable position. For active replication, however, the replicas compete to send messages: the reply from whichever replica obtains the token first can be delivered to the client, so active replication is more advantageous. Depending on the pattern of the remote invocations and the server processing time, and assuming other factors are constant, the send delay introduced by Totem can constitute a large portion of the replication overhead.
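The effect of replica position on send delay can be illustrated with a deliberately simplified model of a token ring, assuming a fixed token-passing time per hop, instantaneous message delivery, no token hold time, no competing traffic, and identical processing time at every replica. The functions `token_arrival` and `reply_latency` are hypothetical and not part of Totem, and the numbers are illustrative rather than measured.

```python
def token_arrival(ring_size, hop_us, holder, holder_time, node, not_before):
    """Earliest time >= not_before at which the token reaches `node`.

    The token is at position `holder` at time `holder_time` and advances one
    position every hop_us microseconds (positions 0..ring_size-1).
    """
    first = holder_time + ((node - holder) % ring_size) * hop_us
    if first >= not_before:
        return first
    period = ring_size * hop_us                # one full rotation of the token
    laps = -(-(not_before - first) // period)  # integer ceiling division
    return first + laps * period


def reply_latency(ring_size, hop_us, client, replica, proc_us):
    """Time until `replica` can multicast its reply, in a toy model where the
    client multicasts its request at time 0 while holding the token."""
    return token_arrival(ring_size, hop_us, client, 0, replica, proc_us)


# 4-node ring, 100 us per token hop, client at position 0, 150 us processing.
passive = {p: reply_latency(4, 100, 0, p, 150) for p in range(4)}
print(passive)  # {0: 400, 1: 500, 2: 200, 3: 300}

# Active replication: every replica replies; the first to get the token wins.
print(min(passive.values()))  # 200
```

In this toy setting, placing the primary immediately after the client (position 1) is the worst choice, because the token has already passed by the time the reply is ready, while position 2 is best. With active replication, the minimum latency is obtained without having to choose a position.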

Fault Tolerance for Java (1998)

Java Remote Method Invocation (RMI) is a mechanism for building distributed applications over the Internet. Our goal was to provide fault tolerance for Java RMI applications transparently, with no or minimal changes to the applications. In the Aroma system that we developed, the application objects are replicated and managed by our Replication Manager. Users can specify fault-tolerance properties for these objects, such as the replication style (active or passive) and the minimum number of replicas. Outgoing messages from the application are intercepted by a custom RMI socket factory and directed to the local Replication Manager, which transmits them, through a reliable totally ordered multicast layer (Totem), to the Replication Managers on the hosts where the target replicas reside. Messages between the replicated application objects and the Replication Managers are passed over local TCP connections. We developed a prototype intended to support simple client/server Java RMI applications. It provides warm passive replication and active replication with or without majority voting. For state transfer, applications can either provide their own custom serialization methods or use the serialization mechanisms provided by the JDK. An auction application was built to test and demonstrate the prototype.
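The voting algorithm used in Aroma is not described here, so the following is only a generic sketch of majority voting over replies from actively replicated objects, assuming replies are compared by their serialized value and the replica group has a known size. `majority_vote` is a hypothetical name, and the sketch is in Python rather than Java.

```python
from collections import Counter


def majority_vote(replies, group_size):
    """Return the first reply value that a strict majority of the
    `group_size` replicas agree on, or None if no value has a majority yet.

    `replies` is an iterable of hashable reply values (e.g., serialized
    results as bytes) in arrival order.
    """
    needed = group_size // 2 + 1
    counts = Counter()
    for value in replies:
        counts[value] += 1
        if counts[value] >= needed:
            return value  # no need to wait for the remaining replicas
    return None


print(majority_vote([b"42", b"41", b"42"], 3))  # b'42'
print(majority_vote([b"42", b"41"], 3))         # None (keep waiting)
```

Voting in this way masks a minority of replicas that return incorrect values; for example, with three replicas a single faulty reply is outvoted.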


Last updated July 2009 by Wenbing Zhao. Send your comments to: wenbingz at acm dot org.