Research at Cleveland State University
If you are a CSU student and are interested in working on one of the projects listed below, please feel free to contact me (I’m in SH434, Tel: 216-523-7480; my email address is listed at the bottom of this page).
NSF Sponsored Research
Dr. Wenbing Zhao is the lead Principal Investigator for the NSF grant CNS-0821319 ($150,000). The project is entitled "MRI: Acquisition of Equipment to Establish a Secure and Dependable Computing Infrastructure for Research and Education at Cleveland State University." As part of the project, Dr. Zhao will investigate several key issues related to Byzantine fault tolerance for long-running, nondeterministic systems. In particular, we aim to reconcile the seemingly conflicting requirements of strong replica consistency and the independence of each individual replica, and to significantly increase the concurrency obtainable in replicated systems by using software transactional memory. We also propose a migration-based proactive recovery scheme that substantially reduces the vulnerability window. The dedicated website for this project is http://dss.csuohio.edu/
Other Current Research Projects
Secure and Dependable Web Services
The Web services platform has been widely adopted by businesses and government organizations as the computing platform of choice because of its strong interoperability, loose coupling, and extensibility. Unlike older generations of middleware, which are based on Application Programming Interfaces (APIs), Web services are message-based (or, more accurately, document-based). This makes them well suited to building Web-based systems with a service-oriented architecture: developers can focus on the services the system provides rather than on the APIs, and they have complete freedom in choosing their APIs, programming languages, tools, and operating systems. The XML-based SOAP messaging protocol is the primary reason Web services can be easily extended without breaking existing services and functionality. SOAP was originally developed to enable remote method invocation over HTTP. Its limitations were soon recognized, and it evolved to support document-based communication, so that Web services can interact by passing documents (well-structured XML files) instead of making synchronous remote method invocations. This approach enables asynchronous communication and, therefore, loose coupling between Web services.
This new distributed computing paradigm, however, is not perfect. Additional security and dependability mechanisms must be developed to ensure that Web services are trustworthy and reliable. Traditional fault tolerance methodologies cannot be applied directly to Web services because most of them would compromise the paradigm's most valuable features: interoperability, loose coupling, and extensibility. We are working on a number of projects to address these issues. The following topics are appropriate for Master's students to pursue as MS theses:
- Reliable ordered message multicast for Web services. The project will start by extending the Apache Sandesha framework, an open source implementation of the WS-ReliableMessaging specification.
- Byzantine fault tolerant systems for Web services. This project will start by porting the MIT BFT system from C++ to Java and redesigning its architecture so that it fits the Web services architecture.
- Reservation-based extended transactions for Web services. Many businesses will use Web services to perform automated business-to-business (B2B) transactions. Quite often, these businesses are geographically far apart and there is no need for all participants to reach the same decision. Atomic transactions are not useful for this type of business activity. Other types of extended transaction models, while useful to some degree, rely on compensation transactions, which are difficult to program and prone to mistakes that require manual intervention. We have proposed a reservation-based protocol for extended transactions that enables the automation of loosely coupled, long-running business activities without resorting to compensation transactions. This project will be built on top of an open source WS-Coordination implementation (the Apache Kandula project).
- Byzantine fault tolerant coordination for Web services atomic transactions. The bulk of business applications involve transaction processing and require a high degree of security and dependability. More and more such applications are being deployed over the Internet, driven by the need for business integration and collaboration, and enabled by the latest service-oriented computing techniques such as Web services. This requires the development of a new generation of transaction processing (TP) monitors, not only because of the new computing paradigm but also because of the untrusted operating environment. This work investigates the issues and challenges of building a Byzantine fault tolerant (BFT) TP monitor for Web services. We focus on the Web Services Atomic Transaction specification (WS-AT). The core services specified in WS-AT are replicated and protected with BFT mechanisms, and the BFT algorithm is adapted so that the replicas achieve Byzantine agreement. We emphasize that the resulting BFT TP monitor framework is not a trivial integration of WS-AT and the BFT algorithm. We have proposed a number of novel mechanisms to achieve BFT with minimal overhead in the context of distributed transaction coordination, and the experimental evaluation of a working prototype demonstrates the efficiency of our mechanisms and their implementations.
- Byzantine fault tolerant coordination for Web services business activities.
- Integrating replication and threshold cryptography for stateful Web services.
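The BFT topics above build on the replication approach of the MIT BFT system (PBFT), in which tolerating f Byzantine faulty replicas requires n = 3f + 1 replicas, and a client accepts a result once f + 1 distinct replicas have returned matching replies, since at least one of those replicas must be correct. The following is a minimal, simplified Java sketch of that client-side acceptance rule only (no signatures, timestamps, or view changes), assuming each reply carries its replica's id; the class and method names (`ReplyCollector`, `accept`) are hypothetical and do not come from PBFT or any Web services framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical client-side collector for replies from a BFT replica group. */
public class ReplyCollector {
    private final int f;                                          // max faulty replicas tolerated
    private final Map<Integer, String> replies = new HashMap<>(); // replicaId -> reply

    public ReplyCollector(int f) {
        if (f < 0) throw new IllegalArgumentException("f must be non-negative");
        this.f = f;
    }

    /** Number of replicas needed to tolerate f Byzantine faults. */
    public int groupSize() {
        return 3 * f + 1;
    }

    /**
     * Records a reply. Only the first reply from each replica counts, so a
     * faulty replica cannot vote twice. Returns the accepted result once
     * f + 1 distinct replicas agree on it, otherwise an empty Optional.
     */
    public Optional<String> accept(int replicaId, String reply) {
        if (replicaId < 0 || replicaId >= groupSize()) {
            throw new IllegalArgumentException("unknown replica " + replicaId);
        }
        replies.putIfAbsent(replicaId, reply);
        Map<String, Integer> counts = new HashMap<>();
        for (String r : replies.values()) {
            int votes = counts.merge(r, 1, Integer::sum);
            if (votes >= f + 1) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ReplyCollector c = new ReplyCollector(1);   // f = 1, so n = 4 replicas
        System.out.println(c.accept(0, "commit"));  // Optional.empty
        System.out.println(c.accept(3, "abort"));   // Optional.empty (possibly faulty replica)
        System.out.println(c.accept(1, "commit"));  // Optional[commit]
    }
}
```

With f = 1, a single faulty replica answering "abort" can never reach the f + 1 = 2 matching votes needed, so the client only accepts the decision reported by the correct replicas.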
Performance Evaluation of Reliable Multicast Strategies in 802.11 Networks
802.11 networks differ from wired Ethernet-based networks in many ways. Many group communication systems that provide both reliable and ordered multicast have been designed and optimized primarily for Ethernet-based networks. We would like to study the performance of these systems in 802.11 networks to see which strategy works best in this new environment. Furthermore, many different strategies exist for achieving reliable multicast alone, without ordering, and we are also interested in studying how these strategies perform over 802.11 networks. In particular, group communication systems that provide both reliable and ordered multicast are believed to incur higher overhead than their reliable but unordered counterparts. Our preliminary results show that this may not be the case, because some group communication protocols automatically enforce orderly medium access as a by-product of their total ordering strategy and, consequently, exhibit superior performance under heavy load.
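To make the ordered-versus-unordered distinction concrete: a reliable but unordered multicast can hand each message to the application as soon as it arrives, whereas a totally ordered protocol stamps each message with a global sequence number (Totem, for example, takes it from the circulating token) and must hold back any message that arrives ahead of a gap. The Java sketch below shows only that receiver-side holdback logic; it assumes sequence numbers start at 1 and are assigned by an ordering mechanism that is not shown, and the `OrderedReceiver` class is hypothetical rather than taken from any of the systems studied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical receiver that delivers messages in global sequence-number order. */
public class OrderedReceiver {
    private long nextToDeliver = 1;                                 // next sequence number to hand up
    private final TreeMap<Long, String> holdback = new TreeMap<>(); // arrived but not yet deliverable

    /**
     * Handles one arriving message and returns every message that has become
     * deliverable, in order. Duplicates and already-delivered messages are ignored.
     */
    public List<String> onReceive(long seq, String payload) {
        List<String> delivered = new ArrayList<>();
        if (seq < nextToDeliver) {
            return delivered;                                       // old duplicate
        }
        holdback.putIfAbsent(seq, payload);
        // Drain the holdback queue while there is no gap at its front.
        while (!holdback.isEmpty() && holdback.firstKey() == nextToDeliver) {
            delivered.add(holdback.pollFirstEntry().getValue());
            nextToDeliver++;
        }
        return delivered;
    }

    /** Number of messages that arrived early and are waiting behind a gap. */
    public int pending() {
        return holdback.size();
    }

    public static void main(String[] args) {
        OrderedReceiver r = new OrderedReceiver();
        System.out.println(r.onReceive(2, "b")); // [] : message 1 is still missing
        System.out.println(r.onReceive(3, "c")); // []
        System.out.println(r.onReceive(1, "a")); // [a, b, c]
    }
}
```

An unordered receiver would have delivered "b" and "c" immediately; the time messages spend in the holdback queue is the extra latency that ordered delivery is commonly assumed to cost.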
Previous Research Projects
Unification of Replicated Object Systems and Transaction Processing Systems
This research introduces a novel software architecture that provides robust fault tolerance for networked applications within and between enterprises. The architecture resolves many of the problems currently associated with the use of transactions and roll-backward recovery for networked enterprise applications. In particular, it focused on the transparent replication of applications that are built on Commercial-Off-the-Shelf (COTS) distributed transaction processing middleware and the Common Object Request Broker Architecture (CORBA) standard for distributed object computing.
Pluggable Fault-Tolerant CORBA Infrastructure
The Pluggable Fault Tolerant CORBA Infrastructure provides fault tolerance for CORBA applications by using the pluggable protocols framework that is available in most CORBA ORBs. Our approach requires no modification to the CORBA ORB and only minimal modifications to the application. Moreover, by incorporating the fault tolerance mechanisms into the ORB, it avoids the difficulty of retrieving and assigning the ORB state. The Pluggable Fault Tolerant CORBA Infrastructure achieves performance similar to, or better than, that of other Fault Tolerant CORBA systems, while providing strong replica consistency.
Performance Evaluation and Performance Engineering of Fault Tolerance Infrastructures
In this project, we carried out extensive performance analysis and measurement of the Pluggable FT CORBA infrastructure that we designed and implemented. We measured the probability density functions (pdfs) of the end-to-end latency for synchronous remote invocations. We also provided a simple performance analysis in terms of the latency values at the maximum probability densities. Our study shows that the strategies used by the Totem group communication system have direct implications for the latency profiles, because of the logical token-passing ring imposed on the nodes (processors) that run the Totem instances. A token circulates around the ring, and a node can broadcast a user message only when it holds the token. This strategy introduces a potential delay for each message to be sent. As a consequence, for passive and semi-active replication, the position of the primary server replica relative to the client, together with the replica processing time, affects the end-to-end latency. To achieve the best latency, care must be taken to run the primary server replica at the most favorable position on the ring. For active replication, however, the replicas compete to send messages, and thus active replication is more advantageous. Depending on the pattern of the remote invocations and the server processing time, and assuming other factors are constant, the send delay introduced by Totem can constitute a large portion of the replication overhead.
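The interaction between replica position and processing time can be illustrated with a deliberately simple model. The Java sketch below assumes a ring of n nodes numbered in token-passing order, a fixed and positive token hop time, instantaneous message delivery, and a token that leaves the client at time 0 and circulates continuously; the primary may broadcast its reply only on a token visit that occurs after its processing is finished. Real Totem behaviour (token holding limits, flow control, retransmissions) is ignored, and `TokenRingModel` and its parameters are hypothetical.

```java
/** Hypothetical model of the send delay imposed by a logical token-passing ring. */
public class TokenRingModel {
    private final int ringSize;      // number of nodes on the logical ring
    private final double hopMillis;  // assumed time to pass the token to the next node

    public TokenRingModel(int ringSize, double hopMillis) {
        if (ringSize < 1 || hopMillis <= 0) {
            throw new IllegalArgumentException("need at least one node and a positive hop time");
        }
        this.ringSize = ringSize;
        this.hopMillis = hopMillis;
    }

    /** Hops the token travels from node 'from' before it reaches node 'to'. */
    public int hopsUntilToken(int from, int to) {
        return Math.floorMod(to - from, ringSize);
    }

    /**
     * Time from the client's request broadcast (token leaving the client at
     * time 0) until the primary can broadcast its reply. The reply is ready
     * after processingMillis and can be sent only on a token visit.
     */
    public double replyReadyMillis(int client, int primary, double processingMillis) {
        int hops = hopsUntilToken(client, primary);
        if (hops == 0) {
            hops = ringSize;                       // co-located: wait for the next full rotation
        }
        double arrival = hops * hopMillis;         // first token visit at the primary
        double lap = ringSize * hopMillis;         // one full rotation of the token
        while (arrival < processingMillis) {
            arrival += lap;                        // reply not ready yet: wait for the next visit
        }
        return arrival;
    }

    public static void main(String[] args) {
        TokenRingModel ring = new TokenRingModel(4, 1.0);     // 4 nodes, 1 ms per hop
        System.out.println(ring.replyReadyMillis(0, 1, 1.5)); // 5.0: primary just misses the token
        System.out.println(ring.replyReadyMillis(0, 3, 1.5)); // 3.0: token arrives after processing ends
        System.out.println(ring.replyReadyMillis(0, 1, 0.5)); // 1.0: short processing favours node 1
    }
}
```

Under these assumptions, for 1.5 ms of processing a primary three hops downstream of the client replies sooner (3 ms) than one a single hop away (5 ms), while for 0.5 ms of processing the nearer node wins, which is why the favorable position depends on the processing time.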
Fault Tolerance for Java (1998)
Java Remote Method Invocation (RMI) is a mechanism used in distributed applications over the Internet. Our goal is to provide fault tolerance for Java RMI applications transparently, with no or minimal changes to the applications. In the Aroma system that we are developing, application objects are replicated and managed by our Replication Manager. Users can specify fault tolerance properties for these objects, such as active or passive replication and the minimum number of replicas. Outgoing messages from the application are intercepted by a custom RMI socket factory. These messages are then directed to the local Replication Manager, which transmits them through a reliable totally ordered multicast layer (Totem) to the Replication Managers on the nodes where the target replicas reside. Messages between the replicated application objects and the Replication Managers are passed over local TCP connections. We have developed a prototype that is intended to support simple client/server Java RMI applications. It provides warm passive replication and active replication with or without majority voting. For state transfer, applications can either provide their own custom serialization methods or use the serialization mechanisms provided by the JDK. An auction application has been built to test and demonstrate our prototype.
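State transfer with the JDK's built-in serialization can be as simple as writing a replica's state object to a byte array on an existing replica and reading it back on a new or recovering one. The sketch below assumes a hypothetical auction state class (`AuctionState`, holding a map from item to highest bid) and hypothetical `captureState`/`restoreState` helpers built only on standard `java.io` object streams; it is not Aroma's actual interface. An application that prefers its own format could instead define the standard private `writeObject`/`readObject` hooks on its state class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replicated object state for an auction application. */
public class AuctionState implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, Integer> highestBids = new HashMap<>(); // item -> highest bid

    public void bid(String item, int amount) {
        highestBids.merge(item, amount, Integer::max);               // keep the larger bid
    }

    public Integer highestBid(String item) {
        return highestBids.get(item);                                // null if no bids yet
    }

    /** Captures the state on an existing replica using default JDK serialization. */
    public static byte[] captureState(AuctionState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();                                  // stream closed and flushed
    }

    /** Restores the captured state on a new or recovering replica. */
    public static AuctionState restoreState(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (AuctionState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("state class not found", e);
        }
    }

    public static void main(String[] args) {
        AuctionState primary = new AuctionState();
        primary.bid("lamp", 20);
        primary.bid("lamp", 35);
        primary.bid("chair", 10);

        AuctionState backup = restoreState(captureState(primary)); // simulated state transfer
        System.out.println(backup.highestBid("lamp"));             // 35
        System.out.println(backup.highestBid("chair"));            // 10
    }
}
```

Because the snapshot is a plain byte array, it could be carried to the recovering replica by the same totally ordered multicast used for ordinary messages, so the transfer is ordered consistently with the requests around it.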
Last updated July 2009 by Wenbing Zhao. Send your comments to: wenbingz at acm dot org.