With WebSphere® eXtreme Scale,
your architecture can use local in-memory data caching or distributed
client-server data caching.
WebSphere eXtreme Scale requires
minimal additional infrastructure to operate. The infrastructure consists
of scripts to install, start, and stop a Java™ Platform,
Enterprise Edition application on a server.
Cached data is stored in the eXtreme Scale server, and clients
remotely connect to the server.
Distributed caches offer increased performance, availability,
and scalability and can be configured with dynamic topologies, in
which servers are balanced automatically. You can also add
servers without restarting your existing eXtreme Scale servers. You can
create either simple deployments or large, terabyte-sized deployments
in which thousands of servers are needed.
Maps
A map is a container for key-value pairs
that allows an application to store a value indexed by a key. Maps
support indexes over attributes of the key or the value. The query
runtime automatically uses these indexes to determine the most
efficient way to run a query.
A map set is a collection of maps with a common partitioning
algorithm. The data within the maps is replicated based on the policy
defined on the map set. A map set is only used for distributed topologies
and is not needed for local topologies.
A map set can have a schema associated with it. A schema is
the metadata that describes the relationships between each map when
using homogeneous Object types or entities.
eXtreme Scale can store serializable Java objects in each of the maps
using the ObjectMap API. A schema can be defined over the maps to
identify the relationships between the objects in the maps, where each
map holds objects of a single type. Defining a schema for maps is
required to query the contents of the map objects. eXtreme Scale can have multiple
map schemas defined. For more information,
see ObjectMap API.
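As a minimal local (single-JVM) sketch of this approach, assuming the eXtreme Scale client libraries are on the classpath; the grid and map names here are invented for illustration, and exception handling is omitted:

```java
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManager;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

public class ObjectMapSketch {
    public static void main(String[] args) throws Exception {
        // Create a local, in-JVM grid and define a map on it.
        ObjectGridManager manager = ObjectGridManagerFactory.getObjectGridManager();
        ObjectGrid grid = manager.createObjectGrid("grid");
        grid.defineMap("customers");

        // All map access happens within a session transaction.
        Session session = grid.getSession();
        ObjectMap customers = session.getMap("customers");

        session.begin();
        customers.insert("C001", "Ada Lovelace"); // store a serializable value by key
        session.commit();

        session.begin();
        Object value = customers.get("C001");     // read it back
        session.commit();
        System.out.println(value);
    }
}
```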
eXtreme Scale can also store entities
using the EntityManager API. Each entity is associated with a map.
The schema for an entity map set is automatically discovered using
either an entity descriptor XML file or annotated Java classes. Each entity has a set of key attributes
and set of non-key attributes. An entity can also have relationships
to other entities. eXtreme Scale supports
one-to-one, one-to-many, many-to-one, and many-to-many relationships.
Each entity is physically mapped to a single map in the map set. Entities
allow applications to easily work with complex object graphs that span
multiple maps. A distributed topology can have multiple entity schemas. For more information, see EntityManager API.
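As a sketch of the entity approach, using annotated Java classes; the class and attribute names here are invented for illustration, and exception handling is omitted:

```java
import java.util.Collection;

import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.Session;
import com.ibm.websphere.objectgrid.em.EntityManager;
import com.ibm.websphere.projector.annotations.Entity;
import com.ibm.websphere.projector.annotations.Id;
import com.ibm.websphere.projector.annotations.ManyToOne;
import com.ibm.websphere.projector.annotations.OneToMany;

@Entity
class Customer {
    @Id String id;                        // key attribute
    String name;                          // non-key attribute
    @OneToMany Collection<Order> orders;  // one-to-many relationship
}

@Entity
class Order {
    @Id String id;
    @ManyToOne Customer customer;         // many-to-one back reference
}

public class EntitySketch {
    public static void main(String[] args) throws Exception {
        // Registering the annotated classes discovers the schema and
        // creates one map per entity in the map set.
        ObjectGrid grid = ObjectGridManagerFactory.getObjectGridManager()
                .createObjectGrid("grid");
        grid.registerEntities(new Class[] { Customer.class, Order.class });

        Session session = grid.getSession();
        EntityManager em = session.getEntityManager();
        em.getTransaction().begin();
        Customer c = new Customer();
        c.id = "C001";
        c.name = "Ada";
        em.persist(c);                    // stored in the Customer entity's map
        em.getTransaction().commit();
    }
}
```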
Containers, partitions, and shards
The container
is a service that stores application data for the grid. This data
is generally broken into parts, which are called partitions, and hosted
across multiple containers. Each container in turn hosts a subset
of the complete data. A JVM might host one or more containers and
each container can host multiple shards.
Remember: Plan
out the heap size for the containers, which host all of your data.
Configure the heap settings accordingly.
Partitions host a subset of the data in the grid.
eXtreme Scale automatically places
multiple partitions in a single container and spreads the partitions
out as more containers become available.
Important: Choose
the number of partitions carefully before final deployment, because the
number of partitions cannot be changed dynamically. A hash mechanism
is used to locate partitions in the network and there is no way for
ObjectGrid to rehash the entire data set after it has been deployed.
Overestimate the number of partitions.
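The reason the count is fixed can be sketched with a plain hash-routing function; this is a stdlib illustration of the general mechanism, not the actual ObjectGrid algorithm:

```java
public class PartitionRouter {
    // Map a key to one of nPartitions partitions by hashing.
    // Masking with Integer.MAX_VALUE keeps the hash non-negative.
    public static int partitionFor(Object key, int nPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % nPartitions;
    }

    public static void main(String[] args) {
        int n = 13;
        // The same key always routes to the same partition...
        System.out.println(partitionFor("customer:42", n));
        // ...but changing the partition count re-maps most keys,
        // which is why the count cannot change after deployment.
        System.out.println(partitionFor("customer:42", n + 1));
    }
}
```

Because every client computes the same function over the same partition count, the grid can locate data without a lookup per key; the price is that the count is baked into the placement of all existing data.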
Shards are instances of partitions and have one of two roles:
primary or replica. The primary shard and its replicas make up the
physical manifestation of the partition. Every partition has several
shards that each host all of the data contained in that partition.
One shard is the primary, and the others are replicas, which are redundant
copies of the data in the primary shard. A primary shard is the only
partition instance that allows transactions to write to the cache.
A replica shard is a "mirrored" instance of the partition. It receives
updates synchronously or asynchronously from the primary shard. The
replica shard only allows transactions to read from the cache. Replicas
are never hosted in the same container as the primary and are not
normally hosted on the same machine as the primary.
To increase the availability of the data, or increase persistence
guarantees, replicate the data. However, replication adds cost to
the transaction and trades performance in return for availability.
With eXtreme Scale, you can
control this cost, because both synchronous and asynchronous replication
are supported, as well as hybrid replication models that combine the
two modes. A synchronous replica shard receives
updates as part of the transaction of the primary shard to guarantee
data consistency. A synchronous replica can double the response time
because the transaction has to commit on both the primary and the
synchronous replica before the transaction is complete. An asynchronous
replica shard receives updates after the transaction commits to limit
impact on performance, but introduces the possibility of data loss
as the asynchronous replica can be several transactions behind the
primary.
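These trade-offs can be illustrated with a toy latency model; the additive model and the numbers are illustrative assumptions, not product measurements:

```java
public class ReplicationCost {
    // Commit latency when the transaction must also commit on the
    // synchronous replicas: each one is on the commit path.
    public static long syncCommitMillis(long primaryMillis, long replicaMillis,
                                        int syncReplicas) {
        return primaryMillis + (long) syncReplicas * replicaMillis;
    }

    // Asynchronous replicas receive updates after the commit,
    // so they add no commit latency (at the risk of data loss).
    public static long asyncCommitMillis(long primaryMillis) {
        return primaryMillis;
    }

    public static void main(String[] args) {
        // One sync replica with the same write cost as the primary
        // roughly doubles the response time.
        System.out.println(syncCommitMillis(10, 10, 1)); // 20
        System.out.println(asyncCommitMillis(10));       // 10
    }
}
```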
Clients
Clients connect to a catalog service,
retrieve a description of the server topology, and communicate directly
to each server as needed. When the server topology changes because
new servers are added or existing servers have failed, the client
is automatically routed to the appropriate server that is hosting
the data. Clients must examine the keys of application data to determine
the partition to which each request is routed. Clients can read data from multiple
partitions in a single transaction. However, clients can update only
a single partition in a transaction. After a client transaction updates
entries in a partition, all further updates in that transaction must go to the same partition.
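The single-partition update rule can be sketched as client-side bookkeeping; this is a simplified illustration, not the actual eXtreme Scale client logic:

```java
public class ClientTxn {
    // Partition of the first updated entry; null until the first update.
    private Integer writePartition = null;

    public static int partitionFor(Object key, int nPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % nPartitions;
    }

    // Reads are unrestricted: a transaction may read from any partition.
    public boolean tryRead(Object key, int nPartitions) {
        return true;
    }

    // Updates are pinned to the partition of the first updated entry.
    public boolean tryUpdate(Object key, int nPartitions) {
        int p = partitionFor(key, nPartitions);
        if (writePartition == null) {
            writePartition = p;
        }
        return writePartition == p;
    }

    public static void main(String[] args) {
        ClientTxn txn = new ClientTxn();
        System.out.println(txn.tryUpdate(1, 10));  // true: pins the transaction to partition 1
        System.out.println(txn.tryUpdate(11, 10)); // true: 11 also routes to partition 1
        System.out.println(txn.tryUpdate(2, 10));  // false: partition 2 is a different partition
    }
}
```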
The
possible deployment combinations are included in the following list:
- A catalog service exists in its own grid of Java virtual machines. A single catalog service
can be used to manage multiple instances of eXtreme Scale.
- A container can be started in a JVM by itself or can be
loaded into an arbitrary JVM with
other containers for different ObjectGrid instances.
- A client can exist in any JVM and
communicate with one or more ObjectGrid instances. A client can also
exist in the same JVM as
a container.
Figure 7. Possible topologies
Catalog service
The catalog service hosts logic
that should be idle during a steady state and has little influence
on scalability. The catalog service is built to service hundreds of
containers becoming available simultaneously and runs services to
manage the containers.
Figure 8. Catalog service
The catalog responsibilities consist of the following services:
- Location service
- The location service helps clients locate the containers that host
their applications, and helps containers register hosted applications
with the placement service. The location service runs in all of the
grid members to scale out this function.
- Placement service
- The placement service is the central nervous system for the grid
and is responsible for allocating individual shards to their host
container. The placement service runs as a one-of-N elected service
in the cluster so there is always exactly one instance of the placement
service running. If that instance should stop, another process takes
over. All state of the catalog service is replicated across all servers
hosting the catalog service for redundancy.
- Core group manager
- The core group manager manages peer grouping for health monitoring,
organizes containers into small groups of servers, and automatically
federates the groups of servers. When a container first contacts the
catalog service, the container waits to be assigned to either a new
or an existing group of several Java virtual
machines (JVMs). Each group monitors
the availability of its members through
heartbeating. One of the group members relays availability information
to the catalog service to allow for reacting to failures by reallocation
and route forwarding.
- Administration
- The four stages of administering your WebSphere eXtreme Scale environment are planning,
deploying, managing, and monitoring. See the Administration
Guide for more information on
each stage.
For availability, configure a catalog service grid.
A catalog service grid consists of multiple Java virtual machines, including a master JVM
and a number of backup Java virtual
machines.
Figure 9. Catalog service grid