An eXtreme Scale distributed grid is divided into partitions. A partition holds an exclusive subset of the data. A partition is made up of one or more shards: a primary shard and replica shards. You do not need to have replica shards in a partition, but replica shards provide high availability. Whether your deployment is an independent in-memory data grid or an in-memory database processing space, data access in eXtreme Scale relies heavily on sharding concepts.
The data for a partition is stored at run time in a set of shards. This set of shards includes a primary shared and possibly one or more replica shards. A shard is the smallest unit that eXtreme Scale can add or remove from a Java™ virtual machine.
Two placement strategies exist: FIXED_PARTITIONS (default) and PER_CONTAINER. The following discussion focuses on the usage of the FIXED_PARTITIONS strategy.
If your environment included ten partitions that hold one million objects with no replicas, then ten shards would exist that each store 100,000 objects. If you add a replica to this scenario, then an extra shard exists in each partition. In this case, 20 shards exist - ten primary shards and ten replica shards. Again, each one of these shards store 100,000 objects. Each partition consists of a primary shard and one or more (N) replica shards. Determining the optimal shard count is critical. If you configure a small number of shards, data is not distributed evenly among the shards, resulting in out of memory errors and processor overloading issues. You must have at least ten shards for each JVM as you scale. When you are initially deploying the grid, you would potentially use a large number of partitions.
Scenario: small number of shards for each JVM
Data is added and removed from a JVM using shard units. Shards are never split into pieces. If 10 GB of data existed, and 20 shards exist to hold this data, then each shard holds 500 MB of data on average. If nine Java virtual machines host the grid, then on average each JVM has two shards. Because 20 is not evenly divisible by 9, a few Java virtual machines have three shards, in the following distribution:Scenario: increased number of shards per JVM
In this scenario, consider a much larger number of shards. In this scenario, there are 101 shards with 9 Java virtual machines hosting 10GB of data. In this case, each shard holds 99 MB of data. The Java virtual machines have the following distribution of shards:When you are creating your system, use 10 shards for each JVM in its maximally sized scenario, or when the system is running its maximum number of Java virtual machines in your planning horizon.