One of the biggest impediments to learning Aerospike as a new user is mastering the vocabulary. I started this glossary in my spare time as an aid to myself and refined it over several years of working with the database. I had wonderful feedback from others I’ve worked with to ensure that it was complete and up to date. I’m sharing this in an effort to help others learn and gain a better understanding of the foundations of Aerospike.
Term | Definition |
---|---|
acl | Aerospike Client Library - Client for accessing Aerospike DB. Handles connection negotiation and management as well as reading and writing data. It’s a “smart” client that maintains awareness of where data resides on the server and routes each request to the node that should hold the data, using the partition ID calculated from the Primary Index Digest. |
all flash | When the PI and the data are stored on NVMe flash devices rather than storing the PI in memory. The devices for storing the PI and the data should be different. |
aql | Aerospike Quick Look - A client built around a familiar and common query language. May be familiar to SQL users but does not maintain parity with SQL by design. |
asadm | Aerospike admin tool. Multifunctional utility to extract and change configuration, configure auth, and analyze performance and health information out of a cluster or a collectinfo file. Python based. |
asmt | Aerospike Shared Memory Tool: enables primary and secondary indexes to be backed up from shared memory to files and restored from files to shared memory allowing the server to be rebooted and the indexes restored, enabling fast restart. |
available mode | Aerospike’s default mode, “Available mode” or Available and Partition-tolerant (AP). In this mode, any partitioned set of servers can claim complete ownership of a record. c.f. Strong Consistency mode (CP), CAP Theorem |
batch | A transaction in which you already have the keys/digests of the records you want, and the requests are sent together directly to the node or nodes. A batch groups multiple operations into one unit and passes them over a single network socket to each cluster node. |
bin | A sub-object of a record. Has a data type which need not match other records. |
cap theorem | The CAP theorem states that a distributed system has three possible properties, Consistency, Availability and Partition Tolerance, and that any given system can provide only two of the three. In practice only two pairings are implemented as distributed databases: AP (Available and Partition Tolerant) and CP (Consistent and Partition Tolerant). CA (Consistent and Available) cannot in practice be implemented as a distributed system, because consistency and availability fail in the case of a partition and so cannot deliver fault tolerance. |
cold start | After shutdown, the disks are scanned and the index is rebuilt from storage. Records that have not been durably deleted may resurrect, leading to that record reverting to an older record. c.f. fast restart |
defragmentation | As records get updated or deleted, the percentage of active records in a previously written block may fall below defrag-lwm-pct. Once this happens, the block becomes eligible for defragmentation, and the block is added to the defragmentation queue. This will start the process of reading records from partially empty write blocks and rewriting to a new write block to ensure efficient use of space and efficient record access. |
demarshaling | a.k.a. deserializing. Takes a serialized data structure (e.g. from incoming network communication) and converts it into an internal data structure. The reverse operation of converting an internal data structure to a serial format is marshaling or serializing. |
digest | Primary Index Digest - unique object identifier created client side by hashing the user key and, if available, the record’s set name using the RIPEMD-160 algorithm. Digests are always 20 bytes (due to RIPEMD-160) and can save storage for long keys. |
fast restart | a.k.a. warm start. Aerospike Enterprise Edition will store the Primary Index and other critical metadata on Linux shared memory. When restarting asd on a system, this metadata will allow a restarted asd process to quickly rebuild its index without having to fully regenerate its state from the storage drive. Can use ASMT to save metadata to disk for fast restart across reboot. |
heartbeat | Messages sent between nodes on an Aerospike cluster, used by the nodes to track each other and detect changes in the cluster. The heartbeat protocol may be multicast (using IP multicast) or mesh (using the configured IP addresses of peer nodes to connect). |
hotkey | a.k.a. hot digest. A key/digest that is accessed disproportionately more than other keys in the same dataset. This can cause unbalanced load on the cluster, i.e. one node handling more load than its peers. Write hotkeys are logged, but read hotkeys are only logged in strong consistency mode, though you can turn on key logging by setting rw-client to detail and review for hot keys that way. |
key | Unique ID of a record. The key is not stored with the record by default. |
lut | Last update time of namespace or set. Specified in nanoseconds since update. |
master record | The primary copy of a record. In a namespace with a replication factor of 2, there is one master record and one replica. This copy is where any writes occur, so it can sometimes be referred to as the write-master and the replica is known as the write-prole. |
migration | Migrations occur when the cluster topology changes such as when a node is added or removed from a cluster or due to network issues. During migration record data moves as part of the partition it’s mapped to via its key hash. |
namespace | A top level data container. A physical collection of similar records in a storage engine with common policies i.e. replication factor, encryption, storage, etc. Similar to a table-space in an RDBMS. |
node | A node is an individual server within an Aerospike cluster that holds some of the data and provides some of the computing power. An Aerospike cluster is generally made of many nodes. |
nsup | namespace supervisor - i.e. the main server thread that handles expirations and evictions. |
primary index | Primary Key Index or PI - This is a set of 20B RIPEMD-160 hashes that are created from the set and identifier portion of the record key tuple (namespace, set, identifier). Twelve bits of this hash (4096 = 2^12) determine which of the 4096 partitions within its namespace the record will be assigned to. The hash is stored in a hash table that will look up a specific red-black tree data structure called a sprig. The sprig contains the data location metadata. When looking up a record from a client using its primary key, a digest will be created from the set and identifier. This will allow the client to find the partition and node from the partition map and directly address that node for the data. Once the request is received, the digest is used to find the record entry in the hash table, retrieve the data location metadata from the sprigs and access the full record. |
pristine blocks | Blocks that have never been written to by Aerospike. Aerospike will prioritize writing to blocks that have been written to and cleared by the defragmentation process before writing to pristine blocks. This improves the speed of a cold start: blocks are written sequentially, so by tracking where unwritten blocks start, cold start indexing can skip blocks after that point and proceed to start. |
query | A request for all records matching criteria. Can query against a Primary index (key) or a Secondary index (an index cross-referencing the key against another attribute that can be searched). In a primary index or PI query, you can perform read only queries like getting all or some of the records in a named namespace or set, or all the records before a given LUT. You can also perform a background read-write query, which will perform a PI query and apply a UDF defined action. In a secondary index or SI query, the number of matching records and their location may be unknown at the time of the query. |
record | It’s an object containing data identified by a single key. Similar to a row in an RDBMS. Every record is stored in a partition and optionally may be stored in a set. |
record block | Initial landing spot for an incoming written record. A record block can only hold one record although a record can span multiple record blocks. 128 bytes as of v3.2. |
replication factor | a.k.a. RF. The number of copies of each record that are held in a namespace. |
rw-hash | Replica Write hash. Structure used to park transactions that need to consult another node prior to returning to the client. Transactions that would use rw-hash: write transactions; read transactions if duplicate resolution is required (i.e. when migrations are going on); and, in strong consistency enabled namespaces, read transactions, which are parked in rw-hash. |
sc mode | Strong consistency mode. Guarantees that all writes will be applied sequentially and not reordered or skipped. Consistent and Partition-tolerant (CP) from a CAP theorem perspective vs. the Aerospike default “Available mode” or Available and Partition-tolerant (AP). |
service thread | Worker thread on a cluster node that receives client requests and executes transactions. |
set | Optional method of logically grouping records within a namespace via a record attribute. Functions similar to a table in an RDBMS but w/o a schema. |
set index | Logical subset of records that is part of a larger namespace. Using a set index can reduce the amount of full primary index scans required to find a record. Rule of thumb is sets larger than 1% of the namespace tend to find less benefit from set indexes but testing is encouraged. |
sindex | Secondary Index or SI - Locates all the records in a namespace or set by a bin value within the record. On Aerospike a sindex is built per node by the node itself and only contains references local to the node. The sindex can contain both master records and replicated records. Can be created and maintained using asadm. |
sprig | A memory based binary tree data structure used by Aerospike to store and retrieve primary index data. |
storage engine | The physical storage medium and also the method by which data is written to the medium. |
subset | Flag used during migrations that indicates that a partition is not full and is removed when the partition is full and migration to that partition is completed. e.g. during an add node operation the partition being replaced will be marked as subset immediately and the partition replacing it will also be marked as subset until migration of data completes. |
system metadata | a.k.a. SMD. Used to store important system information such as secondary indexes, user defined function definitions, user permissions, eviction information and other data. Usually found on the node at /opt/aerospike/smd |
tend/tending | Process used by the client to discover the cluster address and which partitions are owned by a node, and uses this to map all partitions to cluster nodes. Tending starts with the initial seed connection from the client to the seed node. The client requests a list of cluster node addresses and their partitions along with the generations. Once the initial update is complete, the node is checked every second for partition updates (to detect moving partitions). It also tracks each socket’s last used time (maxSocketIdle) and if serverClock - lastUsedTime is greater than ClientPolicy.maxSocketIdle it closes the socket on the side |
transaction id / TR ID | ID returned to client by node for query. In post-6.0 versions this is returned to the client immediately. |
tsvc | Transaction Service - Largely deprecated and folded into Service threads, which now handle transactions. A transaction can be either read or write at this stage, but the namespace will be known. A tsvc-timeout indicates that a service thread did not handle the transaction before it expired, and probably indicates that there are not enough service threads to handle the incoming requests. |
udf | User Defined Function. Code written by a user to run inside the Aerospike Database Server. Currently, UDFs can only be written in Lua. |
write block | Where record blocks are written. Also known as a streaming write buffer, swb, or wblock. A record cannot span more than one write block, and the write-block-size is the record size limit. 1MiB by default when write-block-size is omitted. The write block is flushed when it is full or if it has not been flushed in flush-max-ms (default 1s). The write block is queued to write on the write queue. Aerospike only writes out fresh blocks. Records are never updated in place. |
write queue | Initially, this is where write blocks are cached in RAM until they are written down to the storage engine. |
xdr | Cross datacenter replication. Replicates records asynchronously across high latency network links. Can replicate full namespaces, a set or sets within a namespace, and a bin or bins within a record. |
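
The digest, primary index and tend/tending entries above describe how a smart client goes from a 20-byte digest to a partition and then to a node. The following is a minimal Python sketch of that routing idea, not Aerospike's actual client code. It assumes the 12 partition bits are taken from the first two digest bytes read as a little-endian integer, and the helper names (`partition_id`, `build_partition_map`, `route`) and the round-robin partition map are hypothetical, made up for illustration. It takes an already computed digest as input rather than hashing a key, since RIPEMD-160 is not available in every Python build.

```python
N_PARTITIONS = 4096  # partitions per namespace, as stated in the glossary


def partition_id(digest: bytes) -> int:
    """Map a 20-byte digest to a partition ID in the range 0..4095."""
    if len(digest) != 20:
        raise ValueError("a digest must be exactly 20 bytes")
    # Assumption: the 12 bits come from the first two bytes, little-endian.
    first_two = int.from_bytes(digest[:2], "little")
    return first_two & (N_PARTITIONS - 1)  # keep the low 12 bits


def build_partition_map(nodes: list[str]) -> dict[int, str]:
    """Hypothetical partition map: spread partitions round-robin over nodes.

    A real client learns this map from the cluster while tending.
    """
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}


def route(digest: bytes, partition_map: dict[int, str]) -> str:
    """Return the node that owns the partition this digest falls into."""
    return partition_map[partition_id(digest)]


if __name__ == "__main__":
    pmap = build_partition_map(["node-a", "node-b", "node-c"])
    example_digest = bytes([0x01, 0x10]) + bytes(18)  # made-up digest
    print(partition_id(example_digest), route(example_digest, pmap))
```

Because the partition ID depends only on the digest, a record's partition never changes; only the partition-to-node mapping changes during migrations, which is why the client needs to refresh just the partition map when the cluster topology changes.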