Storage Glossary

This text, written mainly in the form of a glossary of technical terms, is meant as a quick-start guide to computer data storage. A short paragraph per term summarizes the often complex subject matter to give you an overview of the field, a first primer on this fascinating topic.

19-inch rack

A standardized frame or enclosure of 19" width for mounting multiple electronic equipment modules, stacked on top of one another, on two parallel vertical rails. The front plate of a module measures 19 inches (482.60mm) wide in total, including the screw terminal area, thus the name; the clear area in-between the vertical screwing rails (called "rack opening") is 17.75" (450.85mm). One height unit (a "rack unit", measured in "U", 1U, 2U, 3U, etc.) is 1.75" (44.45mm) high. 19" racks may be embodied as open frames, offering essentially vertical screwing rails only ("two-post racks", popular in telco applications) or as open or closed cabinets (usually 24" wide, 609.6mm), with optional additional internal width outside the rail area for more elaborate cabling. 19" rack modules may be short or very deep enclosures ("mounting depth"). Larger enclosures, often taller and usually very deep, are commonly screwed to both the front and rear posts to properly support their weight. When ball-bearing "rack rails" are used as underpinning in this four-post mounting, the whole enclosure can be pulled out like a drawer for easier access.

In some space-constrained applications, it has become popular to use the half-width rack format, a 9.5" or 10-inch rack. Having two devices share the horizontal space of one height unit in a 19" rack is popular in audio applications and has been gaining attention for computer installations in recent years. 10" racks are usually less deep than common 19" server cabinets, following the overall smaller footprint. 10-inch rack dimensions are: exactly 10" (254.00mm) from side to side horizontally (front plate), including the screwing area. The actual usable width is the 8.75" (222.25mm) clear area in-between the vertical screwing rails ("rack opening"). Both standards, 10" and 19", use rails of the same width, 0.625" (15.875mm). Apart from saving space in commercial installations, 10-inch rack frames are deployed in military, airborne or otherwise weight- or space-constrained scenarios. While dedicated 10-inch switches, 10-inch hubs and 10" computer enclosures are available, it is common in SoHo or business environments to install consumer routers or similarly small devices that are not specifically made for 10" racks in a DIY fashion on 10-inch rack shelves. The Help section has an in-depth document on rack mount equipment.

3-2-1 Storage Strategy

is a best-practices suggestion for reliable storage of data, be it files, objects or backups. Users are asked to keep three copies of digital assets and store them on two different types of media. One of these copies should then be physically stored in a different location. Storage media may fail, and the principle of using different storage media is based on the observation that different types of media exhibit different failure patterns. This way, the loss of two copies within a small timeframe is less likely. Finally, physically separating the copies improves data safety by keeping a single incident such as water damage, fire or theft from affecting all copies. Cloud storage may be used as an offsite location. Updated schemes of the 3-2-1 rule add more components to the proven formula. The "3-2-1-1-0" rule adds the requirement for one truly offline copy of data to protect against ransomware attacks (Air Gap). The trailing zero represents "zero errors", meaning that the copied data has to be audited regularly to ensure it is still readable and free of spontaneous modifications (data corruption, "bit rot"). Checksumming may be used to test data integrity. The 3-2-1 rules and regular data integrity checks are part of a holistic DP&R strategy.
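The basic rule can be expressed as a simple check. The following Python sketch is only an illustration; the `Copy` record, its field names and the example media and location labels are assumptions made for this example and are not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a data set (hypothetical record for this sketch)."""
    media: str       # e.g. "hdd", "lto-tape", "cloud-object"
    location: str    # e.g. "office", "provider-region"


def satisfies_3_2_1(copies, primary_location):
    """Return True if the copies meet the basic 3-2-1 rule."""
    enough_copies = len(copies) >= 3                        # three copies
    enough_media = len({c.media for c in copies}) >= 2      # two media types
    has_offsite = any(c.location != primary_location for c in copies)  # one offsite
    return enough_copies and enough_media and has_offsite


copies = [
    Copy("hdd", "office"),
    Copy("lto-tape", "office"),
    Copy("cloud-object", "provider-region"),
]
print(satisfies_3_2_1(copies, "office"))
```

The extended "3-2-1-1-0" variant would need two more inputs that this sketch does not model: whether a copy is offline, and the result of the last integrity audit.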

3.5-Inch Floppy Disk

the third and last step in floppy disk formats, the 3.5" floppy disk marked the pinnacle of widespread portable removable magnetic storage. Enjoying its widest popularity throughout the 1990s, the smaller flexible disk inside a self-contained rigid plastic case was the go-to mobile storage medium for many 16-bit home computer systems, business PCs and musical instruments. The 3.5" disk became synonymous with "saving to disk" and is the common user-interface icon for saving data to non-volatile storage to this day.

4K video

(as a challenge, in a Video storage context) "4K" video refers to video with four times the pixel count of Full High-Definition (1080p) video. A higher resolution means that the video image is potentially sharper than at lower resolutions. In practice, there are two standard resolutions referred to as "4K video": 4096x2160 pixels for the film and video production industry and 3840x2160 pixels for television and monitors. While data consumption of video depends highly on the codec and bit rate used, one minute of 4K video can use up more than a gigabyte of space. As a rough estimate, an hour of 4K video takes approximately 45 GB of storage space. Handling 4K video requires the underlying storage to provide sufficient I/O, with sustained read and write speeds north of about 25 to 32 MB per second (200 to 256 Mbit/s).
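The storage figures above follow from simple bit-rate arithmetic. As a rough sketch, assuming a constant bit rate and decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits), the following Python helpers convert between the two. The function names are made up for this example. Under these assumptions, the 45 GB per hour estimate corresponds to a stream of about 100 Mbit/s.

```python
def storage_gb(bitrate_mbit_s, seconds):
    """Storage in decimal gigabytes consumed by a constant-bit-rate stream."""
    bits = bitrate_mbit_s * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000


def required_mbit_s(gigabytes, seconds):
    """Constant bit rate in Mbit/s that fills `gigabytes` in `seconds`."""
    return gigabytes * 1_000_000_000 * 8 / seconds / 1_000_000


per_hour = storage_gb(100, 3600)        # GB per hour at 100 Mbit/s
per_minute = storage_gb(100, 60)        # GB per minute at 100 Mbit/s
rate_for_45 = required_mbit_s(45, 3600)  # Mbit/s behind 45 GB per hour
print(per_hour, per_minute, rate_for_45)
```

By the same calculation, a stream that consumes one gigabyte per minute runs at roughly 133 Mbit/s, which shows how strongly the numbers depend on codec and bit rate.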

5.25-Inch Floppy Disk

being an intermediary step in floppy disk formats, the 5.25" floppy disk defined a whole era of computing, in business and in the home, as the widely used storage medium for home computers. Popular between 1980 and 1990, it was the go-to mobile storage medium for the Commodore C64, IBM AT and XT computers, IBM compatibles and many more. The floppy disk is synonymous with "storing" and its shape is the user-interface icon for saving data to non-volatile storage to this day. With Micropolis being an early manufacturer of 5.25-inch floppy disk drives, our knowledge base has an extensive article on 5.25" floppy disk variants.

AES-256-CCM

AES is short for the symmetric encryption algorithm "Advanced Encryption Standard", also known as Rijndael, a name derived from the surnames of its Belgian inventors, Joan Daemen and Vincent Rijmen. AES is a subset of the Rijndael family of block ciphers, selected during the standardization of ciphers by the US National Institute of Standards and Technology (NIST); "256" describes the key length in bits. AES-256 is used in encryption, authentication and data storage. The added "CCM" is the abbreviation of "Counter with Cipher Block Chaining Message Authentication Code", also "Counter with CBC-MAC" or "CCM mode". This mode of operation combines counter-mode encryption with integrity checking through CBC-MAC and is used by applications where a high security level with strong encryption and authentication is needed.

AHCI

Short for "Advanced Host Controller Interface", an interface specification for SATA host controllers. In comparison with NVMe, which is commonly used for solid-state storage devices, AHCI is tuned more towards use with spinning magnetic media, which are usually slower and have their own complications. As such, in solid-state contexts, AHCI has a number of performance drawbacks that NVMe removes.

Allocate-on-flush

also called "delayed allocation" is an optimization scheme found in some file-systems (e.g. XFS, ZFS, Reiser4). Data scheduled for write to storage isn't actually written immediately but held in memory until memory runs out or the 'sync' system call is issued. This helps reduce processing overhead and helps in mitigation of fragmentation inherent in block mapping data storage, as multiple smaller writes can be combined into faster and more aligned long sequential writes. This is especially useful in combination with a copy on write (CoW) scheme.
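The combining of small writes described above can be illustrated with a toy model. This Python sketch is not how any of the named file systems implement delayed allocation; the class `DelayedAllocator` and its methods are hypothetical, and overlapping writes are deliberately not handled.

```python
class DelayedAllocator:
    """Toy model: buffer writes in memory, form extents only on flush."""

    def __init__(self):
        self.pending = {}    # file offset -> bytes, not yet on "disk"
        self.extents = []    # (offset, data) runs produced by flush()

    def write(self, offset, data):
        # No allocation happens here; the data only lands in the buffer.
        self.pending[offset] = data

    def flush(self):
        # Merge buffered writes that are directly adjacent into one run.
        run_start, run = None, b""
        for offset in sorted(self.pending):
            data = self.pending[offset]
            if run_start is not None and offset == run_start + len(run):
                run += data                      # extends the current run
            else:
                if run_start is not None:
                    self.extents.append((run_start, run))
                run_start, run = offset, data    # starts a new run
        if run_start is not None:
            self.extents.append((run_start, run))
        self.pending.clear()


fs = DelayedAllocator()
fs.write(4096, b"B" * 4096)    # writes arrive out of order
fs.write(0, b"A" * 4096)
fs.write(8192, b"C" * 4096)
fs.write(65536, b"D" * 4096)   # not adjacent to the others
fs.flush()
print([(offset, len(data)) for offset, data in fs.extents])
```

Four small writes end up as two sequential runs: one covering the three adjacent blocks and one for the isolated block.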

ACID properties

is a common set of guarantees in information technology, usually found in database systems (DBMS) or distributed systems. ACID is short for "atomicity, consistency, isolation, durability", describing data validity expectations in the face of physical hardware failures, errors in software or a combination thereof.
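Atomicity, the first of the four properties, can be demonstrated with SQLite from Python's standard library. This is a minimal sketch; the `accounts` table, the names and the amounts are invented for the example. A transfer consists of two updates, and if the second one violates a constraint, the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The failed transfer leaves no trace: the credit to the receiving account, although executed first, is undone together with the rejected debit.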

Adler-32

Adler-32 is Mark Adler's simple and efficient algorithm intended for checksum value calculation. It is used for error detection in data transmission or storage, where speed is more important than reliability.
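The algorithm is short enough to show in full. The following Python sketch keeps two running sums modulo 65521 and combines them into a 32-bit value; it is written for readability, not speed, and is compared against the `zlib.adler32` function from the standard library.

```python
import zlib

MOD_ADLER = 65521  # largest prime smaller than 2**16


def adler32(data: bytes) -> int:
    """Straightforward Adler-32: two running sums modulo 65521."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER   # sum of all bytes, plus one
        b = (b + a) % MOD_ADLER      # sum of all intermediate values of a
    return (b << 16) | a


checksum = adler32(b"Wikipedia")
print(hex(checksum), checksum == zlib.adler32(b"Wikipedia"))
```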

Air Gapping

From "air gap", also "air wall": a security measure used in computer networking. The term means isolating a computer, group of computers, or an entire computer network from other, potentially less secured networks ("disconnected network"). In practice, this security measure can be achieved in two ways - by total physical isolation, where all physical and wireless connections are severed, or by logical isolation, using strict firewall rules, encryption, or network segmentation. In data backup, air gaps are also a crucial element, as physically removing media - for example DLT tapes from a drive - isolates the written data from any access. While this also prevents the data from being read, such a measure is the most drastic protection against any remote tampering. In backup stores, a "virtual air gap" feature represents a middle ground, where either software or hardware measures allow objects or written data sets to be made "immutable" for a given period ("object lock").

Active Directory (short "AD")

Microsoft's Active Directory (AD) is a database and set of processes and services used by Windows Server operating systems for authenticating and authorizing all users and computers in a centralized Windows domain network. It was designed for network resource management and security enhancement. Active Directory stores the user accounts and groups, security policies, trust relationships, and domain controllers used by an enterprise. AD is orchestrated by a server running the "AD DS" (Active Directory Domain Services) role. In summary, AD defines who can access which object, or who can modify an object.

AFP (Apple Filing Protocol)

The Apple Filing Protocol (AFP), originally "AppleTalk Filing Protocol", is a discontinued network file sharing and control protocol developed by Apple, originally designed for sharing files between Macintosh workstations and servers. It was commonly used for transferring large files, for example in video storage or editing environments. Files were served under a global namespace in URI form, prefixed with "afp://". It allowed servers and users to use various methods of user authentication, controlling access and prompting for a password input when a user tried to access a resource or volume for the first time. AFP was one part of the more general "Apple File Service" (AFS). AFP was officially discontinued by Apple with macOS Big Sur in 2020 in favor of SMB.

Andrew File System (AFS)

developed at Carnegie Mellon University, the "Andrew File System" (AFS) is a distributed file system which presents resources stored on a number of servers as one homogeneous (unified, global) file-system to a user. When a user opens a file, the file is cached locally. A callback mechanism on the backend, following a weak consistency model, makes sure that remote modifications are mirrored back to the local copy. AFS influenced a number of distributed filesystems, like Coda, InterMezzo, Sun's Network File System (NFS) or MapR FS ("Maprfs") of the "MapR converged data platform". OpenAFS is a fork of the original code, while Arla and the Linux kernel AFS client are independent implementations. The name honors Andrew Carnegie and Andrew Mellon, the founders of the university's predecessor institutions.

APFS

APFS, short for "Apple File System", is a copy-on-write file system developed by Apple Inc. Although the original intention of APFS was to offer a file-system optimized for SSD storage media when it was introduced in 2016, it was later positioned as Apple's new general-purpose file-system. APFS replaces Apple's older default file system, the journaling HFS Plus. In 2017, Apple made APFS the default file system, first in iOS 10.3 and later that year in macOS High Sierra.

ARIES

"ARIES" is short for "Algorithms for Recovery and Isolation Exploiting Semantics" is a logging and recovery algorithm family primarily used in Database Management Systems. One element of ARIES is "write-ahead logging" (WAL), meaning actions are first recorded to a stable log file and then executed.

"Aries" was also the model name of the Aries series of Micropolis hard-disk drives, featuring a SCSI2 interface and coming in a 3.5 inch slim-line form-factor.

Auto-Tiering

Storage can be described as being organized in layers, each layer representing a class (or tier) of storage, each class having its own cost and performance characteristics. Often data is classified and allocated into these layers by how often it is accessed, how fast it has to be accessed or how valuable certain data is. The scheme by which data is assigned to specific layers is called tiering. Tiering may happen manually or automatically. "Automated Storage Tiering" (AST) or "Auto-Tiering" denotes systems that are able to automatically allocate data to specific tiers, based on pre-defined rules, clever algorithms or artificial intelligence. One system offering this feature-set is IBM's "Hierarchical storage management" (HSM). Having data on different storage tiers is simply called "Tiered Storage". Oftentimes, with cached data ("Data Cache"), data is moved between different caching tiers, e.g. from in-memory to solid-state devices, and then to magnetic disk or magnetic tape (like LTO tapes) for long-term storage.
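A rule-based tiering decision can be as simple as a lookup on the time since last access. The following Python sketch is purely illustrative; the thresholds, the tier names and the function `choose_tier` are assumptions for this example and do not describe any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical rule set: (maximum age since last access, tier name),
# ordered from the fastest to the slowest tier.
TIER_RULES = [
    (timedelta(days=7), "ssd"),
    (timedelta(days=90), "hdd"),
]
ARCHIVE_TIER = "tape"  # everything older than the last rule


def choose_tier(last_access, now):
    """Pick the first tier whose age limit the data still satisfies."""
    age = now - last_access
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


now = datetime(2024, 6, 1)
print(choose_tier(datetime(2024, 5, 30), now))
print(choose_tier(datetime(2024, 4, 1), now))
print(choose_tier(datetime(2023, 1, 1), now))
```

Real auto-tiering systems typically weigh more signals than access age alone, for example access frequency, data size or business value, as mentioned above.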

ATA over Ethernet

(mostly implemented by Coraid/SouthSuite, Inc.) ATA over Ethernet (AoE) is a network protocol designed to use the OSI model's data link layer for accessing block storage devices over Ethernet networks. It transfers data without the use of the higher-level Transmission Control Protocol/Internet Protocol (TCP/IP) stack, lowering protocol overhead. Therefore, it is considered leaner and more suited for high-performance access to Storage Area Networks (SAN). Similar to iSCSI, the host system providing the actual physical data volumes is named the "AoE target" and the system that is accessing the volumes and mounting them is called the "AoE initiator".

Authentication

Essentially, authentication is proving the identity of something or someone. For example, when someone logs on to a computer, the authentication scheme is expected to reliably verify that the person asking to be logged in actually is the person who owns a certain user account. Traditionally, this is done by the user providing a password, as key, along with the login/account name. As a password alone has proven to be of limited security, more elaborate authentication schemes like Two-Factor Authentication (2FA) have been invented, which ask for additional, often out-of-band, means of identifying a user.

Bandwidth

(or "throughput", in storage) Bandwidth represents the rate at which data can be transferred between computer workstations and servers or between storage devices in a given time. It is used as a metric for determining the performance of storage systems, and expressed in bytes or bits per second. Note the difference in definition to the term "bandwidth" in signal processing, where it means the range between lowest and highest attainable frequency in a defined signal power envelope.

BeeGFS

(short for "Bee Global File System", formerly known as "Fraunhofer Gesellschaft File System", "FhGFS") Parallel global file system (one single namespace), developed and optimized for high-performance computing (HPC).

Bin Locking (in Video Editing Systems)

Bin Locking is a concept and feature designed for collaborative video editing, known from the popular professional video editing suite Avid Media Composer. It prevents simultaneous editing of the same media by multiple users within a video project, enabling multiple users to work in the same project without data corruption. Usually, only the first user is granted write permissions, with all subsequent users being given read access only. As such, bin locking is an essential feature for professional video editing environments where a great number of team members work on the same video project.

Bit rot

Bit rot describes the gradual degradation or corruption of data in digital storage. As such, it is a colloquial term for "data degradation", data decay or data rot in general and can occur on digital media, be it solid-state, rotating magnetic or optical media, but also on analog paper media like punched cards. Bit rot is not caused by physical damage or critical device failures but rather by non-critical failures accumulating over time in a data storage device or medium. Bit rot mitigation requires schemes to detect changed data through integrity checking and possibly offering means of recovery through redundancy, error correction codes or self-repairing algorithms which move degraded data segments away from the source of degradation to error-free alternative storage areas or media.

Block cipher

Deterministic algorithms used in cryptography, usually consisting of actually two algorithms, one for encryption and one for decryption. The cipher produces a defined output from a defined input plus key. As the decryption process is defined as being the inverse of the encryption, the deciphering step is expected to produce the original input.

Block Level

Block Level is the level of a storage system on which a file-system would usually be formatted on. It is an abstraction between physical storage and file-system. Being able to access unformatted space on storage media means having "access to the block level". This is the common understanding of "block level", meaning "block-level access" to a storage device. From the perspective of a traditional file system, the "block level" describes the most low-level handling form of data a file-system uses to store data on media (a "block", sometimes "physical record"; this scheme is called "block mapping"). When a file-system commits data to storage, it writes one or more of such blocks on a HDD (Hard Disk Drive) or SSD (Solid State Disk), depending on the predefined block-size and the amount of data to be written. One file may be spread over multiple blocks and the file-system keeps track of these blocks. The block concept provides efficient data access and improves performance because data operations are performed on entire blocks rather than individual bits or bytes. This approach to store data is one possible abstraction of the underlying hardware, and as available storage space is divided into fixed size blocks, produces inefficiency when blocks aren't fully used and/or fragmentation of space. There are various approaches to mitigate these inefficiencies, usually by lowering the waste of space inherent to the fixed sized block concept. Some filesystems employ "block suballocation" and/or "tail merging" (Btrfs). Others offer the ability to dynamically change the underlying block size (ZFS). The concept of using Extents instead of individually addressed blocks is another update of the block mapping scheme. Although the block level abstraction of data storage is usually handled by the file-system, some applications employ their own block mapping layer (their own file-system format) to optimize block I/O performance, most notably Database Management Systems (DBMS). 
DBMS access storage via "block I/O", and this "block storage" may be directly attached via protocols like SCSI or Fibre Channel or remotely attached via protocols acting over fabric / SAN (Storage Area Networks), like the TCP/IP based iSCSI (Internet Small Computer Systems Interface) or AoE (ATA over Ethernet) protocol. The other context where it is common to speak of "block storage" is in enterprise cloud storage. These use cases usually require whole system images to be handled, moved and stored. The reason being that supervising a multitude of systems, represented by their attached storage, abstracts away the fact that each system itself holds thousands of files. The term "Block Storage" in this regard is used to describe a (virtual) unit that resembles a hard-drive/ a file-system/ a volume, and is treated as a single "block" or "file" that can be stored, retrieved, mounted and unmounted as (virtual) local storage. As such, cloud block storage, colloquially used, doesn't necessarily mean that such "block storage" offers low-level "block I/O" access, as for example DBMS would use.
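The inefficiency of partially used fixed-size blocks mentioned above can be quantified with a little arithmetic. This Python sketch assumes a block size of 4096 bytes, a common but by no means universal value; the function names are made up for this example.

```python
def blocks_needed(file_size, block_size=4096):
    """Number of whole blocks a file of `file_size` bytes occupies."""
    return -(-file_size // block_size)  # ceiling division on integers


def slack_bytes(file_size, block_size=4096):
    """Allocated but unused bytes in the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size


size = 10_000
print(blocks_needed(size), slack_bytes(size))
```

A 10,000-byte file occupies three 4096-byte blocks (12,288 bytes) and leaves 2,288 bytes unused. Techniques like block suballocation or tail merging aim to reclaim exactly this kind of leftover space.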

Block Storage

Most storage devices operate by organizing available storage into defined sized chunks, so called "blocks". Thus, storage devices are called "block devices", "block storage devices", or simply "block storage". In enterprise IT and environments where physical systems are more and more virtualized, "Block Storage" has become a more general term, an umbrella term for cloud concepts, typically describing means to create and manage virtual volumes that emulate the behavior of a traditional block device. And "Block Storage" describes a perspective on storage volumes, workflows to handle such "blocks" of data, how infrastructure to facilitate large numbers of such "blocks" is setup on the backend, etc. In sum, "Block Storage" is an advanced and structured type of data storage architecture designed for enterprise environments where high performance, reliability, scalability and flexibility play a crucial role. It separates a host system and its traditional local storage and replaces local storage with virtualized storage "devices" operating like block storage devices (virtual "local block storage"). These "blocks" (meaning virtual block storage devices), treated as raw volumes, are comparable to physical hard drives, are commonly organized via unique identifiers (unique address) and can be moved between systems, can be cloned, archived, mounted and unmounted. In this way, a large amount of data can be efficiently processed. Block Storage is popular in Cloud Storage and Virtualization environments and is commonly used for storing data in Storage Area Networks (SANs). There are different ways of accessing storage blocks in a SAN, by iSCSI, Fibre Channel or FCoE protocols, each varying in complexity and/or protocol overhead. SANs are usually Distributed Block Storage systems where data blocks are spread over multiple physical servers and sometimes locations.

BMC

abbreviation for "Baseboard Management Controller" (sometimes "Board Management Controller"), a small dedicated computer system (a microcontroller) embedded into the mainboard (baseboard) of systems that are usually operated remotely, like racked server systems in datacenters or NAS systems in off-site locations. A BMC is the central hub for sensor data, system state, power cycling and other means to supervise and control the host system. A BMC can be part of the IPMI control stack and may communicate over IPMI protocols. Often BMC networking is separate from host system networking, adding an additional layer of security by using a dedicated communication network or serial connection. The software (client) counterpart of the BMC is the "BMC Management Utility" (BMU), usually a command-line tool to communicate with a remote BMC. Some BMCs offer a web interface.

Btrfs

Btrfs, abbreviated from B-tree file system, is an advanced file system designed by Chris Mason and used on various Linux distributions. It provides next-generation features such as Copy-on-Write (CoW) snapshots (read-only file system copies from a specific point in time), built-in volume management with support for software-based RAID (Redundant Array of Independent Disks), fault tolerance, self-repair with automatic detection of silent data corruption, and checksums for data and metadata.

Checksumming

Implementation of a scheme to calculate a small "checksum" from a block of data in order to detect errors or bit rot. Checksumming algorithms, present in file-systems and many areas of data storage, are selected based on their speed and reliability. Checksummed data may be the actual data payload of a file, but also a file's metadata.

Checksum algorithm

is a sequence of instructions used to generate a small block of data, called a "checksum", from a sequence of bytes, known as "bitstream". Checksum algorithms and their generated checksums are used to determine whether bitstreams in data I/O have changed. Some of the most popular checksum algorithms are Cyclic Redundancy Check (CRC), Adler-32, Message Digest Algorithm 5 (MD5), Secure Hash Algorithms (SHA), xxHash, Tiger, etc.

Ceph (distributed parallel file system)

Ceph is an open-source parallel distributed file and storage system designed to manage and store vast amounts of data. Ceph is known for its scalable architecture, good performance, and reliability. It offers object, block, and file storage in one unified system with no single point of failure. Ceph's main component - RADOS (Reliable Autonomic Distributed Object Store), provides an Object Gateway (RADOS Gateway, RGW), RADOS Block Device (librbd, RBD), and Ceph File System (libcephfs, CephFS). Ceph used to work on a traditional FileStore layer living on a block device, but migrated to its own storage implementation called "BlueStore". With "BlueStore", Ceph builds directly on the block device level, allowing for optimized i/o and improvements for SSDs. BlueStore also serves as the foundation for BlueFS, a lightweight file-system-like layer for Ceph's key/value-stores. These internal mechanisms are opaque to an end-user as Ceph typically presents data in a traditional POSIX filesystem to attached client systems.

CIFS

Abbreviated from Common Internet File System, CIFS is a network file-sharing protocol developed by Microsoft for providing access to files and printers between nodes on a network. It is a dialect (a particular implementation/version) of the Server Message Block (SMB) protocol; Microsoft introduced the name in 1996 for its then-enhanced version of SMB, which has since been superseded by SMB 2 and SMB 3.

Cloud Storage

Cloud Storage is a data storage service model where the data is transmitted, stored, and managed on remote storage systems maintained by a third-party service provider. This virtualized storage infrastructure provides accessible interfaces to users over the internet so they can access their data from any device with an internet connection. Cloud storage eliminates the need to purchase, manage, and maintain in-house storage infrastructure. As a broad term, cloud storage may describe end-user use cases or more professional enterprise storage solutions.

Cloning

the process of cloning in data storage usually describes a method of creating an exact replica of an original dataset, a database, or a whole storage volume - based on the snapshot created at some point in time. Cloning is commonly used for data backup and recovery, migrating operating systems, digital forensics, or mass deployment purposes. This process also implies replicating boot records, settings, metadata, and file systems found on the particular original storage.

Cluster

while Cluster is a broad term in computer science describing a number of interconnected systems, a cluster in data storage represents a group of storage devices connected together to form a unified storage in order to provide scalability, enhanced data availability and distribution, resilience, and higher performance. Storage clusters are commonly used in distributed file systems and cloud storage where the amount of data is constantly growing and clients handle large datasets.

Clustered file system

A clustered file system is a type of file system that provides simultaneous data distribution and retrieval across multiple physical storage servers, while appearing as a single logical file system for access and management. There are two distinctive architectures of clustered file systems: 1) "Shared-disk file systems", examples include IBM's General Parallel File System (GPFS), Red Hat GFS2, BSD HAMMER2, Quantum's StorNext, VMware's VMFS, Sun/Oracle's QFS - and 2) "Distributed file systems", examples are Hadoop (HDFS), GlusterFS, CephFS, Lustre, BeeGFS, MooseFS / LizardFS, XtreemFS. In distributed file systems (also called "shared-nothing" file systems) - each server in the cluster has its own local storage, while in shared disk file systems - all servers in the cluster have direct access to a common set of shared storage devices. While clustered filesystems share many layout similarities with Parallel Filesystem, the basic principles differ. Parallel filesystems on the one hand distribute and shard data to increase possible I/O concurrency and optimize performance. The term "clustered filesystem" on the other hand usually describes a distributed filesystem that appears as one unified (global) filesystem ("hierarchical distributed file system"), with file-locking and data consistency schemes to allow multiple users to access this unified data simultaneously. Moving the design layout of "traditional" hierarchical distributed file systems into the cloud tends to be a challenge. While S3 / object based stores align better with cloud backend infrastructure, layering POSIX semantics on top of such object stores is an ongoing struggle.

Cold Storage

Related to the distinction of "hot" and "cold" spares in RAID systems, and from common knowledge about the behavior of ice and frozen materials, the term "cold" usually describes data that is either seldom accessed or difficult/slow to access (meaning it has to be "thawed", "unfrozen" prior to being available). Cold Storage (sometimes "Coldline storage") can mean that physical drives or tapes are physically detached from a computer and stored away ("Offline Storage"). In larger storage or data center contexts, cold storage can mean a scheme where hard drives are spun down in place (put into controller or software initiated stand-by, or are physically disconnected from power supply) to lessen the effects of mechanical wear and/or save energy. Starting up (or reconnecting) a cold storage resource is usually much more time consuming than accessing a storage system characterized as being "hot". While cold data is often associated with data on less performant media (thus less expensive), cold data can just as well describe a (more expensive) backup solution or data vault that is more suitable, more secure and/or reliable for long-term storage, as is an important metric in archival work. Digital data storage in DNA ("DNA Storage", "DNA digital data storage") can be one (future) form of cold data (as of 2024), where binary information is encoded in synthetic DNA (synthetic Deoxyribonucleic acid, using nucleotides A G T C in sequences). There is a gradient between hot, nearline and cold data storage patterns, with varying degrees of what is regarded as nearline or cold storage data. Nearline and cold data can be stored in MAIDs (massive array of idle drives), optical jukeboxes or tape libraries.

Coercivity

(also magnetic coercivity, coercive field or coercive force) is a measure of the ability of a ferromagnetic material to withstand an external magnetic field without becoming demagnetized, that is, to "remain magnetized". In conjunction with magnetic media, it describes the field strength a read/write head has to exert to magnetize an area on the surface of a magnetic recording medium.

Content-addressable Storage

Content-addressable storage (CAS), sometimes "content-addressed storage" or "fixed-content storage" is a file storage paradigm, where content isn't stored by a given name under a given path but in reverse by its contents, usually by hashing the file contents with a defined cryptographic hash function and using the resulting string as the file's identifier. This approach is common in systems that strive to create a single global representation for a large corpus of files, like in public P2P (peer to peer) file (sharing) systems.
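The idea of addressing a file by its contents can be sketched in a few lines. The following Python example is a minimal in-memory illustration, assuming SHA-256 as the hash function; the class `ContentStore` is hypothetical and not modeled on a specific CAS product.

```python
import hashlib


class ContentStore:
    """Minimal in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # hex digest -> content

    def put(self, content: bytes) -> str:
        # The address is derived from the content itself.
        key = hashlib.sha256(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self._objects[key]
        # The address doubles as an integrity check on retrieval.
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("stored content does not match its address")
        return content


store = ContentStore()
key1 = store.put(b"hello storage")
key2 = store.put(b"hello storage")   # same content, same address
print(key1 == key2, len(store._objects))
```

Storing identical content twice yields the same address and only one stored object, which is why content addressing lends itself to deduplication.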

Copy on Write

CoW is a resource management technique used in many areas of computer science. In relation to storage, CoW usually refers to a scheme for lightweight incremental copies (variations) of original data. When a user accesses data, this data is a set of original data blocks. Once read data is modified and committed back to storage, a system using CoW only writes the affected data blocks instead of unnecessarily duplicating the entire dataset or re-writing all original blocks. File systems like Btrfs and ZFS are known for offering this feature. As data is forked instead of replaced, CoW is a mild form of snapshotting or incremental backup. It depends on the implementation whether older revisions of a modified resource are exposed to the user. "Copy on Write" may also refer to a data safety scheme where data is never overwritten in place but is always fully written to storage first (generating new data blocks); only after the storage backend has confirmed that the data has been successfully committed are the blocks allocated to the original file freed up.
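The block-level behavior can be illustrated with a toy volume. This Python sketch is a simplified model and not the on-disk design of Btrfs or ZFS; the class `CowVolume` and its methods are hypothetical. A snapshot copies only the block map, and a write stores a new block instead of overwriting the old one.

```python
class CowVolume:
    """Toy copy-on-write volume: block maps point into a shared block store."""

    def __init__(self, blocks):
        self.store = dict(enumerate(blocks))  # block id -> data
        self.live = list(self.store)          # current block map (ids)
        self._next_id = len(self.store)

    def snapshot(self):
        # A snapshot is only a copy of the block map, not of the data.
        return list(self.live)

    def write(self, index, data):
        # Never overwrite in place: store a new block, then repoint the map.
        new_id = self._next_id
        self._next_id += 1
        self.store[new_id] = data
        self.live[index] = new_id

    def read(self, block_map):
        return [self.store[block_id] for block_id in block_map]


vol = CowVolume([b"AAAA", b"BBBB", b"CCCC"])
snap = vol.snapshot()
vol.write(1, b"bbbb")
print(vol.read(vol.live), vol.read(snap), len(vol.store))
```

After the write, the live volume and the snapshot still share two of their three blocks; only the modified block exists in two versions.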

Converged Ethernet

Converged Ethernet (as part of the Data center bridging (DCB) initiative) is an improvement to the Ethernet protocol to improve its fitness for data center and enterprise use. Many Storage Area Network applications and the used protocols use UDP instead of a full TCP stack to improve performance. But Ethernet at its core is a "best effort" network, and UDP messages aren't guaranteed to arrive. Thus many SAN protocol implementations (e.g. RoCE in select versions) use some sort of mechanism to guarantee the arrival of messages as part of their protocol layer. Another approach now is to amend the underlying Ethernet layer in such a way that UDP messages are guaranteed to arrive. "Convergence Enhanced Ethernet" (or "Converged Enhanced Ethernet" (CEE)) is such an extension. But as these extensions are still in flux, it requires hardware equipment to be compatible.

CSP (Cloud Service Provider)

is a type of IT business that offers and sells (mostly) cloud computing solutions to customers to enable or supplement IT services or solutions on their end. Solutions offered range from compute (CPU/GPU) and storage to more elaborate managed computing offerings. Cloud computing offerings usually describe the more low-level building-blocks of enterprise IT or branch out into the domain of Infrastructure-as-a-Service (IaaS) or Platform-as-a-Service (PaaS), in contrast to higher-level Software-as-a-Service (SaaS) solutions, the offerings of Application Service Providers (ASP). The category of general CSP is sometimes broadly described as offering an X-as-a-Service (XaaS) model, where X may refer to any of the named solutions.

Cyclic Redundancy Check (CRC)

CRC is an "accidental error or change"-detection method (a checksum algorithm) employed in data transmission and storage to ensure data integrity. In data storage, a CRC value is a defined-size checksum calculated based on input data such as a file's payload data and is stored together with it on the storage device. Upon reading that specific data, the CRC value is recalculated and compared with the originally stored CRC value. If those values match, it indicates that the data has likely not been corrupted.
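The store-then-verify cycle can be sketched with the CRC-32 function from Python's standard zlib module. The helper names store_with_crc and verify are made up for this example, and real storage devices typically compute such checksums in firmware or hardware rather than in application code.

```python
import zlib


def store_with_crc(payload: bytes):
    """Return the payload together with its CRC-32 checksum, as it would be stored."""
    return payload, zlib.crc32(payload)


def verify(payload: bytes, stored_crc: int) -> bool:
    """Recalculate the CRC on read and compare it with the stored value."""
    return zlib.crc32(payload) == stored_crc


data, crc = store_with_crc(b"important payload")
corrupted = b"important pAyload"  # one byte changed, simulating accidental corruption
```

Here verify(data, crc) succeeds, while verify(corrupted, crc) fails and signals that the data changed after the checksum was stored.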

Cylinder-Head-Sector

an earlier (obsolete) method of addressing physical blocks of data on a hard disk platter surface (CHS addressing).

Data Corruption

Data Corruption is the process of unintended errors being introduced in original data during its transmission, reading, writing, storage, or processing - resulting in the data becoming unreadable, unusable, unreliable, or inaccessible. Many factors cause data corruption, for example - data transmission failures, hardware failures, human errors, malware, or software bugs. Accordingly, various mitigating measures can be deployed for prevention, such as - using Error-Correcting Code (ECC) mechanisms, antivirus software, redundant power supplies, regular backups, etc. One form of silent data corruption (data degradation) is bit rot.

Data Lake

A Data Lake describes a central repository to store large amounts of unstructured data, data in its native (raw) format, using a flat architecture. Flat architecture here means that data is not stored in hierarchical structures and that no means are used to filter, pre-process or structure incoming data. All data is stored as is, lowering costs and handling overhead. This is the opposite of a data warehouse, where data is pre-processed and structured before being stored. While a data warehouse processes contents upon ingest for simple access and analysis, data in a data lake is expected to be processed and structured only when it is actually accessed. Data Warehouses are usually built in batches, with smaller amounts of structured data, while Data Lakes are expected to cope with both batch imports and live-streamed incoming data, like from IoT devices or the Web.

Data Lakehouse

Combination of two principles in data management, "Data Lake" (flexible and cost efficient) and "Data Warehouse" (defined structures for organization and analysis of data), in a database-like structure. The Data Lakehouse approach tries to achieve the best of both worlds.

Data Offloading

a scheme or tactic to optimize signal flow in or through computer systems. Depending on the data or burden being offloaded, it is sometimes also known as "Traffic Offloading" or "Computation Offloading". In data storage, data offloading can be done on the network interface, where field programmable or custom designed co-processor ICs take over some or all of the data moving workload, lowering the impact of these tasks on the host CPU. Such specialised DPUs ("Data Processing Units") play an increasingly important role in modern hyperconverged and software-defined data-centers. A DPU allows resources on a server host, like GPU or storage, to be closely coupled to network transport, effectively connecting these endpoints directly through the NIC. This way, it is possible to design data-center-scale systems where GPU nodes and storage nodes can work in tandem over fibre with only minimal impact on performance in terms of latency and throughput. Implementations might be in software, in hardware or in a combination thereof. "DPDK" (Data Plane Development Kit) is an open-source software project that provides a set of data plane libraries and network interface controller (NIC) drivers. It enables users to offload TCP packet processing from the operating system kernel to processes running in user space. A common vendor for this technology is nVidia with its ConnectX™ and, more recently, its BlueField™ products (via nVidia's Mellanox acquisition). In telecommunications, either the labor of handling the physical layer or select layers of the networking stack may be offloaded to a co-processor or "Accelerator Card". Also in telecommunications, offloading may be used in handling data traffic ("Mobile data offloading"), to mitigate network congestion at base stations or in wireless network links, to improve throughput and/or to improve Quality of Service (QoS) of the overall data transport grid. There, this is done either by using secondary network paths or by switching the transport technique, like dynamically moving high-bandwidth data from 3G, 4G (LTE) or 5G cellular links into Wi-Fi infrastructure.

Deduplication

Deduplication is the process of recognizing and eliminating redundant data from a given dataset. The benefits of deduplication are a decrease in occupied storage, reduction of data operational overheads, optimization of free space on a volume, thus lowering overall storage costs. There are two deduplication methods - inline (redundancies are removed as the data is written to storage) and post-processing deduplication (redundancies are removed after the data is written to storage). There are also two types of data deduplication - "file-level deduplication" (comparing a file with copies already stored) and "block-level deduplication" (searching within a file's blocks and saving unique iterations of each block - if a file is altered, only changed data blocks are saved).
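Block-level deduplication can be sketched as follows. The example is deliberately simplified: the fixed 4-byte block size, the SHA-256 fingerprints and the function names dedup_blocks and restore are assumptions for illustration, and production systems use far larger (often variable-sized) blocks.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example stays readable


def dedup_blocks(data: bytes, store: dict) -> list:
    """Split data into blocks, keep each unique block once, return the block recipe."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # a block already seen is not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from the recipe and the unique blocks."""
    return b"".join(store[digest] for digest in recipe)


store = {}
original = b"AAAABBBBAAAACCCCAAAA"
recipe = dedup_blocks(original, store)
```

The input consists of five blocks, but only three unique blocks end up in the store; the recipe of fingerprints is enough to restore the original byte sequence.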

Digital separation

in the context of the film industry (cinema motion pictures), "Digital separation" (also "Three-strip Color-separation Film-out Process", or "Three-Strip Separation") describes a photochemical film preservation technique where a color master is separated by color and then recorded on very high resolution highly stable analog black & white roll film. The process is essentially an inversion of what is used in CMYK color printing or the Technicolor three-strip RGB recording format - where components of the color spectrum are recorded individually and are later re-assembled to form the full-color projection image. The guiding idea is that black & white film, similarly as with Microfilm content preservation, is very stable and will remain unchanged over decades. This technique, although financially expensive and technically complex, is a valid option for large motion picture studios to preserve their moving image output. In today's all-digital movie-making workflows, where image capture and projection is mostly fully digital, film preservation is still a challenge, due to data volume and the requirement to preserve digital content reliably. Thus, a film-out process, to an analog time-tested medium, is a valid alternative to LTO tape or similar long-term data archiving solutions.

Direct-attached storage (DAS)

"Direct-attached storage" or "Directly Attached Storage", commonly abbreviated as "DAS", is any digital data storage resource that is directly connected to the system where it is accessed. This includes both internal drives or media (like a SATA HDD, SSD, NVMe, SCSI disks or RAID setups inside a PC or server) and external drives or subsystems connected via interfaces like USB, Thunderbolt, FireWire, or an HBA (Host Bus Adapter). The key key factor in DAS is that the storage is physically attached and accessed directly by a single system, rather than accessed or shared over a network, like NAS (Network Attached Storage) or SAN (Storage Area Network).

Disaggregated Storage

is a layout or structure in IT data storage infrastructure where storage resources are separated (disaggregated) from compute resources. Disaggregated storage allows storage to be scaled independently from compute, resulting in flexible resource allocation. Disaggregated storage usually relies on conventional hardware-defined nodes/resources, where pooled storage is made accessible to compute nodes over high-speed network connections and switches. The approach allows hardware to be optimized for specific tasks and provides easier scaling of compute or storage as individual resources. This can lead to better long-term cost efficiency (TCO, total cost of ownership). The disaggregated "non-converged" approach is essentially the opposite of the highly integrated layout in a hyper-converged virtualized structure. As of this writing (2025), disaggregated architectures are beginning to expand beyond just separating storage and compute. GPU resources, as heavily used in AI model training and AI inferencing, can similarly be disaggregated from storage, allowing operators to "build" virtualized GPU compute nodes beyond hardware limitations. Recent approaches disaggregate memory from compute, creating memory pools on decoupled nodes, enabling the allocation of previously unattainable memory sizes to individual server nodes or the dynamic adjustment of overprovisioned memory resources on client nodes. Although locally attached hardware outperforms such setups, the ease of orchestration in combination with other factors makes disaggregated layouts a desirable architecture. In addition, SmartNICs / custom DPUs allow disaggregated resources to communicate over fibre at acceptable latency/throughput levels, minimizing the transport overhead. Cmp. "Converged" or "Hyperconverged infrastructure".

Disaster Recovery

abbreviated as "DR" or, more verbosely, "IT Disaster Recovery", is a subset of "business continuity and disaster recovery" methods and describes a process of maintaining or re-establishing working order in infrastructure or vital (data) systems following a natural, technical or human-induced disaster incident. Schemes like data synchronization points, backups and backup sites are elements of properly implemented, maintained and audited disaster recovery plans. The term "Recovery Time Objective" (RTO) in DR describes the targeted maximum downtime of a system after a disaster incident. The term "Recovery Point Objective" (RPO) describes the maximum time interval for which data may be lost through an incident. For example, when backups are made at regular intervals, everything written since the most recent backup can be lost, so the interval between backups has to stay within the "RPO". Lastly, the term "Recovery Time Actual" (RTA) is the tested or audited time it actually takes to recover from a disaster incident and is an important metric to re-evaluate and optimize implemented DR plans.

Distributed file system

One flavor of a "Clustered File System". One example is Carnegie Mellon University's Andrew File System (AFS). Read the article on Clustered file-systems for more.

Drive bay form-factors

On computer systems, hardware options (sometimes internal, sometimes user facing, and not limited to storage drives, but also media card readers, displays, etc.) are usually added in standardized drive bays, where the sizes of these drive bays evolved from early industry standards through subsequent formalization. Common sizes are 5.25-inch, 3.5-inch, 2.5-inch and 1.8-inch drive bays, where more recent popularity usually aligns with a smaller footprint. 8.0-inch drives emerged on early computers of the 1960s and 1970s, for floppy drives and hard disk drives. One example is the Micropolis 1203-1 rigid disk drive. The 5.25-inch drive bay envelope followed. Within its defined area, drives got smaller: so called "full-height" (abbreviated as "FH" or "FHT") drives appeared, as well as so called "half-height" (abbreviated as "HH" or "HHT") drives, occupying only half of the available height. When storage technology got smaller in size, the 3.5-inch, 2.5-inch and 1.8-inch drive bays became common and, similarly to the larger drive bays, various break-points emerged that utilize only half the height, half the length or combinations thereof: HHHL (half height, half length), FHHL (full height, half length), and sometimes "SL" or "1/3HT" for drives slimmer than half-height 3.5-inch drives. You can find many of these abbreviations for drive-bay form-factors on the Micropolis Hard Disk Drive Support pages.

DMZ

Short for "Demilitarized Zone", a term in networking architecture meaning an additional layer or ring of security around a secured network. Between internal networks and external networks, like the Internet, the DMZ acts as an in-between area. When two firewalls are used, one for traversal from the public Internet to the DMZ and another from the DMZ to the secured net, a potential attacker has to overcome two security perimeters for intrusion, which makes a successful breach of network security less likely.

DP&R

is short for "Data Protection and Recovery", a set of rules, schemes and processes to make sure that data is regularly backed up and archived for the long-term. The aim is to prevent data loss and allow for quick recovery in case of data corruption or accidental deletion.

DPDK

short for "Data Plane Development Kit", an open-source software project that provides network interface controller (NIC) polling-mode drivers and related data plane libraries to facilitate fast packet processing. It allows custom data offloading in order to optimize throughput and latency. Compare Data Offloading, or SPDK, a similar project aimed at storage.

DPU

Short for "Data Processing Unit", a specialized chip similar to a CPU, only optimized for data transport and/or networking workloads (also "smart NIC" or "Accelerator Card"). While network switches and NICs have always been equipped with custom chips, a new view on networking equipment and/or interface cards as sub-systems of larger infrastructure components has brought new terminology and perspectives to the table. When switches and NICs can be programmed and are thereby able to contribute higher-level operations that facilitate higher throughput or lower latency, it has become common to speak of DPUs. Such devices are dedicated FPGAs, ASICs or CPU cores that allow custom rewrites, routing or manipulation of traversing data on a NIC or switch to raise overall throughput and bring down latency. A common vendor for this technology is nVidia with its BlueField™ product line (via nVidia's Mellanox acquisition).

eCryptfs

"Enterprise cryptographic filesystem", abbreviated as eCryptfs, is a POSIX-compliant disk encryption software on Linux.

Edge Computing

In complex networks and systems, response times, bandwidth or immediately available compute power are important metrics. Moving resources closer to clients (close to the "edge") improves or bolsters those metrics. As such, "edge computing" is a paradigm in network architecture, distributed systems and part of an overall optimization strategy.

EDSFF

short for "Enterprise and Data Center Standard Form Factor" is a form-factor for Solid-State-Drives (SSDs) and the enterprise / data-center successor to M.2 and U.2. The standard was primarily pushed by Intel and aimed to overcome some of the limitations of the older connection standards, such as limited width (not allowing two chip packages side by side) and power limitations. In 2017, Samsung presented a competing standard called "NGSFF" (Next Generation Small Form Factor). Compare U.2 and M.2.

EFSS

the acronym stands for "Enterprise File Sync and Share" and labels software services that allow employees to securely sync and share files and digital assets within an organization across multiple devices and locations. EFSS solutions provide a secure alternative to public cloud services and thus help prevent the use of unauthorized file-sharing services (shadow IT) while providing employees with a similar feature set and user-friendly interfaces for collaboration and file management. EFSS solutions may be deployed on-premises, in the cloud (SaaS) - or in a hybrid approach, depending on the organization's needs and security requirements.

EncFS

is an open sourced cryptographic filesystem popular on Linux systems. It transparently encrypts files, using an arbitrary directory as storage for the encrypted files and presenting decrypted files via FUSE mounts.

What is Enterprise Storage?

Enterprise storage is a category of data storage systems designed to meet the capacity, performance, reliability, and security needs of larger (business) organizations for managing critical business, internal and employee data. Enterprise storage systems typically fall into four main categories: Direct-Attached Storage (DAS, sometimes called "Server Attached Storage") connects directly to computers or servers. It offers fast local access, but as it doesn't offer any facility to share storage resources independently of the host to which it is attached, it offers only limited scalability for distributed/multi-host environments. Network-Attached Storage (NAS) is similar but operates over a network. NAS is popular for Small Office/Home-Office (SOHO) scenarios and in small and mid-sized businesses. A NAS integrates with the network to provide shared file access and backups, but is usually considered slower than DAS, although that is more a traditional view; contemporary systems typically only struggle with very heavy workloads, large user bases, or higher latencies overall. For higher performance and scalability, organizations often turn to a "Storage Area Network" (SAN): dedicated, high-speed networks that link storage devices to servers in a many-to-many relationship via a storage-only network. As SANs use a dedicated separate network, they are secure and performant, but this raises complexity and cost, making SANs a premium fit for large enterprises with demanding data needs. Lastly, "On-Premises Object Storage" is storage following common cloud storage semantics (S3 compatible, etc.) but hosted locally or in a private data center. Object Storage excels at handling vast amounts of unstructured data, providing easy access and virtually unlimited scalability.

Edit while Capture

Depending on context, this may also be named "Edit while ingest". In media production workflows, editors often work on video material that has just come in and has to be presented to viewers very quickly, for example, in news coverage. On the other hand, actual video material is oftentimes coming in via video (satellite) feeds or dedicated data link transfers, or has to be copied from recording media. With large files or with video material being captured from a real-time playback, waiting for the ingest or capture to complete would create a large window of unused time, time that could already be used to edit video material. With file storage solutions tuned for such workflows (and video editing suites supporting these modes), some vendors offer the ability to edit files that are currently being ingested or captured. Essentially, that means the system is able to use (usually front) portions of a file which is still growing on the file-system due to incoming data being appended to the file's tail.

EtherDrive

is a brand name of The Brantley Coile Company, formerly Coraid, for storage area network devices based upon the ATA over Ethernet (AoE) protocol.

Ethernet

Ethernet is a set of networking technologies, standards, and protocols applied together to provide stable, fast, and secure wired data transmission in local area networks (LAN) and metropolitan area networks (MAN). It is defined and described by the Institute of Electrical and Electronics Engineers (IEEE) standard 802.3 and operates at the physical and data link layers of the Open Systems Interconnection (OSI) model. Nodes in Ethernet networks use twisted-pair copper or fiber optic cables as the medium.

Erasure Coding

an erasure code, in coding theory, is a "forward error correction" (FEC) code. In storage, the popular Reed-Solomon (RS) group of codes is used to form smaller derivative data that represents the larger actual data: instead of keeping full redundancy, erasure codes can be used to rebuild the original data from its derivatives. Today (as of 2024) the practice of using erasure codes ("erasure coding") is an important building block of large scale computer data storage and RS codes are built into many software storage implementations, like Linux's RAID 6 or Apache Hadoop's HDFS. "6+3 erasure coding" is a popular choice, meaning that for any 6 blocks of data, the system computes 3 derived parity blocks which are stored in addition to the original data to allow data reconstruction in case of partial data loss.
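The principle of rebuilding lost data from derived blocks can be shown with the simplest possible erasure code, a single XOR parity block. Note that this is a swapped-in technique for illustration and not Reed-Solomon: XOR parity tolerates the loss of only one block, whereas an RS-based 6+3 layout tolerates three. The function names xor_blocks and storage_overhead are made up for this sketch.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def storage_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw capacity needed per unit of user data for a k+m layout."""
    return (data_blocks + parity_blocks) / data_blocks


data_blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data_blocks)  # one derived parity block

# Simulate losing the second data block and rebuild it from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

The rebuilt block equals the lost block EFGH. For comparison, storage_overhead(6, 3) gives 1.5, meaning a 6+3 layout needs 1.5 times the raw capacity of the user data, while three full copies would need 3 times.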

Error correction code (ECC)

also "error-correcting code", is one measure in "forward error correction" (FEC), a technique often used in any area where (digital) data is stored or transmitted to detect and correct errors ("error control"). The basic idea is that redundant data is calculated and added to the original data so that errors can be detected and, to a certain degree, corrected on-the-fly.
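How added redundancy allows on-the-fly correction can be sketched with a triple repetition code, one of the simplest forward error correction schemes. This is an illustrative assumption for this glossary, not what storage hardware uses in practice; real devices rely on far more efficient codes. The function names encode and decode are hypothetical.

```python
def encode(bits):
    """Repeat every bit three times (the added redundancy)."""
    return [bit for bit in bits for _ in range(3)]


def decode(coded):
    """Majority vote per group of three corrects any single flipped bit per group."""
    decoded = []
    for i in range(0, len(coded), 3):
        group = coded[i:i + 3]
        decoded.append(1 if sum(group) >= 2 else 0)
    return decoded


message = [1, 0, 1, 1]
sent = encode(message)
received = list(sent)
received[4] ^= 1  # a single bit error during storage or transmission
```

Despite the flipped bit, decode(received) returns the original message, without asking the sender to retransmit anything.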

eSATA

is a standard for external connectivity of SATA devices via eSATA ports and eSATA connectors. It was formalized in 2004. eSATA is not to be confused with "SATAe" (short for "SATA Express"). "eSATAp" describes powered eSATA, aka "Power over eSATA".

ESDI

is short for "Enhanced Small Disk Interface" and an obsolete successor of the ST-412/506, and predecessor to SCSI.

ETL Process

ETL is short for "Extract, Transform, Load", a term from the field of database theory, data mining and data wrangling, describing the process of gathering data from multiple, potentially unstructured or disparately structured data sources and combining them into one unified database.
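The three ETL stages can be sketched with Python's standard library. Everything in this example is made up for illustration: the two inline data sources, their field names and the target table "systems" are assumptions, and real pipelines read from files, APIs or databases instead of inline strings.

```python
import csv
import io
import json
import sqlite3

# Two differently structured (hypothetical) sources describing the same kind of thing.
CSV_SOURCE = "id,name,capacity_tb\n1,array-a,120\n2,array-b,80\n"
JSON_SOURCE = '[{"id": 3, "label": "Array-C", "capacity_gb": 40000}]'


def extract():
    """Extract: read the raw records from each source."""
    rows_csv = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows_json = json.loads(JSON_SOURCE)
    return rows_csv, rows_json


def transform(rows_csv, rows_json):
    """Transform: unify field names, casing and units (everything in TB)."""
    unified = [(int(r["id"]), r["name"].lower(), float(r["capacity_tb"])) for r in rows_csv]
    unified += [(int(r["id"]), r["label"].lower(), r["capacity_gb"] / 1000) for r in rows_json]
    return unified


def load(rows):
    """Load: write the unified records into one database table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE systems (id INTEGER PRIMARY KEY, name TEXT, capacity_tb REAL)")
    db.executemany("INSERT INTO systems VALUES (?, ?, ?)", rows)
    db.commit()
    return db


db = load(transform(*extract()))
total_tb = db.execute("SELECT SUM(capacity_tb) FROM systems").fetchone()[0]
```

After loading, all three records live in one table with one schema, so a single query can sum the capacity across both original sources.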

exFAT

Short for "Extensible File Allocation Table" is a Microsoft file-system from the FAT family of file-systems and is optimized for flash memory, thus often used on USB sticks and the like.

EXR (Extended Dynamic Range)

Colloquial abbreviation of the OpenEXR image format, a high-dynamic range, multi-channel raster file format developed by Industrial Light & Magic (ILM) for professional film production pipelines, with beginnings in 1999 and a later release under a free software license.

Extent

In regards to file-systems, an Extent is an extension or update of the traditional "block mapping" data storage scheme. While data in "block mapping" is stored as a number of individual blocks possibly spread randomly across the available storage surface, an Extent is a continuous range of usually neighbored blocks. In comparison with individual blocks, which are addressed and accessed via their individual numbers, the access addressing of an Extent can be stored as a number range. As an example, it is more efficient to store "blocks 3 .. 9" than enumerating "blocks 3,4,5,6,7,8,9". Extent based file-systems usually employ means to mitigate fragmentation, like "Copy-on-Write" (CoW) or "Allocate-on-Flush" (AoF). Btrfs is an example of an Extent-based copy-on-write (COW) file system, Linux' Ext4 file-system can be configured to use Extents.
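The difference between enumerating blocks and storing ranges can be sketched as follows. The function name blocks_to_extents and the (start, length) tuple representation are assumptions for this example; actual file systems use their own on-disk extent structures.

```python
def blocks_to_extents(blocks):
    """Collapse a file's block list into (start, length) extents.

    Blocks are taken in the file's logical order; a block extends the current
    extent only if it physically follows the previous block.
    """
    extents = []
    for block in blocks:
        if extents and block == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((block, 1))
    return extents


contiguous = blocks_to_extents([3, 4, 5, 6, 7, 8, 9])
fragmented = blocks_to_extents([3, 4, 5, 20, 21, 40])
```

The contiguous file needs a single entry, (3, 7), instead of seven block numbers. The fragmented file needs three entries, which shows why extent-based file systems benefit from keeping fragmentation low.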

Extended file attributes

Also briefly "Extended attributes" or, on Unix-like operating systems, "xattr" or "xattribs", is a feature of some file systems where a file can be associated with additional metadata that is not strictly required by the file-system in POSIX-like operation. Extended attributes are commonly organized in user-definable key-value pairs. Their size (the data payload storable in an xattr) is usually very limited in comparison to the possible size of the related file, but with common limits between 255 bytes and 4KiB it is usually large enough to fit textual metadata. The actual size limitation depends on the type of file-system used. Extended attributes are commonly used to store more elaborate file access rules, like a per-file access-control list (ACL), or Dublin Core syntax to describe resources like image or video files. While xattribs would be perfectly capable of storing the "file coloring" found in some file managers and in Apple products, this color metadata is mostly stored in a separate database or in sidecar files, contributing to the problem of separating metadata descriptors from the described resource. Apple file systems usually store extended attributes and file metadata in so called resource forks, separate from a file's payload data.

FAT

The File Allocation Table (FAT) is an older file system developed by Microsoft, initially for storing and retrieving data on small-size media, such as floppy disks. Over time, it evolved and came to be used on hard disks. Data in this file system is organized in a tree-like hierarchical structure within directories. Files are stored on the media in collections of data blocks called clusters, with a typical cluster size of 4 kilobytes in the last version of FAT (FAT32). The main limitations of the FAT32 file system relate to the maximum file and volume size: FAT32 cannot store files larger than 4 gigabytes, and the volume size is 2 terabytes maximum. Today, FAT32 continues to find use in USB flash drives and memory cards. One newer iteration of the FAT file-system family is exFAT, optimized for flash memory.
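Both FAT32 limits follow from 32-bit fields, which a short calculation makes visible. The sketch assumes the common case of 512-byte sectors and a 32-bit sector count for the volume; with other sector sizes the volume limit differs.

```python
# Assumption: file sizes are recorded in a 32-bit field (counted in bytes).
MAX_FILE_BYTES = 2**32 - 1

# Assumption: 512-byte sectors and a 32-bit sector count for the whole volume.
SECTOR_BYTES = 512
MAX_VOLUME_BYTES = 2**32 * SECTOR_BYTES

GIB = 2**30
TIB = 2**40

max_file_gib = MAX_FILE_BYTES / GIB      # just under 4 GiB
max_volume_tib = MAX_VOLUME_BYTES / TIB  # 2 TiB
```

The largest possible file is therefore one byte short of 4 GiB, and the 2 terabyte volume limit corresponds to 2 TiB under the stated sector size.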

Fibre channel (FC)

High-speed data transfer protocol providing in-order, lossless delivery of raw block data. As such it is an elemental technology in SAN and Block Storage. Fibre Channel uses its own network interconnection model, defining layers similar to the OSI or TCP/IP models.

FC-4 (Protocol Mapping Layer: application protocols, for example encapsulated SCSI)
FC-3 (Common Services Layer: optional services, like RAID coordination or encryption)
FC-2 (Network Layer: FC-P's core)
FC-1 (Data Link Layer: on the wire coding, etc.)
FC-0 (Physical Layer: lowest layer, cables, connectors)

Fibre Channel over IP (FCIP)

Also "FC/IP", is a technology to transport Fibre Channel communication over an IP connection ("Fibre Channel tunneling", also "storage tunneling"), commonly used to send FC data over distances normally not possible with native FC. FCIP is not "Internet Fibre Channel Protocol" (iFCP).

Fibre Channel over Ethernet (FCoE)

encapsulates FC frames for transport over Ethernet networks. This is commonly used to enable deployments to use Fibre Channel on an Ethernet infrastructure (usually 10 Gigabit (or more) Ethernet networks).

File Level storage (vs. block level storage)

File-level storage, also known as file-based storage, is a data storage method that deploys a hierarchical architecture for storing and organizing unstructured data through files and directories. In terms of file-level access protocols, it uses the Network File System (NFS) protocol for Linux and Unix-based operating systems and the Common Internet File System (CIFS)/Server Message Block (SMB) protocol for Windows. When it comes to scalability, file-based storage is able to scale up (expanding the file storage capacity by adding more storage resources to a single node) and scale out (increasing the performance and capacity simultaneously by adding more file storage nodes). Compared to block-level storage, file-level storage is cheaper, simpler, and easier to manage, but block-level storage is the preferred choice for performance-critical applications due to its direct access to raw storage blocks. File-level storage is commonly deployed in network-attached storage (NAS) and, if configured with file-level access protocols, in storage area networks (SAN).

File System

A file system is a collection of methods and structures designed for operating systems and various software applications to name, store, access, retrieve, and organize data on storage devices. This is (traditionally) achieved through a hierarchical arrangement of files into directories and subdirectories. Some file-systems use a non-hierarchical approach (e.g. Semantic File Systems), but are either niche or offer non-hierarchical access mechanism on top of traditional hierarchical structures. Additional responsibilities of file-systems include access control, metadata management, encryption, directory structure maintenance, monitoring available free space, enforcement of set per-user quotas, optimizing performance and ensuring data integrity.

File Storage

a computer "file" is a kind of resource that computers and storage systems use to organize data in, usually addressing this data by a "filename". Historically, this is one of many desktop, office and work-environment metaphors found in computer systems and information retrieval. When computers were in their infancy, data used to be "written" (punched) on punchcards, and these cards were commonly organized in file (or filing) cabinets, just like traditional paper files in any office. When data started to be written onto magnetic storage media, the term stuck and coherent data units were still referred to as being "files" and the storage layouts they were stored in, either linearly or hierarchically, were then labeled "file systems". Today, when we speak of "file storage", it is usually to distinguish this traditional perspective on data storage with other techniques or variants of data storage or its organization, for example to contrast "file storage" against low-level "block storage", or against schemes where data isn't accessed via its filename but instead by its contents, as in some "object storage" systems.

Fileserver

A file server is a computer that acts as a centralized storage, providing data sharing across an organization or network. Its primary responsibility is storing and managing data, enabling other computer clients on the same network to access it. In a file server setup, users interact with a central storage, which serves as a platform for storing, retrieving and managing internal data. The benefits of utilizing a file server include remote access, centralized management, enhanced security, backup capabilities, data recovery options and user control. Private file servers can only be accessed through an organization's intranet or through a virtual private network (VPN), but file servers might just as well be accessible to the public, via FTP or the Internet.

Flash file system

a specific file system, or a category of file systems, where data access is organized in such a way that it is optimized for flash memory devices. One example of such optimization is a set of specific strategies to mitigate "wear" (wear levelling), as flash memory usually only allows a limited number of erase cycles before becoming unreliable.

Floppy disk

A floppy disk is a legacy removable, flexible (thus "floppy"), magnetic storage medium enclosed in a plastic envelope. A magnetic head was used in a stepper-motor driven reading and writing mechanism. On disk, the head wrote usually 40-80 tracks, divided into data sectors. Early floppy disks were 8-inch disks. Floppies garnered wide popularity in a sized down version during the 1980s, with a smaller sleeve measuring 5.25 inches. The flexible plastic envelope was changed to a more sturdy hard-plastic shell when disks grew in capacity to a common 1.44 megabytes on a once again down-sized disk of 3.5 inches during the 1990s. A "floppy disk drive" (FDD) is an obsolete hardware device designed for reading data from and writing data to floppy disks. Built into external enclosures, an FDD was one common peripheral component of early personal computer configurations. Internally, the drive mechanics usually had a power and data interface, with 4-pin power cable and a 34-pin box connector, matching a flat ribbon cable to an FDD controller.

Micropolis was a major manufacturer of 5.25" form-factor floppy disk drives with Shugart interface during the 1980s and, building on this expertise, evolved into manufacturing "rigid" disk drives (hard disk drives) in the 1990s.

Floppy or hard disk, disk tracks, disk sectors, disk clusters or blocks, disk cylinders — Track layout on floppy disk and hard disk: tracks divided into sectors, multiple sectors forming one block (Unix/Linux) or one cluster (Windows). The same track on multiple data surfaces, upper and lower side of a floppy or across multiple hard disk platters, is one cylinder.
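
The relation between cylinders, heads (data surfaces) and sectors can be made concrete with the classic CHS-to-LBA conversion, which numbers all sectors of a disk consecutively. The following is a minimal sketch; the function names are chosen for illustration, and the example geometry (80 cylinders, 2 heads, 18 sectors per track, 512 bytes per sector) is that of a common 1.44 MB 3.5" floppy disk.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a logical block address.

    Cylinders and heads count from 0, sectors traditionally count from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse conversion: logical block address back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


if __name__ == "__main__":
    CYLINDERS, HEADS, SECTORS, BYTES_PER_SECTOR = 80, 2, 18, 512
    last = chs_to_lba(CYLINDERS - 1, HEADS - 1, SECTORS, HEADS, SECTORS)
    print(last)                           # 2879, the last of 2880 sectors
    print((last + 1) * BYTES_PER_SECTOR)  # 1474560 bytes
    print(lba_to_chs(last, HEADS, SECTORS))  # (79, 1, 18)
```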

FTP

FTP is the abbreviation of "File Transfer Protocol" and describes a network protocol designed for file transmission between computers using a Transmission Control Protocol/Internet Protocol (TCP/IP) connection, where one computer acts as server and another as client. It resides in the seventh layer of the Open Systems Interconnection (OSI) model, the application layer. As traditional FTP is unencrypted, there are secure alternatives like File Transfer Protocol Secure (FTPS) or Secure File Transfer Protocol (SFTP, sometimes "SSH File Transfer Protocol", as it operates over an SSH connection), where both commands and data transmission are encrypted.

File virtualization

In file area network (FAN) and network file management (NFM) contexts, a virtualized file is a client representation of a computer file that preserves the traditional file access semantics of paths and hierarchies while at the same time separating the presented file from the actual underlying storage environment, by inserting an abstraction layer between the client and file server, NAS or storage technology.

Filesystem in Userspace (FUSE)

FUSE (derived from Filesystem in Userspace) is a framework designed to provide an interface for userspace programs, allowing them to export a file system to the operating system kernel. It consists of a kernel module, a userspace library (libfuse), and a mount utility (fusermount). The most important capability lies in enabling secure, non-privileged mounts of filesystems on a per-user basis. FUSE has been adapted for deployment on Solaris and Unix-like operating systems, including Linux, macOS, FreeBSD, and NetBSD.

GIO

("Gnome Input/Output") is a library, designed to present programmers with a modern and usable interface to Linux' virtual file system.

Global file systems

a category of clustered, distributed file systems that appear as one uniform file-system hierarchy ("global namespace") to clients.

Gopher

Early Internet protocol, related to FTP and a predecessor of the World Wide Web (WWW). Users could search, browse, access and distribute their own documents across a global namespace resembling a local file-system hierarchy, the "gopher space".

GPFS

Former name of IBM's General Parallel File System (GPFS), now brand-named "IBM Storage Scale" and previously "IBM Spectrum Scale". It is a high-performance clustered file system software developed by IBM (a parallel file system).

GPU

Short for "Graphics Processing Unit", a specialised chip similar to a CPU, only optimized for graphics workloads.

Grid file systems

is a broad term for a category of clustered, distributed file systems, meaning data is spread over a "grid" (instead of a single disk or single storage location) to improve reliability and availability and at the same time appearing as a single uniform resource to client systems.

GVfs

found in Linux operating systems and usually the abbreviation for "GNOME virtual file system". GVfs is GNOME's userspace virtual filesystem (VFS) designed to work with the I/O abstraction of GIO.

GSS (GPFS Storage Server)

An IBM product, a modularized storage system. GSS 2.0 is based on an IBM System x x3650 M4 server running General Parallel File System (GPFS) 4.1 on NetApp disk enclosures. As of 2024, version 2.0 is now part of the IBM Elastic Storage Server family.

Hadoop (HDFS)

Apache Hadoop is a general framework that allows for the distributed processing of large data sets. One component is the Hadoop Distributed File System (HDFS).

Half-rack format

Comparable to 19-inch rack frame enclosures, a half-rack, 10-inch, or 9.5" rack only uses half the width of a standard 19" rack, thus offering a modular mounting option on a smaller footprint.

HAMR drive

Short for "Heat-assisted magnetic recording", pronounced "hammer", is a technology in magnetic hard disk drives where heat is used to lower the magnetic coercivity of magnetic disk coatings in order to allow the read/write head to magnetize particular domains on the rotating drive platter. Technology used to heat the drive's platter surface ranges from microwaves (MAMR) to laser and surface plasmons. The heating technology usually treats a domain immediately before the read/write head moves over it, a process that takes less than one nanosecond. The material is heated, written and then cools down on its own. As coercivity quickly rises again right after data is written, the magnetization appears as being "baked" into the magnetic surface.

Our Micropolis knowledge database has a more detailed article on HAMR Disk Drives.

HBA (Host Bus Adapter)

Piece of computer hardware, also called host controller or host adapter. While host adapters can also be devices that connect USB or FireWire, the term HBA is mostly used when referring to SCSI and newer interfaces such as SATA, SAS, NVMe or Fibre Channel, as host adapters were a common way of connecting early SCSI drives. While early models implemented basic I/O for the host system, later models added hardware fault tolerance schemes like RAID. In Fibre Channel context, the abbreviation HBA is sometimes also expanded as "High Bandwidth Adapter". InfiniBand controllers are commonly called host channel adapters (HCA).

HDD

A hard disk drive (HDD), commonly hard disk or hard drive, is a computer storage device that uses electro-mechanical means to store digital data on a rotating magnetizable disk, the hard disk. As the platters in an HDD are made of metal with a surface coating, such drives were initially also called fixed disk drives or rigid disk drives. In a hard disk drive, a rotating platter similar to that of an audio record player is used to write and read back digital data. A read/write head, comparable to a stylus, is moved over the surface of the rotating media to write tracks of serial data. In contrast to an audio turntable, the information is not written in one long spiral but on concentric tracks. The same track across all data surfaces of a drive is called a cylinder. Either information embedded in a track or positioning information along a track is used to determine the rotational position of the read/write head on one of these tracks. Each track is in this way divided into data areas called sectors. The rigid disk turns at a defined speed and fast actuators move the read/write head on an arm between tracks to access data in a near-random fashion. While the disk is made from metal, the actual surface is coated with a special magnetizable coating. The read/write head uses a magnetic field to magnetize regions, called domains, in a predefined way. The magnetic field exerted by the read/write head has to be strong enough to overcome the coercivity of the coating. As of the 2020s, data density in hard disk drives has become very high, requiring magnetic coatings with a very high coercivity. If coercivity was not that high, individual domains would "bleed" into each other, blurring the boundaries between magnetized areas. This led to the development of read/write heads that heat a particular region on the disk surface so that the material can be magnetized at all (as found in HAMR and MAMR hard disk drives). Once the surface has cooled down again, magnetic data appears to be "baked into" the magnetic surface.

Micropolis was a major hard-drive manufacturer during the 1990s. As an early manufacturer of Floppy Disk Drives, the company used controller expertise and IP to offer early Hard Disk Drives - essentially floppy disk drives with a rigid disk. Prior to the market's swing to smaller HDD form-factors, Micropolis was offering the highest capacity 5.25" HDD with its model 1991 disk drive.

Platters and r/w-heads of an opened hard-disk-drive — Opened Micropolis 1325 hard-disk-drive, showing data surfaces (platters) and r/w-heads

HDFS

is short for "Hadoop Distributed File System", the file-system component of the Apache Hadoop distributed processing framework. HDFS is a high-availability clustered/distributed system based on storage nodes, with some nodes acting as master nodes (NameNode, directing metadata) and others as worker nodes (for storage). HDFS allows storing very large sets of data and is usually optimized for large files, with a general suggestion to assemble smaller files into Hadoop Archives (HAR). Alternatively, the Hadoop Distributed Data Store (HDDS) layer offers optimizations for smaller files.

HFS+

also "HFS Plus" and sometimes "Mac OS Extended" or "HFS Extended", is a file system developed by Apple Inc. that was introduced in 1998 with the release of Mac OS 8.1. HFS+ addressed several issues with its predecessor HFS (also known as "Mac OS Standard" or "HFS Standard"), adding support for large files, Unicode file naming, and overall improved performance. HFS+ supports journaling to maintain data integrity and prevent corruption. It was the default file system for macOS devices until it was replaced by APFS, which was announced in 2016 and became the default with macOS 10.13 in 2017.

Hierarchical file system

One form, and the most common one, of computer file systems. Data objects are regarded as files and get stored in collections (of items), also called folders (of files) or directories (with entries). The sum of all files present in a file system this way resembles a hierarchy of directories and files, a directory tree. This structure can be traversed by the user via command-line tools and graphical file managers or file browsers. A file usually has one single path representing its "location" within the file hierarchy and thus "on disk". Some file systems implement more data object categories (special files), such as hard links, soft links, junction points, symbolic links, sockets, FIFOs, etc. Hierarchical file systems are just one form. Older media (punched cards) or tape media used linear file systems. Newer approaches, in parts breaking with common usage patterns, are tag-based file systems (semantic file systems, associative file systems).
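
Traversing such a directory tree can be sketched in a few lines using Python's standard library. The helper names `list_tree` and `build_demo_tree` and the demo file names are made up for illustration; `os.walk` is the standard function that visits every directory below a given root.

```python
import os
import tempfile
from pathlib import Path


def list_tree(root):
    """Return every directory and file below root as sorted relative paths."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append(Path(dirpath, name).relative_to(root).as_posix())
    return sorted(entries)


def build_demo_tree(root):
    """Create a tiny example hierarchy: docs/readme.txt and docs/reports/q1.txt."""
    (Path(root) / "docs" / "reports").mkdir(parents=True)
    (Path(root) / "docs" / "readme.txt").write_text("hello")
    (Path(root) / "docs" / "reports" / "q1.txt").write_text("numbers")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        for path in list_tree(root):
            print(path)
```

Each printed line is one path, which is exactly the "location" of a directory or file within the hierarchy.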

High availability (HA)

Availability is a characteristic of computer systems. It measures a level of operational performance, in regard to general availability, performance or simple uptime. If a system aims at, or is specified for, a higher than normal availability, it is commonly referred to as a High Availability (HA) system. As availability of a system is usually measured in percentages ending with multiple "nines", it is common to speak of the "number of nines" in enterprise computing, to describe capabilities of mainframes or similar IT or an agreed upon minimum availability defined in Service Level Agreements (SLA). The terms "always on" and resilience are related. Resilience describes a system's tolerance against faults and challenges during normal operation.
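
What a given "number of nines" means in practice becomes clearer when the percentage is converted into permitted downtime. The following is a small arithmetic sketch; the function names are chosen for illustration and a year is assumed to have 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def availability_from_nines(nines):
    """Availability in percent for a number of nines, e.g. 3 -> 99.9."""
    return 100 * (1 - 10 ** -nines)


def downtime_hours_per_year(availability_percent):
    """Maximum downtime per year that still meets the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for nines in range(2, 6):
        percent = availability_from_nines(nines)
        hours = downtime_hours_per_year(percent)
        print(f"{nines} nines: {percent:.3f}% -> {hours * 60:.1f} minutes per year")
```

"Three nines" (99.9%) allow roughly 8.76 hours of downtime per year, while "five nines" (99.999%) allow only about 5.3 minutes.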

Hot plug

The process of "hot plugging" means removal and/or insertion of a system option during normal operation of the host system. For example, if a storage array offers "hot pluggable SSDs", individual disks may be pulled out during normal operation, for example as part of maintenance or to recover from a device's failure, and the host system will continue running without interruption. Hot plug is a feature on the device level and the hardware-interface level, but it must just as well be supported by the storage controller and the running storage software.

Hybrid archiving

describes the practice of keeping an analog/original version of some archival matter and a digital/digitized copy of it. For example, newspaper pages have been photographically preserved in microform format on microfilm, but at the same time a digital copy, like a scan or digital photographic reproduction of either the original newspaper or its microfilm representation, is kept on file for long-term preservation.

Hybrid Cloud

virtualized IT resources may be deployed on-premise in a "Private Cloud" or bought as a service from an external Cloud Service Provider (CSP), in a Software-as-a-Service (SaaS), Storage-as-a-Service (STaaS) or whatever as a service (XaaS) model. A "hybrid cloud" describes an IT infrastructure that is composed of on-premise, local resources and remote off-site "cloud" resources. In building a "Hybrid Cloud", organizations combine private/owned resources with externally bought resources for an optimal solution. Hybrid Clouds may be selected to improve fault tolerance, lower costs or to get access to services the private cloud is unable to provide. While favorable in select use cases, a hybrid cloud also brings security challenges and a potentially raised administration overhead. For the specific use-case in data storage, described as "hybrid cloud storage", the term refers to a storage infrastructure that combines on-premises storage resources with remote resources, usually in the form of resources offered by a public cloud storage provider. Hybrid clouds are popular in backup schemes or in applications where partial data sets are made local for performance reasons, while a larger data volume is kept off-site (cmp. "Nearline storage").

Hyper-converged infrastructure (HCI)

Hyper-converged infrastructure in data center and storage layout structures describes fully virtualized computer infrastructure and is more or less the opposite of more traditional "hardware-defined" system layouts (cmp. Disaggregated Storage, "Non-Converged"). The elements in a HCI layout are usually virtualized computing, software-defined storage (SDS) and software-defined networking (SDN). While "Converged" infrastructure similarly relies on highly abstracted, virtualized and manageable resources on nodes that combine multiple roles, "Hyper-convergence" takes this concept further by utilizing a pool of standardized, similar machines, each equipped with similar network interfaces and direct-attached storage. This approach allows data center operators to reduce total cost of ownership (TCO) through the use of commercial off-the-shelf (COTS) servers.

HyperSCSI

is a failed attempt to operate native SCSI protocol commands over Ethernet, similar to Fibre Channel. It bypassed a number of elements of the TCP/IP stack to lower overhead and perform more like a native SAN protocol. It was superseded by iSCSI and Fibre Channel.

ILM

short for "Information Lifecycle Management", label for a comprehensive approach of (or "view on") managing the flow of an information system's data and associated metadata from creation to storage to deletion. ILM strategies try to streamline and structure the whole data life cycle, leading to improved operational efficiency (cost), better alignment with, for example, legal, regulatory or organizational policies (compliance) and a reduction in risk by elevating data security. Under ILM, created data is classified and treated based on a number of defined guidelines that help maintain consistent standards during the active use of data (backups, encryption, confidentiality, for example) and when data has become obsolete (long-term storage, safe deletion, destruction). ILM should not be confused with the Hollywood visual effects company Industrial Light & Magic (ILM).

Interconnection models

in computer networking, data networks can be described as having a specific layer architecture, where each layer represents one level of abstraction. The lowest level is usually the physical layer, the cable layer, while a higher level describes what actually travels over the wire. A number of so-called interconnection "reference models" have been defined and standardized. The ISO/OSI model (short for "Open Systems Interconnection" model) is one, the TCP/IP model ("Transmission Control Protocol/Internet Protocol") is another well-known reference model. But there are more, like the TCP/IP predecessor DoD model (developed by the United States Department of Defense, "DoD") or the Fibre Channel ("FC") model of layer abstraction.

Aligned comparison of OSI, TCP/IP and Fibre Channel interconnection models

OSI Layer	OSI Name	TCP/IP	Fibre Channel
5-7	Application	HTTP, Telnet, FTP, SCSI-3 over TCP/IP (iSCSI)	IP, SCSI-3 FC-P
4	Transport	TCP, UDP, SCTP, TLS	FC-4
3	Network	IP (IPv4, IPv6), ICMP, IGMP	FC-3
2	Data Link	Ethernet, Token Ring, Token Bus, FDDI	FC-1, FC-2
1	Physical	media	FC-0

IPMI

short for "Intelligent Platform Management Interface (IPMI)" is a suite of interfaces used to remotely monitor a host system. IPMI is an out-of-band (meaning a separate system) administration tool. In IPMI a "baseboard management controller" (BMC) is a complete but separate system attached to a host, a microcontroller or low-spec computer system, that acts as a controlling instance of the monitored host (especially servers), offers communication, own serial and network ports, etc.

ISV Certified

"ISV" is the abbreviation of "Independent Software Vendor", a broad term to describe the sum of software vendors who are "independent" of the currently dominating vendors of software on the market. As such, basically the majority of the software market vendors are ISVs. The term is commonly used in conjunction with hardware vendors, who build or manufacture systems that are then "certified" by an ISV to be compatible and performing to a certain spec when executing specific software, usually in specific markets, for example for CAD applications, for medical work, high performance video editing, etc. ISV Certified is not some form of official or independently issued certificate, but part of programmes certain vendors implement according to proprietary terms. It is common to have high performance deskside workstation PCs, or rack mount workstations, as used in A/V editing or 3D visualization studios, be ISV certified.

IOPS

IOPS is an abbreviation for "Input/Output Operations Per Second" and represents a critical metric for storage system performance. It provides information on how many input and output operations a storage device can handle per second. IOPS serves as an indicator of storage efficiency and responsiveness, especially in situations where high operation rates are crucial - like in virtualization environments, video streaming, computer gaming, cloud computing, etc. As a performance metric, IOPS is usually measured in "total IOPS", "Read IOPS" and "Write IOPS".
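
An IOPS figure only becomes comparable once the size of each operation is known, since data throughput is the product of operations per second and block size. The following sketch shows this arithmetic; the function name and the example workloads are illustrative assumptions, not measurements.

```python
def throughput_mb_per_s(iops, block_size_bytes):
    """Data throughput in megabytes per second (1 MB = 1,000,000 bytes)."""
    return iops * block_size_bytes / 1_000_000


if __name__ == "__main__":
    # Many small random operations versus few large sequential ones.
    print(throughput_mb_per_s(iops=10_000, block_size_bytes=4096))      # 40.96
    print(throughput_mb_per_s(iops=200, block_size_bytes=1024 * 1024))  # 209.7152
```

The example illustrates why a device with a high IOPS rating on small blocks may still deliver less raw throughput than a device handling fewer but larger operations.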

iSCSI

"Internet Small Computer Systems Interface" is a method to transport native SCSI commands over TCP/IP network infrastructure, more specifically iSCSI is SCSI-3 over TCP/IP. iSCSI is Ethernet based while Fibre Channel, in comparison, uses its own interconnection model architecture.

IMF ("Interoperable Master Format")

The IMF (Interoperable Master Format) is a standardized and interchangeable master file format developed by the Society of Motion Picture and Television Engineers (SMPTE) to address the entire process of creating, managing, storing, and delivering professional video and audio content. It is usually used in professional A/V workflows to deliver finished works, offering support for multiple language streams, subtitles/captions, etc. ST 2067 is a standard published by SMPTE to provide a comprehensive set of specifications for the inter-operable master format, enabling the creation, localization, distribution, and storage of master video and audio content on various platforms. IMF stems from other file-based professional audio/video formats like MXF.

iWARP

in a clear play on the hyperspace speed metric used in the Star Trek feature-film and TV franchise, iWARP (not an acronym for anything) is a computer networking protocol used in "remote direct memory access" (RDMA) to transport RDMA traffic over TCP/IP networking infrastructure. One implementer is Chelsio.

JBOD (Just a Bunch Of Disks)

is a term to differentiate a number of independently operated disks, an Array of Independent Disks, where no means to achieve data redundancy are used. As such, it is not a RAID system. JBODs are usually disks in a single enclosure. JBODs can be used to support or amend existing RAID systems. A Volume Management Software can be used to connect multiple volumes as one single logical volume, or individual volumes from a JBOD may be separated into logical volumes that are in turn used to assemble, concatenate or extend other volumes. Storage volumes operated in a computer without the volumes acting as fault mitigating volumes can be described as being a JBOD.
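
How volume management software can present several JBOD disks as one concatenated (spanned) logical volume can be sketched as a simple address translation. The function name `locate_block` and the disk sizes are hypothetical; real volume managers add metadata, striping options and error handling on top.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_block(logical_block, disk_sizes):
    """Map a block of a concatenated volume to (disk index, block on that disk).

    disk_sizes lists the capacity of each member disk in blocks, in the
    order in which the disks are concatenated.
    """
    ends = list(accumulate(disk_sizes))  # first logical block AFTER each disk
    if not ends or not 0 <= logical_block < ends[-1]:
        raise IndexError("logical block outside of volume")
    disk = bisect_right(ends, logical_block)
    disk_start = ends[disk] - disk_sizes[disk]
    return disk, logical_block - disk_start


if __name__ == "__main__":
    sizes = [100, 50, 200]           # three disks of different capacity
    print(locate_block(0, sizes))    # (0, 0)
    print(locate_block(100, sizes))  # (1, 0), first block of the second disk
    print(locate_block(349, sizes))  # (2, 199), last block of the volume
```

Note that no redundancy is involved: if one member disk fails, the blocks mapped to it are lost, which is what distinguishes such a setup from RAID.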

JBoF (Just a Bunch Of Flash)

or sometimes "Just a Bunch of Flash Drives" or "Just a Bunch of Flash Memory" is a specialized version of a JBOD, meaning that a storage array contains flash drives (SSDs) instead of traditional hard disk drives ("spinning media"). JBoFs usually align better with newer high speed access semantics. The keyword "JBOF" is often used in conversations to describe Flash-first technology or deployments.

Journaling file system

A type of file-system that offers atomicity and durability by keeping a log of (at least) metadata changes / transactions, by employing a tracking file called a journal, which serves as a transaction log. In case of an unexpected shutdown or system failure, the operating system will use the journal during the reboot process to repair any inconsistencies in file indexes. Log mechanisms are usually implemented as a "write-ahead log" (WAL). Journaling is used to reliably recover file-system integrity after unexpected interruptions, like a power loss or hardware failure.
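
The write-ahead principle can be demonstrated with a toy key-value store that records every change in a journal file before applying it. This is a sketch only: the class `JournaledStore` and its record format are invented for illustration, and real journaling file systems log metadata transactions inside the file system rather than JSON lines in a file.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy store: every change is made durable in a journal before it is applied."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Recovery: re-apply all complete records, drop a torn last record."""
        if not os.path.exists(self.journal_path):
            return
        valid_bytes = 0
        with open(self.journal_path, "rb") as journal:
            for raw_line in journal:
                if not raw_line.endswith(b"\n"):
                    break  # record was cut off by a crash
                try:
                    record = json.loads(raw_line)
                except ValueError:
                    break  # unreadable record, stop replaying here
                self.data[record["key"]] = record["value"]
                valid_bytes += len(raw_line)
        os.truncate(self.journal_path, valid_bytes)

    def set(self, key, value):
        """Write the record ahead (journal first), then apply the change."""
        record = json.dumps({"key": key, "value": value}).encode("utf-8")
        with open(self.journal_path, "ab") as journal:
            journal.write(record + b"\n")
            journal.flush()
            os.fsync(journal.fileno())  # force the record onto stable storage
        self.data[key] = value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "journal.log")
        store = JournaledStore(path)
        store.set("owner", "alice")
        recovered = JournaledStore(path)  # simulates a restart
        print(recovered.data)
```

After a simulated restart, the second instance rebuilds its state purely from the journal, which is the recovery step described above.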

Key-Value store

a simple version of a database that only offers to store values for a given key, similar to hashes in Perl or objects in JavaScript, usually under a flat namespace and not queryable with a sophisticated syntax or interface. It is a simpler concept in comparison with a full relational database and is commonly chosen for speed, ease of use or because of a lower implementation burden. Due to its fundamental difference to relational databases, the concept is commonly counted among the "NoSQL" databases. Redis and memcached are well-known key-value stores; the document-oriented MongoDB is a related NoSQL offering. Implementations differ in terms of data persistence. RocksDB, developed at Facebook, is an example of an embeddable, persistent key-value store. Amazon DynamoDB is a key-value store SaaS offering from Amazon's AWS unit.

LAN (Local Area Network)

A computer network that can be described as being smaller than Metropolitan or Wide Area Networks, but larger than Personal Area Networks or ad-hoc networks. LAN usually refers to the type of physical wired networking infrastructure found in home or company networks. Hubs, routers and switches are used to build the network topology. On the wire the Ethernet protocol is used, over copper cabling commonly at speeds between 10 Mbit/s and 10 Gbit/s. Faster networking speeds usually don't use copper but optical fiber (e.g. 100 Gigabit Ethernet). The wireless variant of a LAN is typically called a WiFi network, WLAN or wireless network.

Latency

In a data storage context, latency measures the time delay between initiating a request and completing the corresponding input-output operation. It is an important performance metric, commonly expressed in milliseconds, providing a value to compare a storage's overall efficiency and effectiveness. Reducing latency enhances performance and can be achieved by optimizing storage architectures, employing caching mechanisms, and using high-performance interfaces.
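
A rough impression of storage latency can be obtained by timing small synchronous writes. The sketch below is a simplified illustration, not a benchmark: the function name and the default block size and sample count are assumptions, and results depend heavily on the operating system, file system and caches involved.

```python
import os
import statistics
import tempfile
import time


def measure_write_latency_ms(path, block_size=4096, samples=20):
    """Time individual write+fsync operations and return latencies in ms."""
    payload = b"\0" * block_size
    latencies = []
    with open(path, "wb") as handle:
        for _ in range(samples):
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # wait until the device confirms the write
            latencies.append((time.perf_counter() - start) * 1000)
    return latencies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        results = measure_write_latency_ms(os.path.join(folder, "probe.bin"))
        print(f"median: {statistics.median(results):.3f} ms")
        print(f"worst:  {max(results):.3f} ms")
```

Reporting the median together with the worst case is a common choice, as occasional slow operations ("tail latency") are often more noticeable than the average.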

LDAP

LDAP, short for "Lightweight Directory Access Protocol", is a standard application protocol for accessing directory services that store information about network resources, including users and devices. An LDAP directory serves as a centralized directory service commonly employed for authentication, authorization, and information or device lookup within a network. Often used in "Single Sign On" features or organization-wide authentication, the advantages of LDAP are usually seen in enhanced overall security, simplified user management and easier access to network resources for human clients.

LDI

short for "Library/Drive Interface" (or "Library to Drive Interface", sometimes erroneously "Library Device Interface") is a protocol and connector found on magnetic data tape drives, specifically LTO tape drives for use in tape libraries. The LDI bus, outlined by IBM in the early 2000s, is based on RS-422 and is used to communicate with a tape drive device for maintenance and operation control, using an interface that is separate from the SAS data bus (out of band). LDI handles drive presence, operation modes, serial number, drive status, errors, cartridge insertion/ejection, etc. The interface standard RS-422 has a long tradition of being used for tape recorder control in broadcast and video editing, and it can be assumed that this is why IBM chose it for the integration of data tape drives in tape libraries. The 10-pin LDI connector is usually found on the far right of a drive's rear side (seen from behind). Some newer LTO drives also feature an Ethernet port through a non-standard 13-pin one-row box connector with side grooves. This "Ethernet Service Port" usually supports a subset of the FTP protocol and is commonly used to transfer firmware or dump drive service data. The Ethernet connector is usually placed somewhere between the SAS and LDI connectors.

Linstor

is a fast distributed block storage management solution for machine clusters. It is not a file system but rather a Linux-based block storage management suite. Linstor is offered by Austrian software company Linbit. It is based on Linbit's own open-sourced DRBD ("Distributed Replicated Block Device", or "Distributed Replicated Storage System") framework. Linstor and DRBD offer flexible handling of VM images, Docker build spaces, databases etc. with minimal performance impact.

Local Block Storage

Traditionally, in non-virtualized environments, local block storage refers to physical flash memory based drives (e.g. SSDs) or hard disk drives directly connected to a host system. In a tiered perspective on storage, this is usually the fastest (bandwidth, latency) storage available on a system after RAM and other memory based systems. In virtualized environments, in enterprise IT, datacenters, web hosting or with cloud service providers, it describes virtualized storage volumes that can be treated like physical storage devices, meaning they can be formatted with any given file system, partitioned, mounted, etc. More colloquially, for example in web hosting product descriptions, "local block storage" (meaning virtualized storage) refers to optional storage that can be attached quickly and on demand to (virtual) servers.

Log-structured file system

is a type of file system that writes data and metadata sequentially to a circular log (or "circular buffer"). John K. Ousterhout and Fred Douglis proposed this layout in 1988 under the assumption that future storage systems would serve most reads from memory cache and thus be less reliant on relatively slow seeks, making writes the bottleneck. This has since proven true in parts, yet the approach brought a number of other drawbacks.

LTFS (Linear Tape File System)

a file system designed for use on tape media, allowing clients to access files on linear tape in a way similar to files on random access media. Originally developed by IBM, it was moved to be managed by the LTO Consortium in 2010 and has become an open standard (ISO/IEC 20919:2021).

LTO

"LTO", an abbreviation of "Linear Tape Open", is a specification for half-inch magnetic tape, its cartridges and matching tape drives. LTO-1 was introduced in 2000 and was able to store 100GB of data. Since then, the LTO format, in various revisions (LTO-2, LTO-3, etc.), has been dominating the market of so called "super tape" formats, data storage tapes of very high capacity. Since version LTO-5, data on LTO tapes is usually stored using the "Linear Tape File System" (LTFS).

LTO Tape Cartridge (LTO-6 with 2.5TB native capacity, storing up to 6.25TB compressed)

Lustre

Lustre (the word being a combination of "Linux" and "Cluster") is a parallel distributed file system popular in large-scale cluster computing and High Performance Computing (HPC). Being distributed by design, Lustre is made up of Metadata Servers (MDS) with their Metadata Targets (MDT), Object Storage Servers (OSS) with their Object Storage Targets (OST), Lustre's distributed lock manager (LDLM) and clients. Nodes are connected via the Lustre Network (LNet) using several technologies, like InfiniBand, TCP/IP over Ethernet etc. OST and MDT storage devices may be hardware RAID or JBOD, with ext4, ZFS or Lustre's native block device layer.

LUKS

is an abbreviation of "Linux Unified Key Setup" and is a platform-independent standard for disk encryption, using a standardized on-disk format to facilitate compatibility among distributions and to enable secure management of multiple user passwords. It originated on Linux and has since its inception found adoption on more platforms.

M.2 (NGFF)

also known by the older name "Next Generation Form Factor" (NGFF), is a specification for computer expansion cards and associated connectors and evolved from the SATA standard. It is a replacement for the slightly older mSATA standard and caters specifically to solid-state storage applications and applications where physical space is limited. M.2 can't be hot-swapped and operates strictly at 3.3 volts. M.2 cards connect directly to the motherboard via the matching M.2 slot, which offers connections for a range of interfaces, including SATA and PCIe (the latter also used by NVMe drives). Also compare the similar standard "U.2", which caters to data-center applications, only supporting NVMe and offering a slightly larger form factor for better thermal characteristics.

Massive array of idle drives (MAID)

An architecture, or an assembly built according to it, that uses hundreds or thousands of hard disk drives which are mostly spun down. MAIDs are one form of "cold storage" and are used to facilitate "nearline storage" of data. They are an answer to data that is "Write Once, Read Occasionally" (WORO).

MAM (Media Asset Management)

alternative term for "Digital Asset Management" (DAM). MAM software suites usually focus on enterprise-wide management of assets, to help organizations with the challenge of centrally hosting and managing a multitude of digital assets. This can be advertising material or corporate visuals, logos or branded graphics, sound bites, videos, material distributed to customers etc. MAMs usually offer some kind of approval system, so new assets can be (peer) reviewed by entitled personnel before more general users of the MAM can use these assets in their daily work. The broadcast, television and movie industries are related markets for MAMs, but in this sector, asset management systems are usually tailored to the specific needs of creative, post-production or production workflows. In this field, MAMs are sometimes referred to as "Production Asset Management" (PAM) systems, offering features like "Bin Locking" or "Edit while Ingest".

MAMR

"MAMR" is short for "Microwave Assisted Magnetic Recording", one type of "HAMR" hard disk drive for very high density magnetic storage.

Managed Service Provider (MSP)

is a broad term describing IT solution providers who offer not only building-blocks of IT (here "computer cloud services"), but also offer the professional operation of such technology (actively managed) as their product. A basic example is managed web hosting, where root server products are sold to customers, but DevOps and regular chores, e.g. updates, are handled by the vendor, usually under some Service Level Agreement (SLA). Cloud Managed Services Provider (CMSP) is a term for mixed solution providers that offer to operate basic cloud services for customers as their service. Some Service providers focus on operating IT from a single vendor (pure play MSPs) while others mix and match technology from multiple vendors to fulfill customer requirements (multi vendor MSPs).

What is a Media Defect Table?

Older drives, like the Micropolis 1325, have a table sheet printed on one side of the enclosure. It says "Media Defect Table" and lists various numbers, cylinders, etc. This table denotes areas on the physical magnetizable surface of the drive's internal platters that an automated test at the factory found to be faulty after production. The drive will not be able to store data in these areas. Thus, the end user is required to manually enter the areas listed in this defect table into their storage driver, so the system using the drive will not try to store data at these locations.

Source: Our article "What is a Media Defect Table" from the Micropolis support knowledge database.

Metadata

Metadata generally means an information set used to describe other data, thus offering contextual insights into specific data. It provides information such as the origin of data, date of creation or modification, location, ownership, size, purpose, etc. Six types of metadata are commonly distinguished: structural, descriptive, preservational, administrative, provenance and definitional. Metadata simplifies search and retrieval processes. It also helps organize and categorize data, making data easier to manage and maintain.

Metadata controller

in clustered or distributed file systems, a "metadata controller" (or "metadata server", "director", etc.) is a system or daemon acting as a director for all requests asking for metadata about a file within the file-system or the allocation/retrieval layer of (global) file-systems. Its implementation, and the exact definition of what a metadata controller does, depends highly on the implementing file-system or data storage system. Some file systems separate metadata requests from data payload requests and split these tasks between separate systems for performance reasons. A metadata directing layer that sits between client and data may introduce complexity or become a performance bottleneck, thus many implementations keep the actual read of payload data out of the metadata path. Sometimes the separation of such elements is configurable, sometimes storage solutions separate data payload and metadata by design. With distributed or clustered file-systems, a metadata controller usually also fulfills storage allocation duties (write) and aids in fulfilling read requests by directing a reading instance to where data, on which machine or in which physical location, is actually stored. Examples of metadata controllers are Quantum's StorNext file-system component "Metadata Controller System" (MDC system) as part of its optional separation of payload data, metadata and journaling; or CephFS's "Metadata Server" (MDS) which keeps metadata about files in CephFS and also caches hot metadata requests, manages spare metadata service instances, synchronizes distributed node caches and writes the file-system's log journal; or BeeGFS's "metadata node" server, which has similar tasks and, for example, stores metadata not in a custom database but in the extended file attributes of locally stored zero-size Ext4 shadow copies of files on the file-system.

MFM Drive Interface

Modified frequency modulation (MFM) is, as the name implies, an improved variant of the original frequency modulation ("FM") encoding used in early magnetic storage. It is a run-length limited (RLL) line code that usually achieves double the information density of FM, as it puts clock and data into the same "clock window". MFM was used on some hard disk drives and in most floppy disk formats.

The Micropolis support knowledge database has a more detailed article about "MFM drives and common interfaces".
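
The difference between FM and MFM can be sketched in a few lines of Python. This is a minimal illustration, not a drive-accurate implementation: the function names are made up for this example, and the bit before the first data bit is assumed to be 0. Each data bit becomes two cells, a clock cell followed by a data cell. FM always sets the clock cell, while MFM sets it only between two consecutive 0 data bits.

```python
def fm_encode(bits):
    """FM: every data bit is preceded by a clock cell that is always 1."""
    return "".join("1" + str(b) for b in bits)


def mfm_encode(bits, previous_bit=0):
    """MFM: the clock cell is 1 only if the previous and current data bits are both 0.

    previous_bit is the data bit written before this sequence (assumed 0 here).
    """
    cells = []
    for b in bits:
        clock = 1 if (previous_bit == 0 and b == 0) else 0
        cells.append(f"{clock}{b}")
        previous_bit = b
    return "".join(cells)


data = [1, 0, 1, 1, 0, 0, 0, 1]
fm_cells = fm_encode(data)
mfm_cells = mfm_encode(data)

print("FM :", fm_cells)
print("MFM:", mfm_cells)
```

In the MFM output no two adjacent cells are both 1, so flux transitions are never closer than two cells apart. The cells can therefore be written at half the width, which is where the doubled density compared to FM comes from.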

Microform

Microform describes the miniaturized photographic representation of an analog graphic, type-set or similarly visually representable source (medium) as an analog, usually black and white, image on a film-based medium. The miniaturization factor of microform images is usually 4% of the original size. The film used is high resolution Polyethylene terephthalate film (earlier forms also used Cellulose acetate and also paper/cardboard, "Microcards") in either 35mm spools of roll film (called "Microfilm") or small A6 sized (105 x 148mm, 4.1 x 5.8in) flat sheets of film (called "Microfiche", from French "la fiche", card or sheet). "Aperture cards" (sometimes "Filmcards") were a less common format where a cardboard card (either a computer-readable "punched card" or human-readable card with text on it) is combined with a 35mm microfilm frame, where the film-frame is glued into a cutout (aperture) on the card. While older microform reading equipment used backprojection on a tube-like diffusion screen (ground glass) to enlarge microform content, today's microform readers usually are based on digital imaging sensors and output to a connected display or computer system. While microform is usually used to preserve analog source media, a process called "Computer Output on Microfilm" ("COM", sometimes "COM fiche") is used to print (photographic printing) digital information to a film medium. Microfilm or microfiche preservations of newspaper pages are very common in libraries and archives. The microform representation on plastic film is one of the most reliable techniques available to mankind for long-term preservation of archived material. Photographic film, a WORM medium, is able to remain stable over decades and centuries. In government-level archiving and preservation, microform content on film is sealed in dedicated canisters and stored in bunkers or high-security facilities to conserve and safeguard the cultural heritage of a whole society.

MinIO

MinIO is a high-performance open-sourced Object Storage solution offered by US company MinIO, Inc. since 2014. It offers an Amazon S3 compatible interface and current versions are Kubernetes-native and offer features specifically designed for AI use-cases.

Mirror

short for "disk mirror", meaning the creation or maintenance of a full replica of the content of an entire storage volume or storage disk.

mSATA (Mini-SATA)

is a SATA micro connector introduced in 2009 and predecessor of the M.2 format connector.

MTBF

short for "mean time between failures". It's a measure that describes the average duration of normal operation of equipment in-between two failures.

MTTF

short for "mean time to failure". MTTF is a statistical measure that describes the average duration of normal operation of equipment until failure; IEC 60050 (191)'s exact wording is: "the expectation of the time to failure". It is no guaranteed lifetime assessment and more of a guiding value gathered from observation and testing. There's also a related value, MTTF_D (MTTFD) to describe the "mean time to dangerous failure".
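
As a rough illustration of how such figures are used in practice, the following sketch estimates MTBF from observed operating hours and converts it into an annualized failure rate (AFR). It assumes a constant failure rate (exponential lifetime model), which is a common simplification and not something the definitions above prescribe. The function names and sample numbers are made up for this example.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def mtbf_hours(total_operating_hours, failures):
    """Estimate MTBF as accumulated operating time divided by observed failures."""
    if failures <= 0:
        raise ValueError("at least one observed failure is required")
    return total_operating_hours / failures


def annualized_failure_rate(mtbf):
    """Probability of failure within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf)


# Hypothetical fleet: 1,000,000 accumulated drive-hours with 4 failures.
fleet_mtbf = mtbf_hours(1_000_000, 4)
# A drive rated at 1,000,000 hours MTBF has an AFR of a little under 1%.
rated_afr = annualized_failure_rate(1_000_000)

print(f"Fleet MTBF: {fleet_mtbf:.0f} h, AFR at 1M h MTBF: {rated_afr:.2%}")
```

The conversion shows why a very large MTBF value is not a lifetime promise for a single device: it is a statistical statement about a population of devices.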

NAND flash memory

A type of flash memory that uses floating-gate transistors, where the transistor layout is modeled to form NAND (NOT-AND) logic gates. NAND flash memory is the most common technology used in USB sticks and USB storage devices ("flash drives"), memory cards like SD Cards, CF Cards, and SSDs.

NAS

Network attached storage. NAS organizes and shares files on or for a network of attached computers and users. File sharing on the local network is done over a standard Ethernet connection, using protocols such as, but not limited to, SMB/CIFS, NFS or AFP. In contrast to block-based SAN, NAS systems usually employ file-based protocols to serve client data. Examples of such protocols are NFS (Unix), SMB/CIFS (mostly Windows), AFP (Apple), NCP (Novell NetWare / Open Enterprise Server, OES). Modern NAS systems usually offer a user-friendly administration and file-access interface, allowing domestic or SoHo users to effectively use their appliance. File access can be either native, as with locally attached storage, or via common browser-based web semantics, even allowing users to make their local NAS reachable over the Internet, either via VPN tunneling or password protected WWW access. NAS systems may come from a trusted vendor or be user assembled in the form of a "DIY NAS".

During the 1990s, Micropolis offered an early version of a NAS storage appliance. The first was the 1030 MicroDisk™ external SCSI storage enclosure; later, Micropolis expanded the product line into the Raidion™ stackable external SCSI enclosures in 5.25" and 3.5" form-factors. Raidion™ towers could be operated as JBODs or configured as RAID systems via Micropolis' own RAIDWARE software, and later models (LTX) added a built-in hardware RAID controller.

Nearline storage

Nearline storage, a portmanteau of "near" and "online storage", is a term used in computer science to describe a type of storage that represents a compromise between online storage for frequently accessed data and offline storage as used for longer term storage (archival or backup storage), which is usually accessed much more infrequently and is cheaper to store. Tape libraries are one type of nearline storage. MAID assemblies are another, an HDD-based approach. The major difference between Nearline and Offline Storage is usually that Nearline data can be made available via automation, while Offline Data ("Cold Storage") may require human intervention to get back online.

Network Switch

A network switch (aka "switching hub", "bridging hub", "MAC bridge" or simply a "switch") is a multi-port networking device operating at the data link layer of the Open Systems Interconnection (OSI) model. The main purpose of switches is to connect network nodes within a local area network (LAN). This is achieved by forwarding data frames that contain unique Media Access Control (MAC) addresses of network nodes. Switches come in various types, including modular (offering expandability), fixed-configuration (non-expandable), managed (providing the highest level of scalability, management, segmentation, quality of service, and security features), unmanaged (plug-and-play with basic connectivity), and smart (with limited management, segmentation, quality of service, and security features).

NFS

NFS (abbr. of Network File System) is a distributed file system protocol originally developed at Sun Microsystems that provides access to files and directories located on remote computers as if they were local. The NFS shared storage protocol defines a mechanism for storing and retrieving files from storage devices across networks. The key advantages of employing NFS include centralized data storage, enhanced efficiency, data security, and scalability. It is important to note that NFS is not suitable for sharing sensitive data over public networks and lacks support for hierarchical storage management. NFS is one of several standards for distributed file systems in network-attached storage (NAS). NFS is usually used in UNIX-like environments, with SMB being the Windows equivalent.

NLE (non-linear editing system)

NLE, short for "Non-linear editing", is a digital video and audio editing process performed by a non-linear editing system. This process provides random access and non-destructive editing of A/V material, enabling editors to manipulate footage in or out of sequence down to a frame by frame level. "Non-destructive" here means that original content remains unaltered. Early NLE systems used proxy files of origin video on film, transferred to video tape or digital storage, exporting an Edit Decision List (EDL). Later NLEs work directly from a video file and usually export video in many codecs and container formats. In order to efficiently manage video files, NLEs typically depend on performant storage backends, as A/V data tends to max out storage capabilities.

NTFS

abbreviation of "New Technology File System" (sometimes "Microsoft Windows NT File System"), a proprietary file system developed by Microsoft. It shipped with Windows NT in 1993 and has since replaced the older FAT file system used on Windows 9x operating systems.

NVM Express (NVMe)

short for "Non-Volatile Memory Host Controller Interface Specification" (NVMHCIS) is an open, logical-device interface specification. It controls how a computer accesses non-volatile storage media attached via the PCI Express bus. NVMe is able to realize faster I/O with solid-state storage devices in comparison with other interface standards.

NVM Express over Fabrics (NVMe-oF)

NVM Express over Fabrics ("NVMe-oF" or "NVMEoF") is an umbrella term for NVMe devices which are not directly connected to a host system (as is usually done via direct plug into PCIe connectors or via a PCIe switch) but via remote cabling means. The NVMe-oF specification defines a common framework for NVMe to work over various fabric types, including Ethernet-, InfiniBand-, and Fibre Channel-based networks. One implementation is "NVMe over Fibre Channel", currently being standardized (as of 2024) and sometimes abbreviated as "FC-NVMe" or "NVMe/FC". On the wire, some implementations use "RDMA over Converged Ethernet" (RoCE), then labeled "NVMe-oF RoCE", while others use the "Transmission Control Protocol" (TCP). "NVMe-oF" is similar to "iSCSI" in that both protocols expose block storage over a network connection. However, iSCSI is an older technology, and NVMe-oF typically outperforms it in most I/O metrics.

Object Storage

aka "object-based storage" or "blob storage" is a computer data storage approach where data is managed and regarded as being data "objects" or "blobs". This view is different from other storage architectures, as traditionally file systems manage data in nested directories forming a "file hierarchy". Object storage is also different from "block storage" which manages and describes data as blocks within sectors and tracks - usually laid out on circular (rotating) media - in turn usually meaning block devices. Amazon pioneered the Object Storage realm with its early Amazon AWS offering "Amazon S3" (for "Simple Storage Service"). To this day, many cloud providers rely on this de-facto interface standard and offer S3-compatible data stores, enabling multi-cloud deployments, seamless data migration or on-premise drop-in replacements. There are a number of closed and open-source offerings for S3-compatible storage platforms, like Ceph's S3 API layer, OpenStack Swift, SwiftStack (today Nvidia) and MinIO, as well as AWS-like "Storage-as-a-Service" (STaaS) cloud offerings such as Google Cloud Storage, IBM Cloud Object Storage, Microsoft Azure Blob Storage, Wasabi, etc. Many storage backends offer an emulation layer or compatibility API, like Ceph, IBM Spectrum Scale's Swift3 Middleware for OpenStack Swift and similar products. Extensions of the original S3 API today include features to implement WORM semantics to mitigate ransomware attacks or comply with archiving legislation.

OpenBMC

Linux Foundation originated open-source implementation of a "baseboard management controller" (BMC) firmware stack.

Open Compute Project (OCP)

A collaborative industry workgroup initiated by Facebook to redesign major components of datacenter infrastructure, with the aim to increase efficiency, reduce costs and improve overall management of basic datacenter building blocks. Some central topics are cooling and power distribution, physical arrangement of systems in redesigned server racks and reusable server blade enclosure designs.

OpenZFS

Open sourced variant of the ZFS file-system.

Oracle ZFS

Oracle's proprietary implementation of the ZFS file-system, originally developed at Sun Microsystems.

Over provisioning

as a general term, "over provisioning" simply means to provide ample resources, more than needed. In network bandwidth, for example, a broadband carrier may over-provision the allotted bandwidth for a customer, offering bandwidth suitable for peak times, while the average network traffic is much lower. In storage, "over-provisioning" usually means that additional (spare) drives are installed in a storage array, or that a storage device vendor advertises a certain capacity for its products while the real internal capacity is far higher, allowing the device to use "self-healing" in case a number of blocks fail. In compute, overprovisioning usually means that DevOps or the more general IT architecture allocates more resources than needed under normal workloads. Usually, overprovisioning is an unwanted effect as it works against optimization and raises total cost of ownership (TCO).

Parallel Filesystem

a parallel file system is a type of file system designed to provide high-performance access to data stored across multiple storage devices (data striping). It allows multiple processes to read from and write to a file simultaneously, which is particularly beneficial in high-performance computing (HPC) or for applications that require access to large amounts of data. Parallel filesystems usually offer scalability and high throughput. Their clustered design offers optional redundancy and load balancing between nodes. Common examples of parallel filesystems are Lustre, GPFS (IBM Spectrum Scale) and Fraunhofer's BeeGFS. Compare Clustered file-system.

Penta-Level Cell (PLC)

Type of multi-level memory cell (MLC) used in solid state flash memory, able to store 5 bits per cell.

Photochemical Film Preservation

describes broadly techniques to preserve film, still image or motion picture film, by reproducing the original black & white or color master positive or negative on other film stock for long term preservation and archiving. One modern process is the three-strip color-separation film-out process to preserve motion pictures.

Platter

Hard-drives are usually made up of multiple platters, rigid metal disks with a magnetizable surface. Each platter has two surfaces, the upper and the lower surface. Each surface is usually read by its own read/write-head, thus a disk with for example 8 platters would have 16 read/write-heads.

PMem

short for "Persistent Memory", a hardware technology that marries the benefits of high-speed DRAM and non-volatile solid-state disks. It evolved from battery-backed DIMMs (BBU DIMMs). Persistent memory is usually directly attached to the memory bus of a system and thus is similar in addressing to RAM. Speed and latency are also comparable. The difference is that in the event of power loss, PMem behaves like non-volatile NAND memory and is able to keep data. Intel offers "Persistent Intel Optane memory" (3D XPoint DIMM) while other vendors offer similar memory under the broader term NVDIMM (non-volatile DIMM).

Pool

in data storage, a "storage pool" or "drive pool" describes a (physical) structure of a number of storage volumes organized ("grouped") together to form a "logical unit". It is common in RAID to speak of disk pools, for example to describe one RAID setup composed of several disks. A "RAID 0 disk pool", for example, is a drive pool that uses striping across a number of disks to improve I/O but does not provide any redundancy. With an increasing level of virtualization, the term "pool" is also used for other elements of enterprise IT, for example a "pool of machines", "pooled memory" (as in disaggregated memory layouts), "pooled GPU" or "GPU pools" for fibre attached GPU nodes, "block storage pools" (as in grouping of detached virtualized block volumes).
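
How striping in a RAID 0 style pool spreads data can be sketched as follows. This is a simplified illustration under stated assumptions: the stripe unit is a single block (real implementations usually stripe in larger chunks), and the function name is made up for this example.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a stripe unit of exactly one block.
    """
    if num_disks < 1:
        raise ValueError("a pool needs at least one disk")
    return logical_block % num_disks, logical_block // num_disks


# Lay out the first eight logical blocks across a pool of four disks.
layout = [locate_block(block, 4) for block in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive logical blocks land on different disks, so a sequential read or write keeps all disks busy at once. As there is no parity or copy, the loss of any single disk makes the whole pool unreadable.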

Power-Up In Standby (PUIS)

a feature defined as part of the ATA specifications (SATA/PATA), defining that a drive's motor is not scheduled for spin-up once power is switched on for the device. The setting is stored in the drive's hardware controller. The feature is meant to mitigate a situation where a drive that is intended to spin up only when it is used, but to remain spun down in nominal operation, would otherwise spin up when system power is switched on, only to be immediately spun down afterwards, resulting in unneeded wear on the hard disk drive's mechanics.

Project Sharing

"project" usually refers to video editing projects in storage infrastructure used in video storage and editing. One problem arising from sharing a project is locking of files for writing to data. Some filesystems like StorNext employ sophisticated file locking schemes to allow multi tenant access to single resources without data corruption.

Proxy File

A term usually found in video editing environments, a proxy file is a version of a file (reduced size, higher compression, different codec) that requires less taxing operations of the processing system to handle it. A proxy might be of lower bitrate to lessen IO overhead or in a different format which is more suitable to frame-accurate editing.

POSIX

POSIX, abbreviation of "Portable Operating System Interface", is a set of standards defined by the Institute of Electrical and Electronics Engineers (IEEE). It was developed to specify the application programming interface (API), shell, and commands - to ensure software compatibility with the family of Unix-like operating systems. It provides interoperability among disparate operating systems and facilitates software portability across these platforms. While the POSIX standard encompasses more, in a storage context POSIX most commonly refers to file access, file metadata and storage semantics.

POSIX compliance

as part of the POSIX framework, various parts of a system or the whole system may be "POSIX compliant", meaning that the OS is aware of POSIX standards and APIs. For applications, this may mean that an application written for a POSIX system will run. In a data storage context, "POSIX compliance" usually means that data files exhibit certain traits, and offer a set of common (UNIX) POSIX metadata attributes to be read or set, like file creation, file modification timestamps, the ability to create specific file types, like FIFOs or links, etc. (create, read, write, modify, delete; organize and find files in a "traditionally" laid out hierarchy, abiding to strong / atomic guarantees).
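
The kind of file metadata and file types meant here can be inspected with Python's standard os and stat modules. The sketch below assumes it runs on a POSIX system (for example Linux or macOS) and uses made-up file names in a temporary directory; it is an illustration, not a compliance test.

```python
import os
import stat
import tempfile


def describe(path):
    """Return a few POSIX metadata attributes of a path without following links."""
    info = os.lstat(path)
    return {
        "size": info.st_size,
        "modified": info.st_mtime,
        "is_regular": stat.S_ISREG(info.st_mode),
        "is_symlink": stat.S_ISLNK(info.st_mode),
        "is_fifo": stat.S_ISFIFO(info.st_mode),
    }


def demo():
    """Create a regular file, a symbolic link and a FIFO, then describe each."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "data.bin")
        with open(data_path, "wb") as handle:
            handle.write(b"hello")

        link_path = os.path.join(tmp, "data.link")
        os.symlink(data_path, link_path)

        fifo_path = os.path.join(tmp, "queue.fifo")
        os.mkfifo(fifo_path)  # only available on POSIX systems

        return describe(data_path), describe(link_path), describe(fifo_path)


regular, link, fifo = demo()
print(regular)
print(link)
print(fifo)
```

A storage backend that cannot represent such attributes or file types (for example many plain object stores) is usually what is meant when a product is described as "not POSIX compliant".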

What are Professional Services?

Data center solutions vendors usually work closely with large customers to install rack or data center scale systems. Such projects are overseen from planning to after sales support. "Professional Services" are customization tasks where software is adapted for a specific customer use case (bespoke systems). As such, these services complement the actual physical delivery and installation of large systems. Usually, professional services are structured in 5 steps: in Design and Modeling a project is defined; in Deployment the hardware and software systems are installed and tested; Operation is the managed operation of the provided infrastructure; and the last step, Transfer, may mean the handing-over of an installed system to the customer. Professional Services may be rendered for large scale storage systems, compute clusters or HPC, data center deployments and a mix thereof.

Quotas

is a facility of per-user resource management. In data storage, "disk quotas" limit the allocation of disk space per user, per user-account. Many operating systems and file-systems offer facilities to grant disk quotas to users, manage their resources or withdraw quotas.

RAID

short for "redundant array of inexpensive disks" (sometimes "redundant array of independent disks") is a data redundancy technology where a number of physical (sometimes virtual) disk drives are combined into one logical volume, into a single logical unit. It involves storing the same data in different locations on multiple hard disks or solid-state drives to enhance performance, error correction, protection, and redundancy. The primary objectives of RAID include improving performance, increasing data reliability, and providing fault tolerance. There are several so called "RAID levels", meaning different configurations and schemes to combine multiple drives into one "RAID array", resulting in different levels of performance and/or redundancy. RAID is widely used in server environments and storage systems where performance, reliability, availability, and fault tolerance are critical. It is important to understand that RAID operates at the block device level, meaning RAID replicates devices, not file-systems.

Micropolis used to offer RAID hardware solutions in the form of the "MicroDisk"-Series of external SCSI hard disk drive enclosures marketed under the Raidion™ brand. Micropolis RAID systems were available in simple JBOD configurations with a modular enclosure and the later Raidion LTX "Gandiva" variant, offering a built-in hardware RAID controller.

RAID Levels

RAID configurations are defined in a number of so called "RAID Levels", ranging from schemes without any data redundancy to levels with high data redundancy. Some RAID levels are officially defined, while others are vendor-specific implementations. There are also a number of RAID combinations, like "RAID 10".

RAID 0 - Striping (speed up without redundancy)
RAID 1 - Mirroring
RAID 2 - Bit-Level Striping with Hamming-Code-based error-correction
RAID 3 - Byte-Level Striping with Parity-Information on a separate disk
RAID 4 - Block-Level Striping with Parity-Information on a separate disk
RAID 5 - Block-Level Striping with Parity-Information across disks (performance + parity)
RAID 6 - Block-Level Striping with dual Parity-Information across disks (performance + double parity)

RAID 5 vs. RAID 6 vs. RAID 6+3

Features	RAID 5	RAID 6	RAID 6+3
Disks required	at least 3	at least 4	at least 4
Scheme	Block-level striping with distributed parity	Block-level striping with dual Parity-Information across disks	Block-level striping with triple Parity-Information across disks
Fault Tolerance	Failure of one disk	Failure of two disks	Failure of three disks
Capacity	Total number of drives minus one (4x 10TB = 30TB capacity, 10TB protection)	Total number of drives minus two (4x 10TB = 20TB capacity, 20TB protection)	Total number of drives minus three (4x 10TB = 10TB capacity, 30TB protection)
Note	Good performance, but once one drive fails and rebuild commences, performance drops significantly	Lower performance but higher fault tolerance	Same performance as RAID 6 but higher data redundancy
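
The capacity row of the table follows a simple rule: usable capacity is the number of drives minus the number of parity drives, multiplied by the size of one drive. A minimal sketch, assuming equally sized drives and using made-up names:

```python
# Number of drives' worth of capacity consumed by parity, per RAID level.
PARITY_DRIVES = {"RAID 5": 1, "RAID 6": 2, "RAID 6+3": 3}


def usable_capacity_tb(level, drives, drive_size_tb):
    """Usable capacity of a parity RAID built from equally sized drives."""
    parity = PARITY_DRIVES[level]
    if drives <= parity:
        raise ValueError(f"{level} needs more than {parity} drives")
    return (drives - parity) * drive_size_tb


for level in PARITY_DRIVES:
    usable = usable_capacity_tb(level, drives=4, drive_size_tb=10)
    print(f"{level}: 4x 10TB -> {usable}TB usable, {40 - usable}TB protection")
```

With more drives in the array, the share of capacity spent on parity shrinks, which is why larger arrays tend to favor double or triple parity.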

Triple-Parity RAID ("RAID 6+3", "RAID TP")

a usually proprietary variant of RAID 6 with an added parity copy in comparison to normal RAID 6, implemented by various vendors. It allows a maximum of three drives to fail without any data loss.

RAIDZ

RAIDZ is a software RAID implementation used in ZFS, primarily on Linux and other Unix-based systems. It provides data redundancy and integrity while avoiding some of the pitfalls of traditional RAID setups. RAIDZ is similar to RAID 5, but it improves upon it, for example by eliminating the "write hole issue" (where data corruption can occur if a system crashes during a write operation) or by allowing variable-sized striping to reduce wasted space and to distribute data and parity more efficiently across disks.

Raidion

Micropolis Raidion™ is a legacy line of cross-platform modular external storage subsystems based on SCSI drives and RAID fault tolerance technology. The "MicroDisk LS" and "LT" disk arrays were stackable enclosures for standard 5.25 and 3.5 inch SCSI drives, with a "MicroDisk AV" variant for Micropolis' own "AV" drives, tuned for audio-visual workloads.

You may read the more detailed article about "Raidion" here in our knowledge database.

RAIN

short for "Redundant Array of Inexpensive Nodes" (RAIN) aka "Redundant Array of Inexpensive Servers" (RAIS). RAIN is a concept of maintaining multiple redundant machines to allow individual servers/nodes to fail without interrupting the service provided. It is for example used in various distributed file systems. Inexpensive or standardized nodes are the basis of hyperconverged data center layouts.

RaptorQ code

a "near optimal fountain" (rateless erasure) code, similar to the group of Reed-Solomon codes (which represent "optimal erasure codes") or the class of Tornado codes ("near optimal erasure codes").

Raw-to-effective Storage Ratio

is a metric used to describe the relationship between the raw storage capacity of a storage system and the effective (meaning usable) storage capacity. The "raw" or "physical" capacity describes the total amount of physical storage space available in a system. A system here may be a single disk, drive or device, or a combination of multiple drives and/or devices. The "effective" or "usable" storage capacity on the other hand is the amount of storage space that is actually available for storing data after accounting for factors like (disk/drive) formatting, file system overhead, data redundancy or RAID configurations. In (spinning) disk contexts it is (was) common to speak of "(raw or physical) capacity" vs. "formatted capacity", where the latter may vary depending on the formatting scheme actually used.
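
As a worked illustration, the ratio can be computed as raw capacity divided by effective capacity. The sketch below uses made-up function names, and the 2% file system overhead is a placeholder figure chosen for the example, not a measured value.

```python
def effective_capacity_tb(raw_tb, redundancy_tb, fs_overhead_fraction):
    """Usable capacity after subtracting redundancy and a file system overhead share."""
    return (raw_tb - redundancy_tb) * (1.0 - fs_overhead_fraction)


def raw_to_effective_ratio(raw_tb, effective_tb):
    """How many TB of raw capacity are needed per TB of usable capacity."""
    if effective_tb <= 0:
        raise ValueError("effective capacity must be positive")
    return raw_tb / effective_tb


# Hypothetical system: 4 drives of 10TB each with double parity (20TB redundancy).
raw = 4 * 10
effective = effective_capacity_tb(raw, redundancy_tb=20, fs_overhead_fraction=0.02)
ratio = raw_to_effective_ratio(raw, effective)

print(f"raw {raw}TB, effective {effective:.1f}TB, ratio {ratio:.2f}:1")
```

A ratio close to 1 means that almost all purchased capacity is usable, while higher values indicate that more of it is spent on redundancy and overhead.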

Read/Write-Head

A type of transducer used in magnetic storage devices, like hard drives, floppy drives (rotational media) or tape drives (linear media). The read/write-head converts electrical signals into magnetization impulses that are exerted on a magnetizable surface, thus inscribing (originally analog) today predominantly digital data streams into the recording media.

Rebuilding

In RAID systems, a drive failure and/or replacement requires the RAID set to be rebuilt. This process of rebuilding usually takes considerable time. Vendors compete on how fast their RAID solutions are able to rebuild after a (multi) drive failure, or on whether their solution can be used normally during rebuilds. With ever growing data density per drive, the time required to rebuild a RAID traditionally increases. Vendors try to mitigate this effect by over-provisioning storage with immediate spares, on the drive level or on the platter level. Modern hard disk drives (as of 2024) by some vendors are able to cease operation of single platters inside one drive, so that in case of failure not a whole drive has to be replaced but only a fraction of the total drive capacity is lost and requires compensation elsewhere. Another scheme is "Declustered RAID", which improves recovery performance by shuffling data and parity blocks among all disks (including spares). These competing schemes often lead to vendor specific non-standard RAID levels.

Redfish (Redfish Scalable Platforms Management API)

is an HTTPS-based remote server-management specification. Released in 2015 by the "Scalable Platforms Management Forum" (SPMF) and its umbrella organization "Distributed Management Task Force", it offers a REST interface to do common IPMI tasks, with the intention of replacing the defective "IPMI-over-LAN" protocol in the medium-term.
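
To give an impression of the REST style, the following sketch only builds (and does not send) a request for the collection of computer systems below the Redfish service root "/redfish/v1/". The host name and credentials are placeholders, the function name is made up, and the use of HTTP Basic authentication is an assumption for brevity; deployments often use session-based login instead.

```python
import base64
import urllib.request


def build_systems_request(host, user, password):
    """Prepare a GET request for the Redfish Systems collection (not sent here)."""
    url = f"https://{host}/redfish/v1/Systems"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder BMC address and credentials, for illustration only.
request = build_systems_request("bmc.example.net", "admin", "secret")
print(request.get_method(), request.full_url)
```

Sending the request, for example with urllib.request.urlopen, would return a JSON document listing the systems managed by that controller.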

Redundancy

Redundancy in computer data storage means there is an additional copy of the actual data, making it possible to recover from data loss or to detect errors in the main data store. The redundant data copy might be a real complete copy or a derivative, in the form of checksums (to detect a change in stored data) or select pieces which allow a rebuild of the actual data via elaborate reconstruction means. The concept of redundancy similarly applies to all building blocks of IT: power supplies, server systems, networking interfaces, data paths, data uplinks, security layers, etc.
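
Both derivative forms can be illustrated in a few lines: a checksum that detects changes, and a parity block from which one lost block can be rebuilt. This is a minimal sketch with made-up names. It uses simple XOR parity over equally sized blocks and is not the scheme of any particular product.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# Redundant derivatives: one parity block plus a checksum of the second block.
parity = xor_blocks(data_blocks)
checksum = hashlib.sha256(data_blocks[1]).hexdigest()

# Simulate losing the second block and rebuild it from the survivors and parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])

# The checksum confirms that the rebuilt block matches the original.
is_intact = hashlib.sha256(rebuilt).hexdigest() == checksum
print(rebuilt, is_intact)
```

The parity block costs the space of one data block, regardless of how many data blocks it protects, but it can only compensate for the loss of a single block at a time.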

Remote Direct Memory Access (RDMA)

commonly used in high-performance computing (HPC), cloud computing and large-scale data centers, RDMA is a technology that allows data to be transferred directly between the memory of two computers without involving the computer's operating system or CPU. This bypass leads to lower latency and higher throughput on a given system, while the CPU is freed up to perform other operations. It is currently mainly implemented in two flavors: RoCE, promoted by Mellanox (now Nvidia) and based on UDP communication and InfiniBand, and iWARP, which is built around a TCP stack.

Replication

Replication in computing means the creation of redundant resources (hardware and software resources) to improve reliability, fault-tolerance and/or performance of a given system. Compare "Redundancy".

RoCE (RDMA over Converged Ethernet)

also known as "InfiniBand over Ethernet" (IBoE) or "NVMe-oF RoCE", is a network encapsulation protocol to enable remote direct memory access (RDMA) over an Ethernet network. There are multiple versions of RoCE (often pronounced "Rock-e"), operating on different layers of the protocol stack and thus exhibiting different routing capabilities. RoCEv2 builds on the original RoCE protocol (RoCEv1) and adds routability, UDP/IP encapsulation, a congestion control mechanism and multi-tenant support. In general, when UDP is used, vendors either improve reliability by using Converged Ethernet or by augmenting the RoCE protocol itself. As of 2024 there is no defined standard and vendors are still competing for the best or winning solution. Compare NVM Express over Fabrics (NVMe-oF).

Router

A router is a network device designed to connect two or more packet-switched networks by forwarding data packets between them. Operating at the network layer of the Open Systems Interconnection (OSI) model, routers must determine the most efficient path to forward data packets from a source to a destination node across different networks. This process is known as routing. Routers often provide additional functionality, such as automatic IP address issuance via "Dynamic Host Configuration Protocol" (DHCP), Network Address Translation (NAT), Virtual Local Area Network (VLAN), firewall or Access Control List (ACL).

Rotational media

as used in floppy disk drives and hard disk drives is probably one of the oldest media in technical human history. Data of any kind usually comes in a linear format, as time is well represented in an equally linear format. A string of events forms a line of events, coming one after another. Some cultures use fabric threads as a linear medium, where knots and patterns of knots encode basic data, and these threads are read linearly when oral history is handed down as "written history", as a historical record. When Thomas Edison invented the phonograph, the idea was to spin a cylinder with a wax surface and a metallic stylus would then engrave the frequency response of a microphone-like assembly into the wax surface, forming a groove. This groove is essentially one long line of audio information, wrapped around such a cylinder. Shortly after, flattened recording surfaces replaced these wax cylinders, and the audio groove was laid out flat, on the record surface, in a spiral of near-concentric tracks. The linear idea was also embodied as tape media. On early tape, analog data was written on a vinyl medium coated with a magnetizable surface, similar to how a series of images is recorded on film for motion pictures. The tape is then wound onto spools and the recorded linear data, just as with flat disk records, forms a spiral around the spool's core.

With the advent of computer technology, the same ideas employed to record analog information were used to record digital data, on magnetizable rotating disks and on linear tape. Yet, with computers, the requirement to allow "random access" to data became much more important than with earlier recorded data. When playing an audio record on a turntable, the user can use the needle to place the stylus on arbitrary grooves, usually to find the beginning of a track. In comparison, the same chore with tape media is much more difficult. The tape has to be shuttled to find a specific point in time of a linear recording. The "seek times" of tape in comparison with disk media are much longer for random accesses. That's why computer floppy and hard disk drives ultimately won over tape media for random access. Here, the chore of placing the "stylus", a read/write-head, is servo-automated and reading/writing as well as positioning on a track can be switched in a split-second. For efficiency, the track layout of a spiral was replaced with a perfectly concentric track layout on the media surface, allowing the r/w-head easier positioning above tracks. With growing amounts of data to be stored, the next step was the layering of multiple disks over another, and r/w-heads would access all tracks on each disk simultaneously. Tracks were now read as concentric "cylinders". Today, in 2024, rotational media is still going strong, just as tape, and as an alternative option to Solid-state media, is usually selected where big amounts of data have to be stored reliably.

SAN

SAN, short for "Storage Area Network", is a dedicated high-speed network that interconnects shared pools of storage devices with each other and/or multiple servers. Its primary task is to share storage resources and provide access to multiple servers without relying on traditional TCP/IP based (slower) Local Area Network connections. Storage Area Networks are widely used in enterprise storage environments to meet storage requirements of large-scale applications, databases and virtualized environments. They enhance performance by segregating storage traffic from general network traffic, providing simplified data access and efficient management.

SaaS

an acronym used to describe cloud solutions that are offered via the web, usually accessed with a browser and do not require a local install of (binary) software, libraries or packages ("Software as a Service"). Many vendors of Software-as-a-Service offerings employ a Freemium or Subscription price model.

SAS

SAS, abbreviation of "Serial-Attached SCSI", is a high-performance, point-to-point serial protocol designed to connect and transfer data between the host system and mass storage devices. It is a faster and more scalable alternative to its predecessor "Parallel SCSI", providing higher reliability, improved scalability and compatibility in enterprise environments.

SATA

The term SATA (short for "Serial Advanced Technology Attachment") describes both the 'physical interface' and the 'communication protocol' of SATA storage devices. Regarding the physical interface, SATA represents an industry-standard computer bus interface that connects hard disk drives, solid-state drives and optical drives to a computer's SATA host controller, either on-board or as a separate device. SATA is widely used for connecting internal storage devices in desktops, laptops and servers. On the communication protocol side, SATA represents a transport protocol that defines how data is transferred on the wire, between storage devices and a computer's SATA controller.

SATA Express (SATAe)

SATA Express (sometimes unofficially abbreviated to SATAe, not to be confused with eSATA) is a computer bus interface that supports both Serial ATA (SATA) and PCI Express (PCIe) storage devices.

Scale-Out

Scale out in computing (synonymous with "scaling horizontally") describes the scaling of a computer resource by adding (or removing) same category nodes to a system. For example, countering load spikes of a web-site by adding more complete web server nodes to a load-balancer.

Scale-Up

Scale up in computing (synonymous with "scaling vertically") describes scaling a computer resource by adding (or removing) resources to a single node of a system. For example, a database server might experience performance issues and is then scaled vertically by adding RAM, adding storage, upgrading storage, upgrading CPUs or increasing CPU frequency (overclocking), etc.

Scatter/Gather

is a scheme in data (memory) addressing that is used in vector computing (scatter/gather vector addressing) and in data storage I/O (vectored I/O). The basic principle is that connected or related data isn't necessarily stored nearby but scattered over the memory structure. In a first step, a read, data is gathered from these distributed memory locations. Then processing takes place and data needs to be written back, in the same scattered pattern it was read from earlier. In storage, these disparate locations are usually different buffers, and before a continuous chunk of data can be output, it has to be gathered from these buffers. On POSIX operating systems, there are special system calls to do this in one go, readv() and the reverse of it, writev(). A concept related to Scatter/Gather is "Zero-Copy", a system call shortcut to an optimized underlying (memory) structure.
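As a minimal sketch of vectored I/O, the following Python snippet uses os.writev() and os.readv(), the Python standard library wrappers around the POSIX calls named above (available on Unix-like systems only). The buffer names and contents are hypothetical and chosen just for illustration.

```python
import os
import tempfile

# Three separate buffers, as they might exist in a protocol implementation.
header = b"HDR:"
payload = b"payload-bytes"
trailer = b":END"

fd, path = tempfile.mkstemp()
try:
    # Gather: write all three buffers with a single system call,
    # without first concatenating them into one contiguous buffer.
    written = os.writev(fd, [header, payload, trailer])

    # Scatter: read the data back into three pre-sized, separate buffers,
    # again with a single system call.
    os.lseek(fd, 0, os.SEEK_SET)
    buffers = [
        bytearray(len(header)),
        bytearray(len(payload)),
        bytearray(len(trailer)),
    ]
    read = os.readv(fd, buffers)
finally:
    os.close(fd)
    os.remove(path)
```

After the call, each bytearray in buffers holds its own part of the file, so the program never has to split one large read buffer by hand.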

SCSI

Short for "Small Computer System Interface", pronounced "Skuz-ee". Successor to the earlier ESDI connection standard, the collection of SCSI standards defines low-level protocol commands, communication protocols, and electrical, optical and logical interfaces. Since the introduction of SCSI (Parallel SCSI) in the early 1980s (as X3.131-1986, or "SCSI-1"), the SCSI standards have seen repeated updates and extensions well into the 2010s, e.g. SCSI-2 in 1994 (ANSI "X3.T9.2"), version four of Serial Attached SCSI (SAS-4) in 2017, and SAS-5 for 45 Gbit/s data rates being underway (as of 2024).

Micropolis entered the hard disk drive business in the 1990s, emerging from first manufacturing Shugart-interfaced floppy disk drives and ESDI hard disk drives during the late 1980s. Prior to the move from the 5.25" form-factor for HDDs to 3.5" HDDs, Micropolis was the manufacturer of the highest-capacity SCSI hard disk drives on the market.

SDM

short for "Software-Defined Memory", a technical approach where traditionally physically installed RAM in a machine is abstracted away. By using a virtualisation layer between CPU and RAM, SDM separates (physical or virtual) memory resources and the CPU, allowing SysOps to dynamically allocate memory. Such an approach simplifies the maintenance and operation of data centers and allows for resource optimization by pooling memory across multiple servers, despite the lower performance compared to onboard RAM. Technically, an implementation usually taps into the PCIe lanes of a system and redirects memory accesses to another node via RDMA (Remote Direct Memory Access), which leverages byte-level DMA (Direct Memory Access). Memory nodes are usually connected by high-bandwidth data-center cabling such as fibre optics or InfiniBand. The basic principle is similar to Software-Defined Storage and Network Attached Storage.

Self-certifying file system

also abbreviated "SFS", is a file-system developed by David Mazières in 2000, aiming to provide a global and decentralized, distributed file system for Unix. It is based on NFS and requires a specific SFS client as file-manager.

Self-healing

a "marketing term" usually implemented by over-provisioning with error detection.

Semantic file system (SFS)

is a type of file-system that organizes file storage not in a traditional hierarchical representation to the end user but addresses data objects by their semantics (associations, signifiers, descriptors, meaning). In addition to a file's name, it is common that such file systems use user and/or machine generated tags attached to files. Due to the well-known nature of "tagging" in various web applications, semantic file systems tend to be regarded as being "tag based file systems". The semantic feature extension in SFSes can be implemented via an integrated approach, where the semantic capabilities are a native feature of a purposely written file-system, or an augmented approach, where the semantic feature is an add-on to an otherwise traditional file-system, providing traditional and semantic accessing means. As most semantic file systems more widely available currently implement their semantic extensions via an additional storage layer (augmented approach) - e.g. via separate databases, xattrib data storage schemes or special files - while relying on an established file-system for underlying storage, these systems usually suffer from an overhead for the upkeep of the semantic datastore.
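The augmented approach described above, a separate semantic datastore on top of an ordinary file-system, can be sketched as a small in-memory tag index. This is an illustration only and not how any particular semantic file system works; the class name TagIndex, its methods and the example paths are all hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """A toy semantic layer: maps tags to paths stored on a regular file-system."""

    def __init__(self):
        self._paths_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file path."""
        for name in tags:
            self._paths_by_tag[name].add(path)

    def find(self, *tags):
        """Return all paths that carry every one of the given tags."""
        if not tags:
            return set()
        matches = [self._paths_by_tag.get(name, set()) for name in tags]
        return set.intersection(*matches)


index = TagIndex()
index.tag("/data/2024/report.pdf", "finance", "2024")
index.tag("/data/2024/photo.jpg", "holiday", "2024")
index.tag("/data/old/budget.xls", "finance")

finance_2024 = index.find("finance", "2024")
everything_2024 = index.find("2024")
```

The sketch also hints at the overhead mentioned above: every rename or deletion on the underlying file-system would require a matching update of the index to keep both in sync.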

Micropolis offers an experimental tag based file-system called "TagLayer", which can be mounted via FUSE.

Serial Storage Architecture (SSA)

is an interface standard based on SCSI-3 and promoted by IBM in the early 1990s. Invented by IBM's Ian Judd and then managed as an open standard by the SSA Industry Association with a number of industry members, among them Micropolis, the updated interface was meant to keep pace with the increasing capabilities of disk drives offered at the time. SSA goals were high performance at lowered cost, fault tolerance, easy integration, longer cables and more devices per bus. SCSI at the time had comparably bulky cables, needed manual drive addressing and allowed only 16 drives per bus (UltraSCSI). SSA changed that and introduced a fault-tolerant bus that allowed individual cables to break, as drives were in a loop with dual connections. Cables could be up to 20m long, drives had automatic ID addressing and were auto-terminated via on-drive terminators - all that while conforming to the full set of SCSI commands, as it was effectively an extension of SCSI-3. Ultimately, another interface and data transport protocol saw wider adoption and Fibre Channel replaced SSA in the market.

During testing and development of the SSA standard, Micropolis offered a version of the wide SCSI HDD model 3243 with a refreshed controller, allowing it to interface with SSA. These 7,200rpm 3.5" drives were performance tested as part of the Cornell University Project Zeno.

Shugart Interface

A disk storage interface designed by Shugart Associates, that became an industry standard in the 1980s.

SLOG

in data storage short for "Separate intent LOG", not to be confused with Sony's "S-Log" log curve used to describe video camera data. "ZIL" (ZFS Intent Log) is another, but incorrect, term for it.

Shingled Magnetic Recording (SMR)

SMR is a very dense data layout scheme used in rotating media with magnetizable surfaces. Individual data tracks are written "shingled", like roof shingles, with an overlap. It allows data to be written (and read) in high-throughput streams, yet deleting and rearranging data is slower, as the drive's controller needs to undo shingled data in the process. SMR drives are different from "PMR" (Perpendicular Magnetic Recording) drives or "CMR" (Conventional Magnetic Recording) drives.

There's a more detailed article about "SMR Disk Drives" here in our knowledge database.

SMB

short for "Server Message Block" (SMB or SMB/CIFS), a network protocol used by Microsoft products to share resources (files, printers, ...) on a network.

SMIS

short for "Storage Management Initiative Specification" (SMI-S), a standard by the Storage Networking Industry Association (SNIA), developed to bring more unity into the diverse fabric-based storage landscape by providing one secure and reliable interface for managing storage devices within a network. Based on the "Common Information Model" (CIM) and "Web-Based Enterprise Management" (WBEM) standards, SMIS aims to help organize and manage storage devices in an object-oriented manner, effectively simplifying common tasks in networked storage and data center operation.

Snapshot

or the practice (or ability of a system to do) "snapshotting", usually in relation to backups, means the ability to create a copy or clone of a data storage resource at one given point in time. The ability to take "snapshots" is usually valued in terms of how quickly snapshots can be taken and whether a resource can be rolled back to a certain snapshot.

Software-Defined Storage (SDS)

SDS separates (physical or virtual) storage resources and its accessing client layer or APIs by using a virtualisation layer. This allows storage resources to be "software defined", and enables cloud or data center operators (CSPs) to arrange (orchestrate) storage as abstract virtual resources. Cmp. "Storage Virtualization".

SPDK (Storage Performance Development Kit)

The "Storage Performance Development Kit" (SPDK) provides a set of tools and libraries for the development of high performance, scalable, user-mode storage applications. Compare DPDK.

Sputtering

In the context of data storage, sputtering (aka "cathode sputtering") is a physical vapor deposition technique used to deposit ultra-thin (thin film layers) and very uniform coats of material on the surface of another material. It involves bombarding a target material (like metal) with high-energy particles so that atoms are ejected from it and then deposited evenly onto the surface to be coated (the substrate). This high-precision vacuum-based technique is used in many areas of the semiconductor industry. In relation to data storage, it is used to manufacture magnetizable surfaces on metal or plastic film in a very precise manner for maximum data density and reliability. A "sputtered tape" refers to magnetic tape where the magnetic layer (which holds the data) is applied using sputtering. Sputtered magnetic films are likewise used in the manufacture of hard disk drives, where rigid metal platters receive their magnetizable coating via this technique.

SSD

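a Solid-state drive (SSD) is a storage device which uses integrated circuit assemblies (contrary to, for example, magnetized regions, as in HDDs) to store data persistently, typically using NAND flash memory. SSDs are further distinguished by the number of bits stored per flash cell, e.g. "SLC" (single-level cell) vs. "MLC" (multi-level cell) SSDs.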

SSD Hot-zone

A term usually found in hybrid storage where different storage tiers are used to optimize IO performance for frequently accessed data, "hot" data or "hot" files.

Staggered Spin-Up (SSU)

Staggered or "Staged Spin-Up" is a process to reduce the peak power draw of larger storage arrays with multiple storage devices. Optical and hard disk drives use internal rotating media spun by motors, and electric motors may draw several times their normal full-load current when first energized (inrush current). In a storage subsystem with many hard drives, this inrush current may max out or surpass the available power budget of the hosting enclosure, requiring a scheme to prevent such a condition during first boot-up or in subsequent power-down/spin-up cycles during normal operation. Thus, professional enclosure controllers sometimes offer options to control how individual drives or combined arrays of drives within the enclosure are treated for spin-up. Implementation of spin-up control varies, as some control protocols, like SAS, try to establish a standard to control drive spin-up, while others tackle the issue on the drive level or via Host Bus Adapters. SATA and PATA standards tackle the process with offerings for Staggered spin-up (SSU) and Power-Up In Standby (PUIS) as part of the ATA Specifications Standards. Individual spin-up times are measured as part of a drive's S.M.A.R.T. feature-set. A distinction can be made between spin-up on switch-on (with possibly the largest inrush current) and spin-up from stand-by (potentially lesser).
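The effect on the power budget can be estimated with simple arithmetic. The following Python sketch assumes a simplified model in which drives are started in groups, each group finishes spinning up before the next one starts, and already spinning drives draw a constant steady-state current. The current values (2000 mA inrush, 700 mA steady state) are hypothetical illustration figures, not taken from any drive datasheet.

```python
def peak_current_ma(drives, group_size, inrush_ma, steady_ma):
    """Peak current draw (in mA) when spinning up drives group by group."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    peak = 0
    spinning = 0  # drives that have already reached steady state
    while spinning < drives:
        starting = min(group_size, drives - spinning)
        # Drives already spinning draw steady current, the new group draws inrush.
        draw = spinning * steady_ma + starting * inrush_ma
        peak = max(peak, draw)
        spinning += starting
    return peak


# Hypothetical enclosure with 24 drives.
all_at_once = peak_current_ma(24, 24, inrush_ma=2000, steady_ma=700)
staggered = peak_current_ma(24, 4, inrush_ma=2000, steady_ma=700)
```

With these assumed figures, starting all 24 drives at once peaks at 48000 mA, while starting them in groups of four peaks at 22000 mA, at the cost of a longer total spin-up time.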

StorNext File System (SNFS)

usually abbreviated as "StorNext", is a file system by Quantum Corp. Earlier, StorNext was called "ADIC" and "CentraVision File System (CVFS)". StorNext is popular in professional Video Editing and TV Production environments.

Striping

Striping, a term commonly found in relation with RAID sets, is a technique in data storage where data is broken up into chunks distributed over a number of storage volumes. Although the data then resides on multiple disks, the concept is usually that the array of disks is treated as if it were a single disk. "RAID 0" is data just being arranged in "stripes" across a number of disks, without any redundancy. Striping is useful to improve data throughput, as a single resource, broken into chunks on multiple volumes, can be read concurrently and thus delivered faster than a read from a single drive could achieve.
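The chunk distribution can be sketched as a simple round-robin scheme. The following Python example is a toy model of RAID 0 style striping that uses lists in memory instead of real disks; the function names stripe and unstripe and the tiny chunk size are hypothetical and chosen for readability.

```python
def stripe(data, disks, chunk_size):
    """Split data into chunks and distribute them round-robin over the disks."""
    volumes = [[] for _ in range(disks)]
    for start in range(0, len(data), chunk_size):
        chunk_number = start // chunk_size
        volumes[chunk_number % disks].append(data[start:start + chunk_size])
    return volumes


def unstripe(volumes):
    """Reassemble the original data by reading one stripe (row) after another."""
    chunks = []
    rows = max(len(volume) for volume in volumes)
    for row in range(rows):
        for volume in volumes:
            if row < len(volume):
                chunks.append(volume[row])
    return b"".join(chunks)


volumes = stripe(b"ABCDEFGHIJ", disks=3, chunk_size=2)
restored = unstripe(volumes)
```

In this model, the chunks of one row could be read from all three volumes at the same time, which is where the throughput gain comes from. Losing any single volume makes the data unrecoverable, as there is no redundancy.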

StripeGroup

also "Stripe Group", is a term used in Quantum's StorNext file-system. The design of StorNext allows (payload) data, metadata and journalling data to be allocated separately. As the file-system operates across an array of disks, StripeGroups form logical disk volumes (or LUNs, "Logical Unit Number"), and these StripeGroups can then be used as volumes assigned to a specific role.

Storage Appliance

is a specialized kind of computer appliance, a computer system which - as a combination of hardware, software and sometimes firmware - offers computer data storage to a host or server system as a coherent, probably simplified, product. While computer appliances in general had a popular phase during the 1990s, it has been shown that despite some advantages, a custom appliance running custom software on custom hardware is prone to security vulnerabilities and leaves the user less flexible.

sync write

in POSIX systems there is an optional difference between a request to write data (to disk) and the actual write. A sync write (short for synchronous write request) is a command to actually immediately commit data to stable storage. As this blocks other pending IO actions, optimizing sync writes vs. cached IO is a non-trivial problem in data retrieval.
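The difference between a write request and the actual commit can be sketched in Python with os.write() and os.fsync(), which wrap the corresponding POSIX calls. This is a minimal sketch and not a complete durability recipe; the helper name durable_write and the file name are hypothetical. Opening the file with the os.O_SYNC flag would be an alternative way to request synchronous writes.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data and block until the operating system reports it as committed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may only hand the data to the OS cache (and may write
            # fewer bytes than requested, hence the loop).
            written = os.write(fd, view)
            view = view[written:]
        # os.fsync() requests the actual commit to stable storage and
        # returns only once the OS reports completion.
        os.fsync(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "journal.bin")
    durable_write(target, b"committed record")
    with open(target, "rb") as fh:
        stored = fh.read()
```

Calling os.fsync() after every small write is safe but slow, which is exactly the trade-off against cached IO described above.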

What does SDS mean in a storage context?

SDS is short for "Software-defined Storage" (SDS) and is a marketing term for computer data storage software for policy-based provisioning and management of data storage, independent of the underlying hardware. SDS is part of the ongoing abstraction in computer systems where virtualization separates solutions from actual hardware. One challenge in SDS is to keep performance at the same level of a dedicated hardware-based solution.

Sneakernet

a colloquial term to describe a transport mechanism that relies on simply carrying data from one point to another, possibly by foot, possibly wearing Sneakers (thus the name). In edge computing scenarios the term Sneakernet may be used to describe the process of transferring data onto a rugged storage device, which is then physically brought to a data center location for offloading. This method of ingestion skips slower upload interfaces, like consumer broadband connections, to transfer very large data sets into the cloud.

Software-defined Object Storage

part of the more general approach of Software-Defined Storage (SDS), SDOS abstracts (or virtualizes) object storage resources.

Storage Virtualization

is a broad term to describe initiatives to separate physical storage resources as virtual logical units, regardless of underlying technology. It offers greater flexibility in the management of storage resources. Storage Virtualization may involve Block virtualization (described before) or File virtualization where the same idea is applied to the file level. Storage Virtualization allows DevOps to improve actual physical storage resource utilization and offer features like non-disruptive data migration or resource allocation. Cmp. Software-defined Storage.

Tape Drive

a tape drive or "streamer" is a drive mechanism for linear magnetic media. Storing data on linear strips of magnetizable tape is one of the oldest forms of data storage and was roughly the next generation of data storage media after punchcards. In terms of longevity, magnetic tape is also one of the most robust forms of storing analog or digital data we know. In computer storage, large automated spools ("open reels") of magnetic tape were one of the first mass storage devices. Since then, tape media has become a go-to technique for archival and long-term data storage. Moving away from open spools, tape data in recent years is mostly packaged in rigid cartridges or cassettes. Some standards are DAT (Digital Audio Tape), DLT (Digital Linear Tape), AIT (Advanced Intelligent Tape), VXA (Virtual eXtensible Architecture) and LTO (Linear Tape-Open).

Tape Library

Tape Libraries today are large cabinets or warehouse-like assemblies for automatic physical storage and retrieval of magnetic tape media (usually LTO tapes), scalable to hundreds of tape drives and thousands of tape storage slots. Tapes are identified by barcodes. Robotic arms shuttle tapes between passive resting slots and tape drives for active reading and writing, similar to automated warehouses. The loading and unloading of tapes is done automatically and on-demand. A tape library controller system coordinates the physical logistics of the managed tapes, while data clients connected to the library can access files without knowing how data is actually read from or written to tapes. Tape libraries can be arranged in linear parallel shelf systems or in circular arrays ("tape silos"). Historically, tape libraries were not automated, but just rooms to store magnetic tape media, resembling a book library. Tape libraries are one form of "Nearline storage". One variant of a tape library is an "optical jukebox", where similar automation is used, only with optical disk media as storage medium. Some solutions use disks instead of tapes but exhibit tape access semantics: Virtual Tape Library (VTL).

TCO (total cost of ownership)

In operating a data center, doing so cost-effectively in every regard is an essential challenge. The total cost of ownership (TCO) in storage measures how costly the storage of data on a specific storage technology is over time. TCO covers more aspects than the more granular "Cost per byte". One interesting aspect here is the comparison of TCO of SSD vs. traditional hard disk drives. As the amount of data worldwide, in the zettabyte era, is exploding, we need tiered strategies to write and store the world's (cloud) data. At the moment, the industry is just not able to produce enough solid-state memory to handle this amount of data, and cheap and reliable storage media like magnetic rotating disks or linear tape are one solution.

Transactional file-system

Transactional file-system describes a file-system that relies on transactional concepts. ZFS is described as being a transactional file-system. In computer science, "transactional memory" is a concept to allow parallel processes ("concurrent computing") to access the same shared memory. It does so by controlling atomic actions from a high level, vs. a different approach of synchronizing threads on a low level. Transactional schemes may be implemented in hardware, software or in a combination of both.

U.2

Like M.2, U.2 (formerly known as SFF-8639) is part of the SATA family of computer interface and connection standards. U.2 connectors and interfaces are most commonly found on 2.5" SSDs. In comparison with M.2, U.2 can be optionally hot-swapped and provides support for 12V modes. And while M.2 supports a range of interfaces, U.2 is focused on NVMe. Its feature set is targeted at enterprise and data-center applications, resulting, for example, in a slightly larger form-factor for better thermal characteristics. With EDSFF (Enterprise and Data Center Standard Form Factor) a U.2 successor is currently being rolled out (as of 2025).

U.3

Also known by its small form factor designation, SFF-8639 or SFF-TA-1001, U.3 is an extension of U.2 and supports three modes of operation: SATA, SAS and NVMe.

Virtual file system (VFS)

Term describing an abstraction layer found in systems in relation to file systems. For example, the Linux VFS (aka Virtual Filesystem Switch) is the software layer Linux uses in the kernel to present the interfaces of mounted file-systems to userspace programs, and this way, to the user. As virtual is a generic term for simulated or abstracted means, the FUSE (Filesystem in Userspace) software interface is often described as a Virtual File System, when it presents diverse things, from S3 storage buckets to FTP servers or system resources, as a (virtual) local hierarchical file-system. The 'proc' file-system on Linux is another virtual or "pseudo" file-system, as it represents system resources as virtual files, e.g. the file '/proc/cpuinfo' contains CPU information.

VTL

short for "Virtual Tape Library", an archive or backup appliance that emulates a physical tape library. While some might argue that the very idea of a tape library is "using tape" instead of spinning media, VTLs offer a drop-in disk-based replacement for tape library based environments. The primary upside is regaining random read/write access to data with more dynamic access profiles. Many implementations add data integrity checks via erasure coding.

WebDAV

Short for "Web Distributed Authoring and Versioning" (defined in RFC 4918), WebDAV is a set of extensions to HTTP, allowing browsers and other user-agents to handle web resources as user-modifiable content.

What are White Glove Services?

The term "white glove service" describes the special attentive care given to valuable goods when moving them from the vendor to the customer. It makes sure that equipment is transported with great care, delivered on time and in perfect working state and installed according to specifications. The term is used in general supply chain management and has importance in storage due to the fragile and expensive nature of large and heavy storage equipment. Vendors employ trained technicians to oversee the delivery of rack-sized systems to guarantee impeccable working condition upon arrival on the customer's premises. Cmp. "Professional Services".

Winchester Drive

"Winchester drive" is the colloquial term for an encapsulated hard disk drive.

The interesting story of the origin of the term can be read in our KB article "What is a Winchester drive".

WORM

short for "write once read many" or "write once read multiple". WORM media is a type of media that can only be written once, no further changes possible, but then read many times. While the WORM principle has been implemented in a number of techniques, a good example of WORM media is photographic microfilm as used in Microform archival preservation. WORM media is tamper proof, immune to computer bugs or malicious attacks, can be part of a complete audit trail and provides authenticity.

xattr

Abbreviation of "Extended file attributes" common on Unix-like operating systems. See "Extended file attributes" for more.

XFS

is a high-performance 64-bit journaling file system initially developed by now defunct performance computer manufacturer Silicon Graphics. It supports Extents, variable block-sizes, striped allocation, journaling, delayed allocation and more. Despite being introduced in 1994, XFS is still actively developed and deployed.

Zero-Copy

One of many in-hardware storage optimisations. Zero-copy is a scheme where a lower-level system (meaning below what the CPU usually does) can be instructed or called to map data from one storage area (like a buffer) into another, in one go, with high efficiency (speed). Zero-copy mechanisms are usually exposed via a special system call. On Linux, for example, the "sendfile" syscall replaces the usual for/while buffer-to-buffer reading/writing of data found in program code with one simple call that does the same in an optimized fashion on another level of the storage layer. The concept of "Zero-Copy" is related to (cmp.) "Scatter/Gather".
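A minimal sketch of the sendfile approach in Python, using os.sendfile() from the standard library, which wraps the syscall mentioned above. It assumes a Linux system, where both file descriptors may refer to regular files; on other platforms the output usually has to be a socket. The helper name copy_with_sendfile and the file names are hypothetical.

```python
import os
import tempfile


def copy_with_sendfile(src_path, dst_path):
    """Copy a file inside the kernel, without a read/write loop over a userspace buffer."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        offset = 0
        while remaining > 0:
            # The kernel moves the bytes directly from one descriptor to the other.
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, remaining)
            if sent == 0:
                break  # unexpected end of input
            offset += sent
            remaining -= sent
    return offset


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    target = os.path.join(tmp, "target.bin")
    with open(source, "wb") as fh:
        fh.write(b"x" * 100_000)
    copied = copy_with_sendfile(source, target)
    with open(target, "rb") as fh:
        result = fh.read()
```

The loop is still needed because a single sendfile call may transfer fewer bytes than requested, but the file data itself never passes through a buffer of the Python program.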

Zettabyte Era

a term applied to a period in modern history to describe the era of zettabyte data masses. Depending on which metric is used to define mankind's entry into the Zettabyte era (IP traffic or the sum of all data), the zettabyte era started around 2012 or 2016. The proliferation of video streaming (making up a large part of total global data traffic), mobile traffic, broadband internet and the Internet of Things (IoT) all contributed to mankind producing and/or moving large corpora of data. In conjunction with the term zettabyte, academia is discussing challenges and opportunities that arise.

ZFS

is a file system originally developed by SUN, with the abbreviation meaning "Zettabyte File System". Today, ZFS is available in forks of the original ZFS, as OpenZFS and Oracle ZFS.

Note on trademarks

Many of the designations used by manufacturers and sellers to distinguish their products or services are claimed as trademarks. Where those designations appear in this text and Micropolis and/or the authors were aware of a trademark claim, the designations are mentioned along with their owners and may be additionally marked with a trademark symbol. Their use here on this storage glossary page is for educational use of the reader and is covered under nominative fair use. Micropolis is in no way suggesting support, sponsorship or endorsement of the owner of these trademarks. Only as much of such marks is used as is necessary to identify the trademark owner, product, or service.