@section sec1-1 Section 1.1: Introduction
@subsection sec1-1-1 Section 1.1.1: The AFS 3.1 Distributed File System
- AFS 3.1 is a distributed file system (DFS) designed to meet the following set of requirements:
- Server-client model: Permanent file storage for AFS is maintained by a collection of file server machines. This centralized storage is accessed by users working on client machines, which also serve as their computational engines. A single machine may act as both an AFS file server and client simultaneously. However, file server machines are generally assumed to be housed in a secure environment, behind locked doors.
- Scale: Unlike other existing DFSs, AFS was designed with the specific goal of supporting a very large user community. Whereas Sun Microsystems' NFS distributed file system [4][5] uses a rule-of-thumb ratio of 20 client machines for every server machine (20:1), the AFS architecture aims at smoothly supporting client/server ratios more along the lines of 200:1 within a single installation.
- AFS also provides another, higher-level notion of scalability. Not only can each independently-administered AFS site, or cell, grow very large (on the order of tens of thousands of client machines), but individual cells may easily collaborate to form a single, unified file space composed of the union of the individual name spaces. Thus, users have the image of a single unix file system tree rooted at the /afs directory on their machine. Access to files in this tree is performed with the standard unix commands, editors, and tools, regardless of a file's location.
- These cells and the files they export may be geographically dispersed, thus requiring client machines to access remote file servers across network pathways varying widely in speed, latency, and reliability. The AFS architecture encourages this concept of a single, wide-area file system. As of this writing, the community AFS filespace includes sites spanning the continental United States and Hawaii, and also reaches overseas to various installations in Europe, Japan, and Australia.
- Performance: This is a critical consideration given the scalability and connectivity requirements described above. High performance in the face of high client/server ratios and of low-bandwidth, high-latency network connections, as well as the normal high-speed ones, is achieved by two major mechanisms:
- Caching: Client machines make extensive use of caching techniques wherever possible. One important application of this methodology is that each client is required to maintain a cache of files it has accessed from AFS file servers, performing its operations exclusively on these local copies. This file cache is organized in a least-recently-used (LRU) fashion. Thus, each machine will build a local working set of objects being referenced by its users. As long as the cached images remain 'current' (i.e., compatible with the central version stored at the file servers), operations may be performed on these files without further communication with the central servers. This results in significant reductions in network traffic and server loads, paving the way for the target client/server ratios.
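- The LRU discipline itself can be pictured with the following sketch; the structures and routine names are illustrative only and do not correspond to the Cache Manager's actual data structures:
@code
#include <stddef.h>

/*
 * Illustrative LRU bookkeeping: entries referenced by users are moved
 * to the head of a doubly-linked list, so the tail always names the
 * least-recently-used object, the first candidate for reclamation.
 */
struct cache_entry {
    struct cache_entry *prev, *next;   /* LRU list linkage */
    /* identification of the cached object would go here */
};

struct lru_list {
    struct cache_entry *head;          /* most recently used  */
    struct cache_entry *tail;          /* least recently used */
};

/* Record a reference: move the entry to the front of the list. */
static void lru_touch(struct lru_list *l, struct cache_entry *e)
{
    if (l->head == e)
        return;                        /* already most recent */
    if (e->prev) e->prev->next = e->next;
    if (e->next) e->next->prev = e->prev;
    if (l->tail == e) l->tail = e->prev;
    e->prev = NULL;
    e->next = l->head;
    if (l->head) l->head->prev = e;
    l->head = e;
    if (l->tail == NULL) l->tail = e;
}

/* Reclaim space: detach and return the least-recently-used entry. */
static struct cache_entry *lru_evict(struct lru_list *l)
{
    struct cache_entry *victim = l->tail;
    if (victim == NULL)
        return NULL;
    l->tail = victim->prev;
    if (l->tail) l->tail->next = NULL;
    else l->head = NULL;
    victim->prev = victim->next = NULL;
    return victim;
}
@endcode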
- This file cache is typically located on the client's local hard disk, although a strictly in-memory cache is also supported. The disk cache has the advantage that its contents will survive crashes and reboots, with the expectation that the majority of cached objects will remain current. The local cache parameters, including the maximum number of blocks it may occupy on the local disk, may be changed on the fly. In order to avoid having the size of the client file cache become a limit on the length of an AFS file, caching is actually performed on chunks of the file. These chunks are typically 64 Kbytes in length, although the chunk size used by the client is settable when the client starts up.
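- As a concrete illustration of chunking, and assuming the default 64-Kbyte chunk size, a byte offset within a file maps to a chunk as follows (the helpers are hypothetical, not actual Cache Manager code):
@code
#include <stdio.h>

#define CHUNK_SIZE (64 * 1024L)   /* default chunk size assumed here */

/* Index of the chunk covering a given byte offset. */
static long chunk_index(long offset)
{
    return offset / CHUNK_SIZE;
}

/* Offset of the first byte of that chunk within the file. */
static long chunk_base(long offset)
{
    return chunk_index(offset) * CHUNK_SIZE;
}

int main(void)
{
    long offset = 200000L;    /* an arbitrary example offset */
    printf("byte %ld lies in chunk %ld, which begins at byte %ld\n",
           offset, chunk_index(offset), chunk_base(offset));
    /* prints: byte 200000 lies in chunk 3, which begins at byte 196608 */
    return 0;
}
@endcode
- Because only the chunks actually referenced need be cached, a file larger than the cache itself can still be accessed, one chunk at a time.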
- Callbacks: The use of caches by the file system, as described above, raises the thorny issue of cache consistency. Each client must efficiently determine whether its cached file chunks are identical to the corresponding sections of the file as stored at the server machine before allowing a user to operate on those chunks. AFS employs the notion of a callback as the backbone of its cache consistency algorithm. When a server machine delivers one or more chunks of a file to a client, it also includes a callback 'promise' that the client will be notified if any modifications are made to the data in the file. Thus, as long as the client machine is in possession of a callback for a file, it knows it is correctly synchronized with the centrally-stored version, and allows its users to operate on it as desired without any further interaction with the server. Before a file server stores a more recent version of a file on its own disks, it will first break all outstanding callbacks on this item. A callback will eventually time out, even if there are no changes to the file or directory it covers.
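- In outline, the client-side decision reduces to a test of the following form; the structure shown is purely illustrative and far simpler than the Cache Manager's real callback state:
@code
#include <time.h>

/* Illustrative client-side record of a callback promise. */
struct callback_promise {
    int    valid;      /* nonzero while the server's promise stands        */
    time_t expires;    /* callbacks time out even if the file is unchanged */
};

/* May cached chunks of this file be used without contacting the server? */
static int cached_copy_current(const struct callback_promise *cb)
{
    return cb->valid && time(NULL) < cb->expires;
}

/* Invoked when the File Server breaks the callback, e.g. just before it
 * stores a more recent version of the file on its own disks. */
static void callback_broken(struct callback_promise *cb)
{
    cb->valid = 0;
}
@endcode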
- Location transparency: The typical AFS user does not know which server or servers house any of his or her files. In fact, the user's storage may be distributed among several servers. This location transparency also allows user data to be migrated between servers without users having to take corrective actions, or even becoming aware of the shift.
- Reliability: The crash of a server machine in any distributed file system will cause the information it hosts to become unavailable to the user community. The same effect is caused when server and client machines are isolated across a network partition. AFS addresses this situation by allowing data to be replicated across two or more servers in a read-only fashion. If the client machine loses contact with a particular server from which it is attempting to fetch data, it hunts among the remaining machines hosting replicas, looking for one that is still in operation. This search is performed without the user's knowledge or intervention, smoothly masking outages whenever possible. Each client machine will automatically perform periodic probes of machines on its list of known servers, updating its internal records concerning their status. Consequently, server machines may enter and exit the pool without administrator intervention.
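- The hunt among replica sites amounts to a loop of the following form, in which fetch_from() is a stand-in for the actual data-fetching RPC:
@code
/* Stand-in for the RPC that fetches data from one File Server;
 * returns 0 on success, nonzero if the server is down or unreachable. */
extern int fetch_from(const char *server_addr, const char *object);

/*
 * Try each machine known to host a read-only replica of the data until
 * one responds; the search is invisible to the requesting user.
 */
static int fetch_replicated(const char *const *replica_sites, int nsites,
                            const char *object)
{
    int i;
    for (i = 0; i < nsites; i++) {
        if (fetch_from(replica_sites[i], object) == 0)
            return 0;       /* an outage at earlier sites has been masked */
    }
    return -1;              /* no replica site currently reachable */
}
@endcode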
- Replication also applies to the various databases employed by the AFS server processes. These system databases are read/write replicated with a single synchronization site at any instant. If a synchronization site is lost due to failure, the remaining database sites elect a new synchronization site automatically without operator intervention.
- This document is a member of a documentation suite providing specifications of the operations and interfaces offered by the various AFS servers and agents. Specifically, this document will focus on two of these system agents: the File Server and the Cache Manager.
- The full AFS specification suite of documents is listed below:
- AFS-3 Programmer's Reference: Architectural Overview: This paper provides an architectural overview of the AFS distributed file system, describing the full set of servers and agents in a coherent way, illustrating their relationships to each other and examining their interactions.
- AFS-3 Programmer's Reference: Volume Server/Volume Location Server Interface: This document describes the services through which 'containers' of related user data are located and managed.
- AFS-3 Programmer's Reference: Protection Server Interface: This paper describes the server responsible for providing two-way mappings between printable usernames and their internal AFS identifiers. The Protection Server also allows users to create, destroy, and manipulate 'groups' of users, which are suitable for placement on ACLs.
- AFS-3 Programmer's Reference: BOS Server Interface: This paper explicates the BOS Server, a 'nanny' service which assists in the administrability of the AFS environment.
- AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure Call Facility: This document specifies the design and operation of the remote procedure call and lightweight process packages used by AFS.
- In addition to these papers, the AFS 3.1 product is delivered with its own user, administrator, installation, and command reference documents.
@section sec1-2 Section 1.2: Basic Concepts
- To properly understand AFS operation, specifically the tasks and objectives of the File Server and Cache Manager, it is necessary to introduce and explain the following concepts:
- Cell: A cell is the set of server and client machines operated by an administratively independent organization. The cell administrators make decisions concerning such issues as server deployment and configuration, user backup schedules, and replication strategies on their own hardware and disk storage completely independently from those implemented by other cell administrators regarding their own domains. Every client machine belongs to exactly one cell, and uses that membership to determine the set of database servers it consults to locate system resources and generate authentication information.
- Volume: AFS disk partitions do not directly host individual user files or directories. Rather, connected subtrees of the system's directory structure are placed into containers called volumes. Volumes vary in size dynamically as objects are inserted, overwritten, and deleted. Each volume has an associated quota, or maximum permissible storage. A single unix disk partition may host one or more volumes, and in fact may host as many volumes as physically fit in the storage space. However, a practical maximum is 3,500 volumes per disk partition, since this is the highest number currently handled by the salvager program. The salvager is run on occasions where the volume structures on disk are inconsistent, repairing the damage. A compile-time constant within the salvager imposes the above limit, causing it to refuse to repair any inconsistent partition with more than 3,500 volumes. Volumes serve many purposes within AFS. First, they reduce the number of objects with which an administrator must be concerned, since operations are normally performed on an entire volume at once (and thus on all files and directories contained within the volume). In addition, volumes are the unit of replication, data mobility between servers, and backup. Disk utilization may be balanced by transparently moving volumes between partitions.
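- The quota mentioned above caps a volume's growth independently of the free space remaining on its host partition; in outline (with an illustrative record format and assumed block units):
@code
/* Illustrative volume usage record; the block units are an assumption. */
struct volume_usage {
    long blocks_used;     /* storage currently occupied by the volume */
    long quota_blocks;    /* maximum permissible storage              */
};

/* Would storing new_blocks more data exceed the volume's quota? */
static int would_exceed_quota(const struct volume_usage *v, long new_blocks)
{
    return v->blocks_used + new_blocks > v->quota_blocks;
}
@endcode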
- Mount Point: The connected subtrees contained within individual volumes stored at AFS file server machines are 'glued' to their proper places in the file space defined by a site, forming a single, apparently seamless unix tree. These attachment points are referred to as mount points. Mount points are persistent objects, implemented as symbolic links whose contents obey a stylized format. Thus, AFS mount points differ from NFS-style mounts. In the NFS environment, the user dynamically mounts entire remote disk partitions using any desired name. These mounts do not survive client restarts, and do not ensure a uniform namespace between different machines.
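- A sketch of the recognition step appears below; the precise stylized format of a mount point's link contents belongs to the Cache Manager, and the leading type character and trailing period tested here are assumptions made for the sake of the example:
@code
#include <string.h>

/*
 * Illustrative test applied to a symbolic link's contents to decide
 * whether it is an AFS mount point.  The details of the stylized
 * format (a leading type character, the volume name, and a trailing
 * period) are assumed here for the sake of the example.
 */
static int looks_like_mount_point(const char *link_contents)
{
    size_t len = strlen(link_contents);

    if (len < 2)
        return 0;
    if (link_contents[0] != '#' && link_contents[0] != '%')
        return 0;
    return link_contents[len - 1] == '.';
}
@endcode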
- As a Cache Manager resolves an AFS pathname as part of a file system operation initiated by a user process, it recognizes mount points and takes special action to resolve them. The Cache Manager consults the appropriate Volume Location Server to discover the File Server (or set of File Servers) hosting the indicated volume. This location information is cached, and the Cache Manager then proceeds to contact the listed File Server(s) in turn until one is found that responds with the contents of the volume's root directory. Once mapped to a real file system object, the pathname resolution proceeds to the next component.
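- The sequence just described can be summarized by the following outline, in which every routine name is an illustrative stand-in rather than an actual Cache Manager or Volume Location Server entry point:
@code
#define MAX_SITES 8

struct volume_location {
    int  nservers;                      /* File Servers hosting the volume */
    char server_addr[MAX_SITES][64];    /* addresses of those servers      */
};

/* Stand-ins for the real RPCs and cache bookkeeping. */
extern int  vl_lookup(const char *volume, struct volume_location *loc);
extern void remember_location(const char *volume,
                              const struct volume_location *loc);
extern int  fetch_root_directory(const char *server_addr, const char *volume);

/*
 * Resolve a mount point: ask a Volume Location Server where the volume
 * lives, cache the answer, then try the listed File Servers in turn
 * until one responds with the volume's root directory.
 */
static int resolve_mount_point(const char *volume)
{
    struct volume_location loc;
    int i;

    if (vl_lookup(volume, &loc) != 0)
        return -1;                      /* volume unknown */
    remember_location(volume, &loc);    /* cache the location information */

    for (i = 0; i < loc.nservers; i++) {
        if (fetch_root_directory(loc.server_addr[i], volume) == 0)
            return 0;                   /* pathname resolution continues
                                         * with the next component */
    }
    return -1;                          /* no hosting File Server responded */
}
@endcode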
- Database Server: A set of AFS databases is required for the proper functioning of the system. Each database may be replicated across two or more file server machines. Access to these databases is mediated by a database server process running at each replication site. One site is declared to be the synchronization site, the sole location accepting requests to modify the databases. All other sites are read-only with respect to the set of AFS users. When the synchronization site receives an update to its database, it immediately distributes it to the other sites. Should a synchronization site go down through either a hard failure or a network partition, the remaining sites will automatically elect a new synchronization site if they form a quorum, or majority. This ensures that multiple synchronization sites do not become active in the network partition scenario.
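- The election rule reduces to the following majority test (illustrative only; the real election protocol maintains considerably more state than a single count):
@code
/*
 * A new synchronization site may be elected only if the database sites
 * that can still communicate with one another form a quorum, i.e. a
 * strict majority of all sites.  At most one side of a network
 * partition can hold a majority, so two synchronization sites can
 * never be active at once.
 */
static int have_quorum(int reachable_sites, int total_sites)
{
    return reachable_sites > total_sites / 2;
}
@endcode
- With five database sites, for example, the three sites on one side of a partition may elect a new synchronization site, while the two isolated sites cannot.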
- The classes of AFS database servers include the Authentication Server, which maintains the information used to establish user identities and generate authentication information; the Protection Server, which maps printable usernames to their internal AFS identifiers and manages user-defined groups; and the Volume Location Server, which records the set of File Server machines hosting each volume.
- Following this introduction and overview, Chapter 2 describes the architecture of the File Server process design. Similarly, Chapter 3 describes the architecture of the in-kernel Cache Manager agent. Following these architectural examinations, Chapter 4 provides a set of basic coding definitions common to both the AFS File Server and Cache Manager, required to properly understand the interface specifications which follow. Chapter 5 then proceeds to specify the various File Server interfaces. The myriad Cache Manager interfaces are presented in Chapter 6, thus completing the document.