File Systems in Linux

Linux supports a number of different file systems. This chapter presents a brief overview of the most popular Linux file systems, elaborating on their design concept, advantages, and fields of application. Some additional information about LFS (``Large File Support'') in Linux is also provided.


Glossary

metadata
A file system's internal data structure that ensures all the data on disk is properly organized and accessible. Essentially, it is ``data about the data.'' Almost every file system has its own metadata structure, which is partly why file systems show different performance characteristics. Keeping metadata intact is critical, because otherwise all the data on the file system could become inaccessible.

inode
An inode contains various information about a file, including its size, its number of links, and the date and time of its last status change, modification, and access, as well as pointers to the disk blocks where the file contents are actually stored.
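Most of these inode fields can be read from user space. The following is a minimal sketch in Python, assuming a POSIX-like system. The helper name describe_inode is made up for this illustration, while os.stat and its st_* fields belong to the Python standard library. Note that on Linux st_ctime reports the last inode status change, not the creation time.

```python
import os
import tempfile


def describe_inode(path):
    """Return selected inode fields for path (hypothetical helper)."""
    st = os.stat(path)
    return {
        "inode_number": st.st_ino,
        "size_bytes": st.st_size,
        "link_count": st.st_nlink,
        "modified": st.st_mtime,   # last change of the file contents
        "accessed": st.st_atime,   # last read access
        "changed": st.st_ctime,    # last inode status change on Linux
    }


if __name__ == "__main__":
    # Create a throwaway file, show its inode data, and remove it again.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        print(describe_inode(tmp.name))
    finally:
        os.unlink(tmp.name)
```

The pointers to the data blocks are not exposed through this interface. They remain internal to the file system.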

journal
In the context of a file system, a journal is an on-disk structure containing a kind of log in which the file system records what it is about to change in its metadata. ``Journaling'' greatly reduces the recovery time of a Linux system because it makes the lengthy check of the whole file system at system start-up unnecessary. Instead, only the journal is replayed.
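The replay idea can be sketched with a toy model in Python. This is an illustration only, not the on-disk format of any real file system: the dictionary layout, the ``committed'' flag, and the function name replay_journal are assumptions made for this example.

```python
def replay_journal(metadata, journal):
    """Apply every committed journal transaction to metadata; skip the rest.

    metadata: dict mapping a metadata key to its value (stand-in for disk state)
    journal:  list of dicts {"committed": bool, "updates": {key: value}}
    """
    recovered = dict(metadata)  # work on a copy, leave the input untouched
    for txn in journal:
        if not txn["committed"]:
            # The crash hit before this transaction was complete: ignore it.
            continue
        recovered.update(txn["updates"])
    return recovered


if __name__ == "__main__":
    disk = {"inode 12 size": 0, "free blocks": 100}
    journal = [
        {"committed": True, "updates": {"inode 12 size": 4096, "free blocks": 99}},
        {"committed": False, "updates": {"inode 12 size": 8192}},
    ]
    print(replay_journal(disk, journal))
```

Recovery time depends on the length of the journal, not on the size of the file system, which is why replay is so much faster than a full check.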


Major File Systems in Linux

Unlike two or three years ago, choosing a file system for a Linux system is no longer a matter of a few seconds (``Ext2 or ReiserFS?''). Kernels starting from 2.4 offer a variety of file systems from which to choose. The following is an overview of how those file systems basically work and which advantages they offer.

It is very important to bear in mind that there may be no file system that best suits all kinds of applications. Each file system has its particular strengths and weaknesses, which have to be taken into account. Even the most sophisticated file system cannot substitute for a reasonable backup strategy, however.

The terms ``data integrity'' or ``data consistency'', when used in this chapter, do not refer to the consistency of the user space data (the data your application writes to its files). Whether this data is consistent must be controlled by the application itself.


Note


Setting up File Systems: Unless stated otherwise in this chapter, all the steps required to set up or to change partitions and file systems can be performed using the easy-to-use YaST module.


Ext2

The origins of Ext2 go back to the early days of Linux history. Its predecessor, the Extended File System, was implemented in April 1992 and integrated in Linux 0.96c. The Extended File System underwent a number of modifications and, as Ext2, became the most popular Linux file system for years. With the creation of journaling file systems and their astonishingly short recovery times, Ext2 became less important.

A brief summary of Ext2's strengths might help you to understand why it was -- and in some areas still is -- the favorite Linux file system of many a Linux user.

Solidity
Being quite an ``old-timer'', Ext2 underwent many improvements and was heavily tested. This may be the reason why people often refer to it as ``rock-solid''. After a system outage when the file system could not be cleanly unmounted, e2fsck starts to analyze the file system data. Metadata is brought into a consistent state and pending files or data blocks are written to a designated directory (called lost+found). In contrast to journaling file systems, e2fsck analyzes the entire file system and not just the recently modified bits of metadata. This takes significantly longer than checking the log data of a journaling file system. Depending on file system size, this procedure can take half an hour or more. Therefore, you would not choose Ext2 for any server that needs high availability. Yet, as Ext2 does not maintain a journal and uses significantly less memory, it is sometimes faster than other file systems.

Easy upgradability
The code for Ext2 is the strong foundation on which Ext3 could become a highly-acclaimed next-generation file system. Its reliability and solidity were elegantly combined with the advantages of a journaling file system.


Ext3

Ext3 was designed by Stephen Tweedie. In contrast to all other ``next-generation'' file systems, Ext3 does not follow a completely new design principle. It is based on Ext2. These two file systems are very closely related to each other. An Ext3 file system can be easily built on top of an Ext2 file system. The most important difference between Ext2 and Ext3 is that Ext3 supports journaling.

Summed up, Ext3 has three major advantages to offer:

Easy and highly reliable file system upgrades from Ext2
As Ext3 is based on the Ext2 code and shares its on-disk format as well as its metadata format, upgrades from Ext2 to Ext3 are very easy. Transitions to other journaling file systems, such as ReiserFS, JFS, or XFS, can be quite tedious, because they require backing up the whole file system and recreating it from scratch. A transition to Ext3, in contrast, is a matter of minutes. It is also safer, because it avoids recreating an entire file system from scratch, a procedure that might not work flawlessly. Considering the number of existing Ext2 systems that await an upgrade to a journaling file system, you can easily see why Ext3 might be of some importance to many system administrators. Downgrading from Ext3 to Ext2 is as easy as the upgrade. Just perform a clean unmount of the Ext3 file system and remount it as an Ext2 file system.

Reliability and performance
Other journaling file systems follow the ``metadata-only'' journaling approach. This means your metadata is always kept in a consistent state, but the same cannot be automatically guaranteed for the file system data itself. Ext3 is designed to take care of both metadata and data. The degree of ``care'' can be customized. Enabling Ext3 in the data=journal mode offers maximum data integrity, but can slow down the system because both metadata and data are journaled. A relatively new approach is to use the data=ordered mode, which ensures both data and metadata integrity, but uses journaling only for metadata. The file system driver collects all data blocks that correspond to one metadata update. These blocks are grouped as a ``transaction'' and are written to disk before the metadata is updated. As a result, consistency is achieved for metadata and data without sacrificing performance. A third option is data=writeback, which allows data to be written into the main file system after its metadata has been committed to the journal. This option is often considered the best in performance. It can, however, allow old data to reappear in files after a crash and recovery, although internal file system integrity is maintained. Unless you specify something else, Ext3 runs in the data=ordered mode.
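The mode is chosen with the data= mount option, for example in the options field of an /etc/fstab entry. The following Python sketch shows how the effective mode follows from an option string, including the fallback to the default. The function name ext3_data_mode and the sample option strings are made up for this illustration, and the code only inspects a string without mounting anything.

```python
EXT3_DATA_MODES = ("journal", "ordered", "writeback")


def ext3_data_mode(mount_options):
    """Return the data= mode named in a comma-separated option string.

    Falls back to "ordered", which the text gives as the Ext3 default.
    """
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "data":
            if value not in EXT3_DATA_MODES:
                raise ValueError(f"unknown Ext3 data mode: {value!r}")
            return value
    return "ordered"


if __name__ == "__main__":
    for options in ("defaults", "rw,data=journal", "noatime,data=writeback"):
        print(options, "->", ext3_data_mode(options))
```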


Tip


Converting an Ext2 File System into an Ext3 File System: Converting from Ext2 to Ext3 involves two separate steps:

Creating the journal
Log in as root and execute the command tune2fs -j on the partition to convert. This creates an Ext3 journal with the default parameters. To decide yourself how large the journal should be and on which device it should reside, execute tune2fs -J instead, together with the desired journal options size= and device=. More information about the tune2fs program is available in its manual page (man 8 tune2fs).
Specifying the file system type in /etc/fstab
To ensure that the Ext3 file system is recognized as such, edit the file /etc/fstab, changing the file system type specified for the corresponding partition from ext2 to ext3. The change takes effect after the next reboot.
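The /etc/fstab change from the second step can be sketched in Python as follows. This is an illustration that works on a string and never touches the real /etc/fstab. The function name switch_fstab_type and the device names in the example are hypothetical. Note that a modified line is rewritten with tab separators.

```python
def switch_fstab_type(fstab_text, device, old_type="ext2", new_type="ext3"):
    """Return fstab_text with the type field changed for the given device."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass.
        is_entry = len(fields) >= 3 and not line.lstrip().startswith("#")
        if is_entry and fields[0] == device and fields[2] == old_type:
            fields[2] = new_type
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "/dev/hda1 / ext3 defaults 1 1\n"
        "/dev/hda2 /home ext2 defaults 1 2\n"
    )
    print(switch_fstab_type(sample, "/dev/hda2"), end="")
```

Lines for other devices, comment lines, and entries that already use a different type are passed through unchanged.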


ReiserFS

Officially one of the key features of the 2.4 kernel release, ReiserFS has been available as a kernel patch for 2.2.x SuSE kernels since SuSE Linux version 6.4. ReiserFS was designed by Hans Reiser and the Namesys development team. ReiserFS has proven to be a powerful alternative to the old Ext2. Its key assets are better disk space utilization, better disk access performance, and faster crash recovery. However, there is a minor drawback: ReiserFS takes great care of metadata but not of the data itself. Future generations of ReiserFS will include data journaling (both metadata and actual data are written to the journal) as well as ordered writes.

ReiserFS's strengths, in more detail, are:

Better disk space utilization
In ReiserFS, all data is organized in a structure called a B*-balanced tree. The tree structure contributes to better disk space utilization as small files can be stored directly in the B*-tree leaf nodes instead of being stored elsewhere and just maintaining a pointer to the actual disk location. In addition to that, storage is not allocated in chunks of 1 or 4 kB, but in portions of the exact size needed. Another benefit lies in the dynamic allocation of inodes. This keeps the file system more flexible than traditional file systems, like Ext2, where the inode density has to be specified at file system creation time.
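The effect of allocating in fixed chunks can be made concrete with a little arithmetic. The Python sketch below computes the slack space lost when files are stored in whole blocks of 1 or 4 kB. It is a simplified model: exact-size allocation is treated as zero slack, any overhead of the tree itself is ignored, and the function names are made up for this example.

```python
def allocated_bytes(file_size, block_size):
    """Bytes consumed when space is handed out in whole blocks."""
    if file_size == 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def wasted_bytes(file_sizes, block_size):
    """Total slack space for a collection of files at the given block size."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


if __name__ == "__main__":
    small_files = [100] * 1000  # one thousand files of 100 bytes each
    for block_size in (1024, 4096):
        print(block_size, wasted_bytes(small_files, block_size))
```

For one thousand 100-byte files, the model loses 924,000 bytes with 1 kB blocks and 3,996,000 bytes with 4 kB blocks, which shows why many small files benefit most from exact-size allocation.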

Better disk access performance
For small files, you will often find that both file data and ``stat_data'' (inode) information are stored next to each other. They can be read with a single disk IO operation, meaning that only one access to disk is required to retrieve all the information needed.

Fast crash recovery
Using a journal to keep track of recent metadata changes makes a file system check a matter of seconds, even for huge file systems.


JFS

JFS, the ``Journaled File System'', was developed by IBM. The first beta version of the JFS Linux port reached the Linux community in the summer of 2000. Version 1.0.0 was released in 2001. JFS is tailored to suit the needs of high throughput server environments where performance is the ultimate goal. Being a full 64-bit file system, JFS supports both large files and partitions, which is another reason for its use in server environments.

A closer look at JFS shows why this file system might prove a good choice for your Linux server:

Efficient journaling
JFS follows a ``metadata only'' approach like ReiserFS. Instead of an extensive check, only metadata changes generated by recent file system activity get checked, which saves a great amount of time in recovery. Concurrent operations requiring multiple concurrent log entries can be combined into one group commit, greatly reducing performance loss of the file system through multiple write operations.
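The group commit idea can be illustrated with a toy log in Python. This is not JFS code: the class name GroupCommitLog and its attributes are assumptions made for this sketch, and each element of disk_writes merely stands for one write operation to the log.

```python
class GroupCommitLog:
    """Toy journal that buffers entries and writes them as one batch."""

    def __init__(self):
        self.pending = []      # entries waiting for the next commit
        self.disk_writes = []  # each element models one write to the log

    def log(self, entry):
        """Record an entry in memory without writing to disk yet."""
        self.pending.append(entry)

    def commit(self):
        """Flush all pending entries with a single log write."""
        if self.pending:
            self.disk_writes.append(tuple(self.pending))
            self.pending = []


if __name__ == "__main__":
    journal = GroupCommitLog()
    for entry in ("create a", "create b", "rename c"):
        journal.log(entry)
    journal.commit()
    print(len(journal.disk_writes), "write for 3 entries")
```

Three concurrent operations cost one log write here instead of three, which is the saving the paragraph describes.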

Efficient directory organization
JFS holds two different directory organizations. For small directories, it allows the directory's content to be stored directly into its inode. For larger directories, it uses B+ trees, which greatly facilitate directory management.

Better space usage through dynamic inode allocation
For Ext2, you have to define the inode density (the space occupied by management information) in advance, which restricts the maximum number of files or directories on your file system. JFS spares you these considerations: it dynamically allocates inode space and frees it when it is no longer needed.
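How a fixed inode density limits the number of files can be shown with simple arithmetic. The Python sketch below assumes that one inode is created per given number of bytes of file system space and that each inode occupies 128 bytes on disk, the classic Ext2 inode size. Both function names and the sample sizes are made up for this illustration.

```python
def inode_count(fs_size_bytes, bytes_per_inode):
    """Number of inodes created for a fixed inode density."""
    return fs_size_bytes // bytes_per_inode


def inode_table_bytes(fs_size_bytes, bytes_per_inode, inode_size=128):
    """Space reserved for inodes, assuming inode_size bytes per inode."""
    return inode_count(fs_size_bytes, bytes_per_inode) * inode_size


if __name__ == "__main__":
    ten_gb = 10 * 2**30
    for density in (4096, 16384):
        print(density, inode_count(ten_gb, density), inode_table_bytes(ten_gb, density))
```

On a 10 GB file system, one inode per 4096 bytes allows 2,621,440 files and reserves 320 MB for inodes, whether or not they are ever used. One inode per 16384 bytes reserves only a quarter of that space but caps the file system at 655,360 files.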


XFS

SGI started developing XFS in the early 1990s, originally as the file system for its IRIX OS. The idea behind XFS was to create a high-performance 64-bit journaling file system to meet the extreme computing challenges of today. XFS is very good at manipulating large files and performs well on high-end hardware. However, even XFS has a drawback: like ReiserFS, it takes great care of metadata integrity, but less of data integrity.

A quick review of XFS's key features explains why it may prove a strong competitor for other journaling file systems in high-end computing.

High scalability through the use of allocation groups
At creation time of an XFS file system, the block device underlying the file system is divided into eight or more linear regions of equal size. Those are referred to as ``allocation groups''. Each allocation group manages its own inodes and free disk space. Practically, allocation groups can be seen as ``file systems in a file system.'' As allocation groups are rather independent of each other, more than one of them can be addressed by the kernel simultaneously. This feature is the key to XFS's great scalability. Naturally, the concept of independent allocation groups suits the needs of multiprocessor systems.
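The division into equal linear regions can be sketched in Python. This is a simplified model rather than the XFS implementation: blocks left over when the device size is not divisible by the group count are simply ignored, and the function names are assumptions made for this example.

```python
def allocation_groups(total_blocks, group_count=8):
    """Split block numbers 0..total_blocks-1 into equal linear regions.

    Returns a list of (first_block, last_block) tuples, one per group.
    """
    if group_count < 1 or total_blocks < group_count:
        raise ValueError("need at least one block per allocation group")
    size = total_blocks // group_count
    return [(i * size, (i + 1) * size - 1) for i in range(group_count)]


def group_of(block, total_blocks, group_count=8):
    """Return the index of the allocation group that contains block."""
    groups = allocation_groups(total_blocks, group_count)
    size = groups[0][1] + 1
    if not 0 <= block < size * group_count:
        raise ValueError("block lies outside every allocation group")
    return block // size


if __name__ == "__main__":
    print(allocation_groups(800))
    print(group_of(250, 800))
```

Because each region covers its own block range, operations on blocks in different groups do not need to share free-space or inode bookkeeping, which is what allows them to proceed in parallel.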

High performance through efficient management of disk space
Free space and inodes are handled by B+ trees inside the allocation groups. The use of B+ trees greatly contributes to XFS's performance and scalability. A feature truly unique to XFS is ``delayed allocation''. XFS handles allocation by breaking the process into two pieces. First, a pending transaction is stored in RAM and the appropriate amount of space is reserved. At this point, XFS does not yet decide where exactly (in terms of file system blocks) the data should be stored. This decision is delayed until the last possible moment. Some short-lived temporary data may never make its way to disk, because it may be obsolete by the time XFS decides where to actually save it. Thus XFS increases write performance and reduces file system fragmentation. Because delayed allocation results in less frequent write events than in other file systems, data loss after a crash during a write is likely to be more severe.
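The two-step process can be illustrated with a toy model in Python. It is a sketch under strong simplifications and not how XFS is implemented: the class name DelayedAllocator is made up, space is counted in whole blocks, and block numbers are handed out by a simple counter that never reuses freed blocks.

```python
class DelayedAllocator:
    """Toy model: reserve space at write time, pick blocks only at flush time."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks  # blocks neither reserved nor allocated
        self.pending = {}               # file name -> reserved block count (in RAM)
        self.on_disk = {}               # file name -> list of (first_block, count)
        self.next_block = 0             # next block number to hand out

    def write(self, name, blocks):
        """Buffer a write: reserve space but do not choose a location yet."""
        if blocks > self.free_blocks:
            raise OSError("no space left on device")
        self.free_blocks -= blocks
        self.pending[name] = self.pending.get(name, 0) + blocks

    def delete(self, name):
        """Drop a file; data that was never flushed causes no disk write."""
        self.free_blocks += self.pending.pop(name, 0)
        for _, count in self.on_disk.pop(name, []):
            self.free_blocks += count

    def flush(self):
        """Decide locations now; pending data of a file gets one contiguous run."""
        for name, blocks in self.pending.items():
            self.on_disk.setdefault(name, []).append((self.next_block, blocks))
            self.next_block += blocks
        self.pending.clear()


if __name__ == "__main__":
    fs = DelayedAllocator(free_blocks=100)
    fs.write("log", 3)
    fs.write("tmp", 5)
    fs.write("log", 2)
    fs.delete("tmp")   # the temporary file disappears before any flush
    fs.flush()
    print(fs.on_disk)
```

In the example, the two writes to ``log'' end up in one contiguous run of five blocks, and the temporary file never reaches the disk at all.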

Preallocation to avoid file system fragmentation
Before writing the data to the file system, XFS ``reserves'' (preallocates) the free space needed for a file. Thus, file system fragmentation is greatly reduced. Performance is increased as the contents of a file will not be distributed all over the file system.


Some Other Supported File Systems

Table A.1 summarizes some other file systems supported by Linux. They are supported mainly to ensure compatibility and interchange of data with different kinds of media or foreign operating systems.

Table A.1: File System Types in Linux

cramfs    Compressed ROM file system: a compressed read-only file system for ROMs.
hpfs      High Performance File System: the IBM OS/2 standard file system. Only supported in read-only mode.
iso9660   Standard file system on CD-ROMs.
minix     This file system originated from academic projects on operating systems and was the first file system used in Linux. Nowadays, it is used as a file system for floppy disks.
msdos     fat, the file system originally used by DOS and today used by various operating systems.
ncpfs     File system for mounting Novell volumes over networks.
nfs       Network File System: data can be stored on any machine in a network and access may be granted via the network.
smbfs     Server Message Block: used by products such as Windows to enable file access over a network.
sysv      Used on SCO UNIX, Xenix, and Coherent (commercial UNIX systems for PCs).
ufs       Used by BSD, SunOS, and NeXTstep. Only supported in read-only mode.
umsdos    UNIX on MSDOS: applied on top of a normal fat file system. Achieves UNIX functionality (permissions, links, long file names) by creating special files.
vfat      Virtual FAT: extension of the fat file system (supports long file names).
ntfs      Windows NT file system. Only supported in read-only mode.


Large File Support in Linux

Originally, Linux supported a maximum file size of 2 GB. This was enough before the explosion of multimedia and as long as no one tried to manipulate huge databases on Linux. As large files became more and more important for server computing, the kernel and C library were modified to support file sizes larger than 2 GB through a new set of interfaces that applications must use. Nowadays, (almost) all major file systems offer LFS support, allowing you to perform high-end computing.
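The 2 GB figure corresponds to the largest position that fits into a signed 32-bit file offset, while the large file interfaces use 64-bit offsets. The Python sketch below checks this with the standard struct module. The function name fits_in_offset is made up for this example, and the formats "<i" and "<q" stand for signed 32-bit and 64-bit integers.

```python
import struct

MAX_32BIT_OFFSET = 2**31 - 1   # largest value of a signed 32-bit file offset
MAX_64BIT_OFFSET = 2**63 - 1   # largest value of a signed 64-bit file offset


def fits_in_offset(position, fmt):
    """True if position can be stored in the given struct format."""
    try:
        struct.pack(fmt, position)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    two_gb = 2 * 2**30
    print(fits_in_offset(two_gb - 1, "<i"))  # last byte below 2 GB
    print(fits_in_offset(two_gb, "<i"))      # first position beyond 2 GB
    print(fits_in_offset(two_gb, "<q"))      # same position, 64-bit offset
```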

Table A.2 offers an overview of the current limitations of Linux files and file systems for Kernel 2.4.


Table A.2: Maximum Sizes of File Systems
[Table body lost in conversion. The only surviving entry is an upper limit of 8 EB.]


Note


Linux Kernel Limits: Table A.2 describes the limitations regarding the on-disk format. The following kernel limits (kernel version 2.4.x) exist:

  • 32-bit systems: The maximum size of any file or block device is limited to 2 TB. Using LVM to combine several block devices allows you to handle larger file systems.
  • 64-bit systems: File and file system sizes are limited to 2^63 bytes (8 EB). This limit may not yet be reached due to a lack of hardware driver support.
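The units in these limits can be checked with a short Python sketch. The helper human_binary is made up for this illustration and uses binary prefixes (1 KB = 1024 bytes), truncating values that are not exact multiples. Writing 2 TB as 2^32 sectors of 512 bytes each is an assumption used here to reproduce the figure and is not stated in the text.

```python
def human_binary(size_bytes):
    """Format a byte count with binary prefixes (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = size_bytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024  # truncates anything that is not an exact multiple


if __name__ == "__main__":
    print(human_binary(2**63))        # limit on 64-bit systems
    print(human_binary(2**32 * 512))  # assumed: 32-bit sector count, 512-byte sectors
```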


For More Information

Each of the file system projects described above maintains its own home page where you can find mailing list information as well as further documentation and FAQs.

http://e2fsprogs.sourceforge.net/
http://www.zipworld.com.au/~akpm/linux/ext3/
http://www.namesys.com/
http://oss.software.ibm.com/developerworks/opensource/jfs/
http://oss.sgi.com/projects/xfs/

A comprehensive multipart tutorial on Linux file systems can be found at IBM developerWorks:
http://www-106.ibm.com/developerworks/library/l-fs.html

For a comparison of the different journaling file systems in Linux, look at Juan I. Santos Florido's article at Linuxgazette: http://www.linuxgazette.com/issue55/florido.html.

Those interested in an in-depth analysis of LFS in Linux should try Andreas Jaeger's LFS site: http://www.suse.de/~aj/linux_lfs.html.


root 2003-11-05