Yum is a tool for automating package maintenance for a network of workstations running any operating system that uses the Red Hat Package Manager (RPM) system for distributing packaged tools and applications. It is derived from yup, an automated package updater originally developed for Yellowdog Linux, hence its name: yum is "Yellowdog Updater, Modified".
Yup was originally written and maintained by Dan Burcaw, Bryan Stillwell, Stephen Edie, and Troy Bengegerdes of Yellowdog Linux (an RPM-based Linux distribution that runs on Apple Macintoshes of various generations). Yum was written and is currently being maintained by Seth Vidal and Michael Stenner, both of Duke University, although as an open source GPL project many others have contributed code, ideas, and bug fixes (not to mention documentation :-). The yum website acknowledges the (mostly) complete list of contributors, as does the AUTHORS file in the distribution tarball.
Yum is a GNU General Public License (GPL) tool; it is freely available and can be used, modified, or redistributed without any fee or royalty provided that the terms of its associated license are followed.
Yum (currently) consists of two tools: yum-arch, which is used to construct an (ftp or http) repository on a suitable server, and yum, the general-purpose client. Once a yum repository is prepared (a simple process detailed below) any client permitted to access the repository can install, update, or remove one or more rpm-based packages from the repository.
Yum's "intelligence" in performing updates goes far beyond that of most related tools; yum has been used successfully on numerous occasions to perform a "running upgrade" of e.g. a Red Hat 7.1 system directly to 7.3 (where the probability of success naturally depends on how "customized" the target system is and how much critical configuration file formats have "drifted" between the initial and final revisions - YMMV).
In addition, the yum client encapsulates various informational tools, and can list rpm's both installed and available for installation, extract and publish information from the rpm headers based on keywords or globs, and find packages that provide particular files. Yum is therefore of great use to users of a workstation, either private or on a LAN; with yum they can look over the list of available packages to see if there is anything "interesting", search for packages that contain a particular tool or apply to a particular task, and more.
Yum is designed to be a client-pull tool, permitting package management to be "centralized" to the extent required to ensure security and interoperability even across a broad, decentralized administrative domain. The repository administrator needs no root privileges on the yum clients -- yum requires at most anonymous access (restricted or unrestricted) from the clients to a repository server (often one that is maintained by a central -- and competent -- authority). This makes yum an especially attractive tool for providing "centralized" scalable administration of linux systems in a decentralized network management environment, where a mix of machines maintained by their owners and by a variety of network managers naturally occurs (such as a University).
One of yum's most common uses in any LAN environment is to be run from a nightly cron script on each yum-maintained system to update every rpm package on the system safely to the latest versions available on the repository, including all security or operationally patched updates. If yum is itself installed from an rpm custom-preconfigured to perform this nightly update, an entire campus that installs its systems from a common repository base can achieve near complete consistency with respect to distribution, revision, and security. Security and other updates will typically appear on all net-connected clients no more than 24 hours after an updated rpm is placed on the repository by its (trusted) administrator, who requires no root-level privileges on any of the clients.
Consequently with yum a single trusted administrator can maintain a trusted rpm repository (set) for an entire University campus, an entire corporation, an entire government laboratory or institution. Alternatively, responsibility for different parts of a distribution can be split up safely between several trusted administrators on distinct repositories, or a local administrator can add a local trusted repository to overlay or augment the offerings of the campus level repositories. All systems at a common revision level will be consistent and interoperable to the extent that their installed packages (plus any overlays by local administrators) allow. Yum is hence an amazingly powerful tool for creating a customized repository-based package delivery and maintenance system that can scale the work of a single individual to cover thousands of machines.
And it's free. It just doesn't get any better than that....
To understand how yum works it helps to define a few terms:
An rpm consists of basically three parts: a header, a signature, and the (generally compressed) archive itself. The header contains a complete file list, a description of the package, a list of the features and libraries it provides, a list of tools it requires (from other packages) in order to function, what (known) other packages it conflicts with, and more. The basic rpm tool needs information in the header to permit a package to be installed (or uninstalled!) in such a way that everything the package requires is present, nothing already installed is broken, and no conflicting packages end up installed together.
This process is generically known as "resolving package dependencies" and is one of the most difficult parts of package management. It is quite possible to want to install a packaged tool that requires two or three libraries and a tool. The libraries in turn may require other libraries, the tool other tools. By the time you're done, installing the package may require that you install six or eight other packages, none of which are permitted to conflict or break any of the packages that are already there or will remain behind.
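The chain of requirements described above can be illustrated with a minimal sketch. It is not yum's actual resolver; the package names and the REQUIRES table are hypothetical, and real rpm headers express requirements as capabilities and versions rather than bare package names. The sketch simply walks the requirements breadth-first and collects everything that is not already installed.

```python
from collections import deque

# Hypothetical metadata: each package maps to the packages it requires.
REQUIRES = {
    "editor": ["libgui", "libspell", "helper-tool"],
    "libgui": ["libx"],
    "libspell": ["dictionary"],
    "helper-tool": ["libx"],
    "libx": [],
    "dictionary": [],
}


def packages_to_install(target, requires, installed):
    """Return the target plus every missing package it needs, directly or indirectly."""
    needed = []
    seen = set(installed)          # installed packages never need fetching
    queue = deque([target])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        needed.append(name)
        queue.extend(requires[name])  # follow this package's own requirements
    return needed


if __name__ == "__main__":
    # With only libx installed, one request pulls in four more packages.
    print(packages_to_install("editor", REQUIRES, installed={"libx"}))
```

Asking for one package here turns into five installs, which is the effect the paragraph describes; a real resolver must additionally check each of them for conflicts.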
If you have ever attempted to manage rpm's by hand, you know that tracking down all of the headers and dependencies and resolving all conflicts is not easy and that it actually becomes more difficult over time as a system manager updates this on one system, that on another, rebuilds a package here, installs something locally into /usr/local there. Eventually (sometimes out of sheer frustration) an rpm is --force installed, and thereafter the rpm database on the system is basically inconsistent and any rpm install is likely to fail and require --force-ing in turn. Entropy creeps into the network, and with it security risks and dysfunction.
Yet not updating packages is also a losing situation. If you leave a distribution based install untouched it remains clean. However, parts of it were likely broken at the time of install -- there are always bugs even in the most careful of major distributions. Some of those bugs are security bugs, and as crackers discover them and exploits are developed it rapidly becomes a case of "patch your system or lay out the welcome mat for vermin". This is a global problem with all operating systems; even Windows-based systems (notorious for their vulnerability to viruses and crackers) can be made reasonably secure if they are rigorously kept up to date. Finally, users come along and demand THIS package or THAT package which are crucial to their work -- but not in the original, clean, consistent installation.
On balance, any professional LAN manager (or even humble standalone linux workstation owner) has little choice; they must have some sort of mechanism for updating the packages already installed on their system(s) to the latest, patched, secure, debugged versions and for adding more packages, including ones that may not have been in the distribution they relied upon for their original base install. The only questions are: what mechanism should they use and what will it cost them (in time, hassle, learning curve, and reliability as well as in money). Let us consider the problem:
In a typical repository, there are a lot of packages (order of 1000), with a lot of headers. About 700 packages are actually installed on the system I'm currently working on. However, the archive component of each package, which contains the actual binaries and libraries and documentation installed, is much larger -- the complete rpm is thus generally two to four orders of magnitude larger than the header. For example, the header for Open Office (a fairly large package) totals about 100 kilobytes in size. The rpm itself, on the other hand, is about 30 megabytes in size. The header can be reliably delivered in a tiny fraction of a second over most networks; the rpm itself requires seconds to be delivered over 100BT, and minutes to be delivered over e.g. DSL, cable, or any relatively slow network. One occupies the physical server of a repository for a tiny interval; the other creates a meaningful, sustained load on the server. All of these are important considerations when designing or selecting an update mechanism intended to scale to perhaps thousands of clients and several distinct repositories per physical server.
Early automated update tools either required a locally mounted repository directory in order to be able to access all of the headers quickly (local disk access, even from a relatively slow CD-ROM drive, being fast enough to deliver the rpm's in a timely way so that their headers could be extracted and parsed) or required that each linked rpm be sent in its entirety over a network to an updating client from the repository just so it could read the header. One was locally fast but required a large commitment of local disk resources (in addition to creating a new problem, that of keeping all the local copies of a master repository synchronized). The other was very slow. Both were also network resource intensive.
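The arithmetic behind these delivery times can be sketched as follows. The 100 kilobyte and 30 megabyte sizes come from the example above; the 1.5 Mbit/s DSL rate is an assumption chosen for illustration, and the calculation is idealized (it ignores protocol overhead, latency, and server load).

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: payload size divided by the raw link rate."""
    return size_bytes * 8 / link_bits_per_second


HEADER_BYTES = 100 * 1000         # about 100 kilobytes (from the text)
RPM_BYTES = 30 * 1000 * 1000      # about 30 megabytes (from the text)

FAST_ETHERNET = 100e6             # 100BT: 100 Mbit/s
ASSUMED_DSL = 1.5e6               # assumption: 1.5 Mbit/s downstream

if __name__ == "__main__":
    for label, rate in (("100BT", FAST_ETHERNET), ("DSL (assumed)", ASSUMED_DSL)):
        header_s = transfer_seconds(HEADER_BYTES, rate)
        rpm_s = transfer_seconds(RPM_BYTES, rate)
        print(f"{label}: header {header_s:.3f} s, full rpm {rpm_s:.1f} s")
```

Under these assumptions the header takes about 0.008 seconds on 100BT against 2.4 seconds for the full rpm, and about half a second on the assumed DSL link against 160 seconds. The ratio is 300 to 1 in both cases, since it depends only on the two sizes.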
This is the fundamental problem that yum solves for you. Yum splits off the headers on the repository side (which is the job of its only repository-side tool, yum-arch). The headers themselves are thus available to be downloaded separately, and quickly, to the yum client, where they are typically cached semi-permanently in a small footprint in /var/cache/yum/serverid (recall that serverid is a label for a single repository that might be mirrored on several servers and available on a fallback basis from several URL's). Yum clients also cache (space permitting or according to the requirements and invocation schema selected by the system's administrator) rpm's when they are downloaded for an actual install or update, giving a yum client the best of both the options above -- a local disk image of (just the relevant part of) the repository that is automatically and transparently managed and rapid access to just the headers.
An actual download of all the headers associated with packages found on your system occurs the first time a yum client is invoked and thereafter it adds to or updates the cached headers (and downloads and caches the required rpm's) only if the repository has more recent versions or if the user has deliberately invoked yum's "clean" command to empty all its caches. All of yum's dependency resolution then proceeds from these cached header files, and if for any reason the install or update requires an rpm already in the cache to be reinstalled, it is immediately available.
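The caching behaviour just described can be modelled in a few lines. This is a sketch under stated assumptions, not yum's implementation: the function name headers_to_fetch is hypothetical, the repository index and the cache are modelled as plain dictionaries, and versions are simple integer tuples (real rpm version comparison is considerably more involved).

```python
def headers_to_fetch(repo_index, cached):
    """Return package names whose header is missing from the cache
    or older than the one the repository currently offers."""
    return sorted(
        name
        for name, version in repo_index.items()
        if name not in cached or cached[name] < version
    )


if __name__ == "__main__":
    repo = {"bash": (2, 5, 1), "openssl": (0, 9, 6), "yum": (1, 0, 0)}
    cache = {}

    print(headers_to_fetch(repo, cache))   # first run: every header
    cache.update(repo)
    print(headers_to_fetch(repo, cache))   # nothing changed: nothing to fetch

    repo["openssl"] = (0, 9, 7)            # repository gains a newer package
    print(headers_to_fetch(repo, cache))   # only the updated header

    cache.clear()                          # models the "clean" command
    print(headers_to_fetch(repo, cache))   # everything again
```

The point of the design is visible in the second and third calls: once the cache is warm, a client transfers only what has changed on the repository.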
As a parenthetical note, the author has used yum's caches in a trick to create a "virtual" update repository on his homogeneous, DSL-connected home LAN. By NFS exporting and mounting (rw,no_root_squash) /var/cache/yum to all the LAN clients, once normal updates have caused a header or rpm to be retrieved for any local host, they are available to all the local hosts over a (much faster than DSL) 100BT NFS mount. This saves tremendously on bandwidth and (campus) server load, using instead the undersubscribed server capacity of a tiny but powerful LAN. Best of all, there "is no setup"; what I just described is the works. A single export and a mount on all the clients and yum itself transparently does all of the work.
However, it is probably better in many cases to use rsync or other tools to provide a faithful mirror of the repository in question and use yum's fallback capability to accomplish the same thing (single use of a limited DSL channel) by design. This gives one a much better capability of standing alone should update access go away on the "server" of the yum cache NFS exported across a LAN.
With the header information (only) handy on high-speed local media, the standard tools used to maintain rpm's are invoked by yum and can quickly proceed to resolve all dependencies, determine if it is safe to proceed, what additional packages need to be installed, and so forth. Note well that yum is designed (by a highly experienced systems administrator, Seth Vidal, with the help of all the other highly experienced systems administrators on the yum list) to be safe. It will generally not proceed if it encounters a dependency loop, a package conflict, or a revision number conflict.
If yum finds that everything is good and the package can be safely installed, removed, or updated, it can either be invoked in such a way that it does so automatically with no further prompts so it can run automagically from cron, or (the general default when invoked from a command line) it can issue a user a single prompt indicating what it is about to do and requesting permission to proceed. If it finds that the requested action is in fact not safe, it will exit with as informative an error message as it can generate, permitting the system's administrator to attempt to resolve the situation by hand before proceeding (which may, for example, involve removing certain conflicting packages from the client system or fixing the repository itself).
From the overview given above, it should be apparent that yum is potentially a powerful tool indeed, using a single clever idea (the splitting off of the rpm headers) to achieve a singular degree of efficiency. One can immediately imagine all sorts of ways to exploit the information now so readily available to a client and wrap them all up in a single interface to eliminate the incredibly arcane and complex commands otherwise required to learn anything about the installed package base on a system and what is still available. The yum developers have been doing just that on the yum list - dreaming up features and literally overnight implementing the most attractive ones in new code. At this point yum is very nearly the last thing you'll ever need to manage packages on any rpm based system once it has gotten past its original, distribution vendor based, install.
Indeed, it is now so powerful that it risks losing some of its appealing simplicity. This HOWTO is intended to document yum's capabilities so even a novice can learn to use it client-side effectively in a very short time, and so that LAN administrators can have guidance in the necessarily more complex tasks associated with building and maintaining the repositories from which the yum clients retrieve headers and rpm's.
Yum's development is far from over. Volunteers are working on a GUI (to encapsulate many of yum's features for tty-averse users). Some of yum's functionality may be split off so that instead of a single client command there are two, or perhaps three (each with a simpler set of subcommand options and a clear differentiation of functionality). The idea of making yum's configuration file XML (to facilitate GUI maintenance and extensibility) is being kicked around. And of course, new features are constantly being requested and discussed and implemented or rejected. Individuals with dreams of their own (and some mad python or other programming skills:-) are invited to join the yum list and participate in the grand process of open source development.
Because yum invokes the same tools and python bindings used by e.g. Red Hat to actually resolve dependencies and perform installations (functioning as basically a supersmart shell for rpm and anaconda that can run directly from the local header cache) it has proven remarkably robust over several changes to the rpm toolset that have occurred since its inception, some of them fairly major. It is at least difficult for yum to "break" without Red Hat's own rpm installation toolset breaking as well, and after each recent major change yum has functioned again after a very brief period of tuneup.
It is important to emphasize, however, that yum is not a tool for administering Red Hat (only) repositories. Red Hat will be prominently mentioned in this HOWTO largely because we (Duke) currently use a Red Hat base for our campuswide linux distribution, maintain a primary (yum-enabled) Red Hat mirror, and are literally down the road a few miles from Red Hat itself. Still, if anything, yum is in (a friendly, open source) competition with Red Hat's own up2date mechanism and related mechanisms utilized by other distribution vendors.
So Note Well: Yum itself is designed for, and has been successfully used to support, rpm repositories of any operating system or distribution that relies on rpm's for package management and contains or can be augmented with the requisite rpm-python tools. Yum has been tested on or is in production on just about all the major rpm-based linuces, as well as at least one Solaris repository. Its direct conceptual predecessor (with which it shares many design features and ideas, although very little remaining actual code) is Yellowdog Linux's updater tool yup, which had nothing whatsoever to do with Red Hat per se. Yum truly is free like the air, and distribution-agnostic by deliberate design.
It is worth taking a short moment here to put in a bit of a plug for the distribution providers, Red Hat, Mandrake, SuSE, Yellowdog and all the rest. Yum is a tool that (clearly) can completely short circuit some, if not all, of their expected/hopeful income streams. Using the methods described below, one can literally scale any distribution over the entire Internet, working directly from a mirror (of a mirror of a mirror...) from their original web distribution, in such a way that all maintenance is fully automated and yet makes the distribution provider absolutely no money for doing the toplevel maintenance on the base distribution itself.
There are obvious ethical and practical issues here. Ethically one should exchange value to remain in karmic balance with the world ecology, including the amorphous and chaotic one that encompasses all of linux. There are two ways to do so in the open source world: share in the work of providing the services or pay some money (to help pay for the time and profit of those that do the work for you).
Practically one should be aware that if a distribution provider doesn't make enough money to pay for the actual work and capital investment required to assemble, test, debug, maintain, and distribute the software (plus a fair profit) they will either go out of business or alter their business model in ways that make it more difficult for us all to work efficiently and scalably from their distribution without paying them some money.
For both reasons I would urge that even though one can obtain linux using the methodologies described in this HOWTO, install it on ten thousand systems in a single organization, and maintain it completely automagically with yum for free, one seriously consider meeting one's ethical and practical responsibilities and ensuring that one pays the global mind back, either way.
I personally do both. I'm writing this HOWTO, and have written GPL packages in the past and made them publicly available. I have participated in all sorts of linux development processes and am on a dozen high level linux lists. I still make sure that I buy something inexpensive from Red Hat (my current primary distribution) every two or three years so that they make a buck or two for every one of the systems I personally run in my house, per year.
It would be lovely if the primary distribution vendors made this easier. Obviously, I install via dulug and maintain via yum, so the RH 9 box set I purchased was a complete waste (and I'll cheerfully give it away if I can find anybody to give it to). I'd have rather found a paypal link on the RH website where one could utterly voluntarily kick in a payment and NOT get a damn thing back from them -- no updates, no free support services, nothing at all but continued free access to their distribution via the chain of mirrors I use to install and maintain it. Pure revenue for them, pure karmic peace for me.
A moment or two of meditation upon dependency resolution should suffice to convince one that Great Evil is possible in a large rpm repository. You have hundreds, perhaps thousands of rpm packages. Some are commercial, some are from some major distribution(s), others are local homebrew. What if, in all of these packages built at different times and by different people, you ever find that there exist rpm's such that (e.g.) rpm A requires rpm B, which conflicts with rpm C (already installed)? What if rpm A requires rpm B (revision 1.1.1) but rpm B (revision 1.2.1) is already installed and is required in that revision by rpm C (also already installed)? It is entirely possible to assemble an "rpm repository from hell" such that nearly any attempt to install a package will break something or require something that breaks something.
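Both failure cases in the paragraph above can be reproduced with a small check. The sketch below is illustrative only: the record layout, the helper record(), and install_problems() are hypothetical, requirements are pinned to exact version strings (or None for "any version"), and only direct requirements are examined where a real resolver would recurse.

```python
def record(version, requires=None, conflicts=()):
    """Build a hypothetical package record."""
    return {"version": version, "requires": requires or {}, "conflicts": list(conflicts)}


def install_problems(name, repo, installed):
    """Return reasons why `name` (offered in `repo`) cannot be installed cleanly."""
    problems = []
    for dep, wanted in repo[name]["requires"].items():
        have = installed.get(dep)
        if have is None:
            if dep not in repo:
                problems.append(f"{name} requires {dep}, which is not available")
                continue
            # The requirement would be installed too, so its conflicts matter.
            for other in repo[dep]["conflicts"]:
                if other in installed:
                    problems.append(
                        f"{name} requires {dep}, which conflicts with installed {other}"
                    )
        elif wanted is not None and have["version"] != wanted:
            # Find installed packages that pin the revision already present.
            holders = sorted(
                owner for owner, rec in installed.items()
                if rec["requires"].get(dep) == have["version"]
            )
            held = ", ".join(holders) if holders else "no other package"
            current = have["version"]
            problems.append(
                f"{name} requires {dep} {wanted}, but {dep} {current} "
                f"is installed and required by {held}"
            )
    return problems


if __name__ == "__main__":
    installed = {"B": record("1.2.1"), "C": record("1.0", {"B": "1.2.1"})}
    repo = {"A": record("1.0", {"B": "1.1.1"})}
    print(install_problems("A", repo, installed))
```

An empty result means no problem was found among the direct requirements; it says nothing about deeper levels of the dependency tree.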
(As yet another parenthetical note, this was the thing that made many rpm-based distribution users look at Debian with a certain degree of longing. Apt untangles all of this for you and works entirely transparently from a single distribution "guaranteed to be consistent", and provides some lovely tools (some of which are functionally cloned in yum) for package management and dependency resolution. However, as is made clear on the yum site, yum is a better solution in many ways than apt or, for that matter, Current or up2date. I believe that the designers are working fairly aggressively to make sure it stays that way.)
A cynical (but correct) person would note that this was why rpmfind and other rpm "supertools" ultimately failed. Yes, rpmfind could locate any rpm on the planet in its superrepository in a matter of a few seconds, BUT (big but) resolving dependencies was just about impossible. If one was lucky, installing an e.g. Mandrake rpm on a Red Hat system that used SuSE library rpm's would work. Sometimes one required luck to install the Red Hat rpm's it would find on a Red Hat system, as they were old or built with non-updated libraries. Sometimes things would "kind of work". Other times installing an rpm would break things like all hell, more or less irreversibly.
Untangling and avoiding this mess is what earns the major (rpm-based or not) linux distribution providers their money. They provide an entire set of rpm's (or other packages) "all at once" that are guaranteed to be consistent in the distribution snapshot on the CD's or ISO images or primary website. All rpm's required by any rpm in the set are in the set. No rpm's in the provided set conflict with other rpm's in the set. Consequently any rpm in the set can be selected to be installed on any system built from the distribution with the confidence that, once all the rpm dependencies are resolved, the rpm (along with its missing dependencies) can be successfully installed. The set provided is at least approximately complete, so that one supposedly has little incentive or need to install packages not already in the distribution (except where so doing requires the customer to "buy" a more expensive distribution from the vendor:-).
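The two guarantees named above (every requirement is in the set, and nothing in the set conflicts with anything else in it) can be stated as a check over a whole package set. This is a sketch with a hypothetical record layout and function name, working on bare package names; it is not a tool shipped with yum or rpm.

```python
def consistency_errors(packages):
    """Check a package set for the two properties a distribution promises.

    packages maps a name to {"requires": [...], "conflicts": [...]}.
    Returns (package, problem, other) tuples; an empty list means the set
    is closed under requirements and free of internal conflicts.
    """
    errors = []
    for name, rec in sorted(packages.items()):
        for dep in rec["requires"]:
            if dep not in packages:
                errors.append((name, "missing requirement", dep))
        for other in rec["conflicts"]:
            if other in packages:
                errors.append((name, "conflicts with", other))
    return errors


if __name__ == "__main__":
    distribution = {
        "app": {"requires": ["lib", "extra"], "conflicts": ["lib"]},
        "lib": {"requires": [], "conflicts": []},
    }
    for error in consistency_errors(distribution):
        print(*error)
```

A repository maintainer adding local packages to a distribution base is, in effect, responsible for keeping this list empty.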
In the real world this ideal of consistency and completeness is basically never achieved. All the distributions I've ever tried or know about have bugs, often aren't totally consistent, and certainly are not complete. A "good" distribution can serve as a base for a repository and support e.g. network installs as well as disk or CD local installs, but one must be able to add, delete, update packages new and old to the repository and distribute them to all the systems that rely on the repository for update management both automatically and on demand.
Alas, rpm itself is a terrible tool to use for this purpose, a fact that has driven managers of rpm-based systems to regularly tear their hair for years now. Using rpm directly to manage rpm installs, the most one can do is look one step ahead to try to resolve dependencies. Since dependency loops are not at all uncommon on real-world repositories where things are added and taken away (and far from unknown even in box-set linux distributions that are supposed to be dependency-loop free) one can literally chase rpm's around in loops or up a tree trying to figure out what has to be installed before finally succeeding in installing the one lonely application you selected originally.
rpm doesn't permit one to tell it to "install package X and anything else that it needs, after YOU figure out what that might be". Yum, of course, does.
Even yum, though, can't "fix" a dependency loop, or cope with all the arcane revision numbering schemes or dependency specifications that appear in all the rpm's one might find and rebuild or develop locally for inclusion in a central repository. When one is encountered, a Real Human has to apply a considerable amount of systems expertise to resolve the problem. This suggests that building rpm's from sources in such a way that they "play nice" in a distribution repository, while a critical component of said repository, is not a trivial process. So much so that many rpm developers simply do not succeed.
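A dependency loop is easy to detect mechanically even though fixing it takes human judgment. The sketch below uses the standard library's graphlib module (assuming Python 3.9 or later); the function names and package names are hypothetical, and this is an illustration of the idea rather than how yum or rpm detect loops.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(requires):
    """Return an order in which every package follows its requirements.

    requires maps a package name to the names it requires.
    Raises graphlib.CycleError if the requirements contain a loop.
    """
    return list(TopologicalSorter(requires).static_order())


def find_loop(requires):
    """Return one dependency loop as a list of names, or None if there is none."""
    try:
        TopologicalSorter(requires).prepare()
    except CycleError as exc:
        # graphlib reports the cycle as the second element of exc.args;
        # the first and last names in that list are the same package.
        return exc.args[1]
    return None


if __name__ == "__main__":
    clean = {"app": ["lib"], "lib": ["base"], "base": []}
    print(install_order(clean))

    looped = {"app": ["lib"], "lib": ["plugin"], "plugin": ["app"]}
    print(find_loop(looped))
```

Reporting the loop is as far as automation goes here; deciding which package's requirement to relax or rebuild is the part that needs the Real Human.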
Also, yum achieves its greatest degree of scalability and efficiency if only rpm-based installation is permitted on all the systems using yum to keep up to date. Installing locally built software into /usr/local becomes Evil and must be prohibited as impossible to keep up to date and maintained. Commercial packages have to have their cute (but often dumb) installation mechanisms circumvented and be repackaged into some sort of rpm for controlled distribution.
Consequently, repository maintainers must willy-nilly become rpm builders to at least some extent. If SuSE releases a lovely new tool in source rpm form that isn't in your current Red Hat based repository, of course you would like to rebuild it and add it. If your University has a site license for e.g. Mathematica and you would like to install it via the (properly secured and license controlling) repository you will need to turn it into an rpm. If nothing else, you'll need to repackage yum itself for client installations so that its configuration files point to your repositories and not the default repositories provided in the installation rpm's /etc/yum.conf.
For all of these reasons an entire section of this HOWTO is devoted to a guide for repository maintainers and rpm builders, including some practices which (if followed) would make dependency and revision numbering problems far less common and life consequently good.
In the next few sections we will see where to get yum, how to install it on the server side, and then how to set up and test a yum client. Following that there will be a few sections on advanced topics and design issues; how to set up a repository in a complex environment, how to build rpm's that are relatively unlikely to create dependency and revision problems in a joint repository, how to package third party (e.g. site licensed) software so it can be distributed, updated, and maintained via yum (linux software distributors take note!) and more.
Yum HOWTO Copyright (c) 2003 by Robert G. Brown
Please freely copy and distribute (sell or give away) this document in any format. It's requested that corrections and/or comments be forwarded to the document maintainer. You may create a derivative work and distribute it provided that you:
If you're considering making a derived work other than a translation, it's requested that you discuss your plans with the current maintainer.
Use the information in this document at your own risk. I disavow any potential liability for the contents of this document. Use of the concepts, examples, and/or other content of this document is entirely at your own risk.
All copyrights are owned by their owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.
Naming of particular products or brands should not be seen as endorsements.
You are strongly recommended to take a backup of your system before major installation and backups at regular intervals.
This is the first release of this document, so there isn't much news.
Eventually we hope that the very latest version number of this document can be obtained from a URL like http://www.linux.duke.edu/projects/yum/yum_HOWTO.html (which probably doesn't work yet).
Greg Wildman gregw at techno.co.za
Stein Gjoen sgjoen at nyx.net
Russ Herrold
Seth Vidal
Michael Stenner
(and others on the yum list).
In particular rgb gratefully acknowledges the help of Russ Herrold for writing the original, non-HOWTO online yum documentation from which this is loosely derived, as well as Seth Vidal and Michael Stenner, who wrote and continue to maintain yum, including the base documentation from which this HOWTO is derived.
Any comments or suggestions can be mailed to the yum mailing list. You might visit the yum mailing list website.