How computers capture, measure, and store the physical waves of sound as binary numbers
When you stream a song on Spotify, your ears are hearing something that was originally a continuous pressure wave in air, but was captured, measured 44,100 times per second, rounded to one of 65,536 possible amplitude levels, and stored as binary. That capture happened in a recording studio; the result was then compressed, transmitted through fibre-optic cables, and reconstructed by your phone's speaker, ideally without losing more detail than human ears can detect. The format chosen for CDs in the 1980s, 44.1 kHz, 16-bit audio, has remained the standard for music ever since. Telephone calls use a much cruder 8 kHz, 8-bit system, which is why voices on calls sound different from in-person conversation. Most of the difference you can hear between a phone call and a concert recording comes down to two numbers: sample rate and bit depth.
Sound in the real world is analogue: a continuous wave of pressure variation in air. Computers can only store discrete binary numbers. To store sound digitally, we must sample the wave, that is, take measurements of its amplitude at regular intervals and store each measurement as a binary number.
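The sampling idea can be sketched in a few lines of Python. This is only an illustration, not how real recording hardware works: the function names `sample_wave` and `to_binary` are made up for this example, the "sound" is assumed to be a pure 1,000 Hz sine wave, and amplitudes are assumed to lie between -1.0 and 1.0.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, duration_s):
    """Measure a sine wave's amplitude at regular intervals (hypothetical helper)."""
    num_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of the n-th measurement, in seconds
        samples.append(math.sin(2 * math.pi * frequency_hz * t))
    return samples

def to_binary(amplitude, bit_depth):
    """Map an amplitude in [-1.0, 1.0] to the nearest level, as a binary string."""
    levels = 2 ** bit_depth
    level = round((amplitude + 1) / 2 * (levels - 1))
    return format(level, f"0{bit_depth}b")

# One millisecond of a 1 kHz tone, sampled 8,000 times per second at 8 bits
samples = sample_wave(frequency_hz=1000, sample_rate_hz=8000, duration_s=0.001)
codes = [to_binary(s, bit_depth=8) for s in samples]

print(len(samples))  # 8 measurements in one millisecond
for s, c in zip(samples, codes):
    print(f"{s:+.3f} -> {c}")
```

Each printed line is one sample: the measured amplitude on the left and the binary number that would be stored on the right.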
| Sample Rate | Quality | Typical Use | Effect |
|---|---|---|---|
| 8,000 Hz (8 kHz) | Low | Phone calls, voice memos | Captures speech frequencies, tinny sound |
| 22,050 Hz | Medium | Old radio quality, AM radio | Better than phone, missing high frequencies |
| 44,100 Hz (44.1 kHz) | High | CD audio (the standard) | Covers full human hearing range (≤ 20 kHz) |
| 48,000 Hz (48 kHz) | High | Video / professional audio | Slight quality margin above CD |
| 96,000 Hz+ | Studio | Professional recording studios | Captures beyond human hearing; post-production flexibility |
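The table does not state the rule behind the "Effect" column. A standard result from signal processing (the Nyquist limit, which these notes do not name) is that a sample rate can only capture frequencies up to half its value. The sketch below applies that rule to the rates in the table; the function name and the 20,000 Hz hearing limit used for the comparison are assumptions made for this example.

```python
def highest_capturable_frequency(sample_rate_hz):
    """Nyquist limit: at least two samples per cycle are needed to capture a frequency."""
    return sample_rate_hz / 2

HEARING_LIMIT_HZ = 20_000  # approximate upper limit of human hearing

for rate in (8_000, 22_050, 44_100, 48_000, 96_000):
    limit = highest_capturable_frequency(rate)
    covers = limit >= HEARING_LIMIT_HZ
    print(f"{rate:>6} Hz captures up to {limit:>7.0f} Hz; covers hearing range: {covers}")
```

On this rule, 44.1 kHz reaches 22,050 Hz, which is just above the 20 kHz limit, while 8 kHz reaches only 4,000 Hz.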
| Bit Depth | Amplitude Levels (2ⁿ) | Typical Use |
|---|---|---|
| 8-bit | 256 | Telephone, very low quality audio |
| 16-bit | 65,536 | CD audio (the standard for music) |
| 24-bit | 16,777,216 | Professional studio recording |
| 32-bit | 4,294,967,296 | High-end audio production / post-processing |
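The "Amplitude Levels" column is just 2 raised to the power of the bit depth. A quick check in Python (the function name `amplitude_levels` is made up for this example):

```python
def amplitude_levels(bit_depth):
    """Number of distinct values that bit_depth bits can represent."""
    return 2 ** bit_depth

for bits in (8, 16, 24, 32):
    print(f"{bits}-bit: {amplitude_levels(bits):,} levels")
```

Each extra bit doubles the number of levels, so 16-bit audio has 256 times as many levels as 8-bit audio.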
Identify the four values: sample rate (Hz), bit depth (bits per sample), duration (seconds), channels (1 = mono, 2 = stereo).
Multiply all four together: sample rate × bit depth × duration × channels = file size in bits.
Divide by 8 for bytes, by 1024 for KB, by 1024 again for MB. Show every step, because each conversion earns a mark.
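The three steps above can be checked with a short Python sketch. The function name `file_size` and the example values (one minute of CD-quality stereo) are chosen for illustration; the result is the uncompressed size, before any compression is applied.

```python
def file_size(sample_rate_hz, bit_depth, duration_s, channels):
    """Return the uncompressed size as (bits, bytes, KB, MB), one value per step."""
    bits = sample_rate_hz * bit_depth * duration_s * channels
    size_bytes = bits / 8       # 8 bits per byte
    kb = size_bytes / 1024      # 1024 bytes per KB
    mb = kb / 1024              # 1024 KB per MB
    return bits, size_bytes, kb, mb

# 44.1 kHz, 16-bit, 60 seconds, stereo
bits, size_bytes, kb, mb = file_size(44_100, 16, 60, 2)
print(f"{bits:,} bits")
print(f"{size_bytes:,.0f} bytes")
print(f"{kb:,.4f} KB")
print(f"{mb:.2f} MB")
```

For this example the steps give 84,672,000 bits, then 10,584,000 bytes, then 10,335.9375 KB, then roughly 10.09 MB.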
The hardware component that performs sampling is the Analogue-to-Digital Converter (ADC). It is built into sound cards, recording devices, and digital (for example USB) microphones. It measures the sound wave amplitude at each sample interval and outputs a binary number. The reverse process, converting stored binary back to a speaker signal, is performed by a Digital-to-Analogue Converter (DAC).
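A rough software model of the ADC and DAC round trip is sketched below. Real converters are electronic circuits; here the functions `adc` and `dac`, the input range of -1.0 to 1.0, and the test value 0.3 are all assumptions made for illustration.

```python
def adc(voltage, bit_depth, v_min=-1.0, v_max=1.0):
    """Model of an ADC: round an analogue value to the nearest of 2**bit_depth levels."""
    levels = 2 ** bit_depth
    clamped = max(v_min, min(v_max, voltage))  # values outside the range are clipped
    return round((clamped - v_min) / (v_max - v_min) * (levels - 1))

def dac(code, bit_depth, v_min=-1.0, v_max=1.0):
    """Model of a DAC: turn a stored level back into an analogue value."""
    levels = 2 ** bit_depth
    return v_min + code / (levels - 1) * (v_max - v_min)

original = 0.3
errors = {}
for bits in (8, 16):
    code = adc(original, bits)
    restored = dac(code, bits)
    errors[bits] = abs(restored - original)
    print(f"{bits}-bit: stored {format(code, f'0{bits}b')}, error {errors[bits]:.6f}")
```

The restored value is never exactly the original, but the error shrinks as the bit depth grows, which is why a higher bit depth gives a more accurate recording.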
Forgetting to multiply by the number of channels. A stereo recording has 2 channels, so the file is exactly double the size of a mono recording at the same quality. If the question says "stereo", you must multiply by 2.
Confusing sample rate with bit depth. Sample rate = how often per second. Bit depth = how precisely each sample is measured. They are independent: a file can have a high sample rate with a low bit depth, or vice versa.
Forgetting to convert kHz to Hz. If given "44.1 kHz", use 44,100 in the formula, not 44.1. The formula requires Hz (samples per second), not kHz.
Saying "higher sample rate = better quality" without explaining why. You must explain that more samples per second means the digital waveform more accurately matches the original analogue wave, so fewer details are lost between samples.
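Two of the mistakes above are easy to see with numbers. The sketch below compares mono and stereo sizes and shows what happens when 44.1 is used instead of 44,100; the function names and the 10-second, 16-bit example are made up for illustration.

```python
def file_size_bits(sample_rate_hz, bit_depth, duration_s, channels):
    """Uncompressed file size in bits."""
    return sample_rate_hz * bit_depth * duration_s * channels

def khz_to_hz(khz):
    """Convert kHz to Hz before using the formula."""
    return round(khz * 1000)

rate = khz_to_hz(44.1)                        # 44,100 Hz
mono = file_size_bits(rate, 16, 10, 1)
stereo = file_size_bits(rate, 16, 10, 2)
wrong = file_size_bits(44.1, 16, 10, 2)       # mistake: kHz used directly

print(f"mono:   {mono:,} bits")
print(f"stereo: {stereo:,} bits (double the mono size)")
print(f"using 44.1 instead of 44,100 gives an answer about {stereo / wrong:.0f} times too small")
```

Forgetting the channel count halves a stereo answer, while forgetting the kHz conversion makes it a thousand times too small.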
The exact format Cambridge uses for 3–4 mark calculation questions. Reveal each step only after attempting it.
Show every step of the calculation. For explain questions, always give two distinct points for 2-mark answers.