Units, lossy vs lossless, and the techniques that make Netflix, Spotify and WhatsApp possible
01 · Did You Know?
Without compression, a single minute of HD video would consume several gigabytes of storage: at 1080p and 30 frames per second, with 3 bytes per pixel, that is over 10 GB per minute. Your entire phone would hold only minutes of a Netflix show. Spotify would need about 1.4 megabits per second to deliver uncompressed CD-quality audio, several times the bitrate of a typical compressed music stream. The entire modern internet (streaming, video calls, social media, cloud storage) runs on one fundamental idea: you can represent the same information using fewer bits. Some methods discard data your brain won't notice anyway. Others find clever patterns to represent data more efficiently without losing a single bit. Understanding these two approaches, lossy and lossless, is one of the most practically important topics in the entire syllabus.
02 · Core Concept
Data Storage Units
Before compression, you need to be confident with storage units. Cambridge uses the binary (1024-based) system throughout:
Unit | Abbreviation | Size | Equivalent
Bit | b | 1 binary digit (0 or 1) | Smallest unit
Nibble | (none) | 4 bits | ½ byte · one hex digit
Byte | B | 8 bits | One character (ASCII)
Kilobyte | KB | 1,024 bytes | Short text document
Megabyte | MB | 1,024 KB = 1,048,576 B | One photo (compressed)
Gigabyte | GB | 1,024 MB | ~1 hour HD video
Terabyte | TB | 1,024 GB | Large hard drive
Petabyte | PB | 1,024 TB | Global data centre
⚠️ Always use 1024 in Cambridge exams, not 1000. 1 KB = 1024 bytes. Using 1000 will lose you marks.
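The 1024-based conversions can be sketched in a few lines of Python. This is only an illustration: the names `UNITS`, `to_bytes` and `convert` are hypothetical helpers, not part of any syllabus or library.

```python
# Binary (1024-based) storage units, smallest to largest.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def to_bytes(value, unit):
    """Convert a size in the given unit to bytes, multiplying by 1024 per step."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert between any two units listed in UNITS."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)


print(to_bytes(1, "KB"))         # 1024
print(to_bytes(1, "MB"))         # 1048576
print(convert(3, "GB", "MB"))    # 3072.0
print(to_bytes(2, "KB") * 8)     # 16384 (bits, at 8 bits per byte)
```

Moving up one unit divides by 1024 and moving down one unit multiplies by 1024; converting bytes to bits is a separate multiplication by 8.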
Why Compression Is Needed
Uncompressed files are very large. Compression reduces file size to:
1. Use less storage space: fit more files on a device or server.
2. Transmit faster: smaller files download and stream more quickly over the internet.
3. Reduce bandwidth costs: less data transferred means lower cost for streaming services.
Lossy vs Lossless Compression
Lossy Compression
Permanently removes data from the file. The original cannot be perfectly recovered. The removed data is selected because humans are unlikely to notice its absence.
✓ Much smaller file sizes (50–95% reduction)
✓ Acceptable quality loss for most uses
✗ Cannot restore original file: the data is gone
✗ Not suitable for text, programs, or medical data
Common formats: JPEG, MP3, AAC, MP4, H.264, OGG
Lossless Compression
Finds patterns to represent the same data in fewer bits. The original can be perfectly recovered. No data is permanently removed.
✓ Original file perfectly recoverable
✓ Safe for text, code, spreadsheets, executables
✗ Smaller reduction than lossy (20–60% typically)
✗ Files still larger than lossy equivalents
Common formats: PNG, FLAC, ZIP, RAR, GIF, BMP*
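The claim that lossless compression recovers the original exactly can be checked with Python's standard `zlib` module, which implements DEFLATE, the method most ZIP files use. The sample text below is made up for illustration, and the exact compressed size will depend on the data.

```python
import zlib

# Hypothetical sample data: repetitive text compresses well.
original = b"The quick brown fox jumps over the lazy dog. " * 50

compressed = zlib.compress(original)      # lossless (DEFLATE) compression
restored = zlib.decompress(compressed)    # reverse the process

print(len(original), "bytes before")
print(len(compressed), "bytes after")
print(restored == original)               # True: every byte is recovered
```

A lossy format has no equivalent of the last line: once the data has been discarded, no decompression step can bring it back.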
When to Use Each Type
Scenario | Best choice | Reason
Sharing a holiday photograph | Lossy (JPEG) | Small size needed, minor quality loss acceptable
Medical X-ray image | Lossless (PNG) | No detail can be lost: diagnostic accuracy is critical
Run-length encoding (RLE) is a simple lossless compression technique. It works by replacing each run of consecutive identical values with a count and the value. It is particularly effective for images with large areas of uniform colour.
Example: compressing a row of pixels with RLE
Original (18 values stored individually): R R R R W W W W W W B B B G G G G G
RLE encoded (8 tokens): 4R · 6W · 3B · 5G
18 values → 8 tokens ≈ 56% reduction
RLE works best when data has long runs of repeated values, such as sky regions in photos, blank areas of documents, or areas of solid colour in graphics. It works poorly on photographs with lots of variation, which is why photos use JPEG (lossy) rather than RLE.
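The encoding step can be sketched in Python as follows. This is a minimal illustration, not a production encoder: `rle_encode` and `to_text` are hypothetical helper names, and the token count assumes each run is stored as one count plus one value, as in the example above.

```python
from itertools import groupby


def rle_encode(values):
    """Return a list of (count, value) pairs, one per run of identical values."""
    return [(len(list(run)), value) for value, run in groupby(values)]


def to_text(pairs):
    """Format the pairs in count-then-value form, e.g. '4R 6W'."""
    return " ".join(f"{count}{value}" for count, value in pairs)


row = "RRRRWWWWWWBBBGGGGG"                # the 18-pixel row from the example
pairs = rle_encode(row)

print(to_text(pairs))                      # 4R 6W 3B 5G
tokens = 2 * len(pairs)                    # each run stores a count and a value
saving = (len(row) - tokens) / len(row)
print(f"{len(row)} values -> {tokens} tokens, {saving:.0%} reduction")
```

`itertools.groupby` groups consecutive equal items, which is exactly what a run is, so the encoder needs no manual counter.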
⚠️ Common Exam Mistakes
✗ Saying lossy compression "reduces quality" without explaining how. You must explain that data is permanently removed and the original cannot be recovered, not just that it "looks worse".
✗ Confusing the file format with the compression type. JPEG = lossy. PNG = lossless. Students sometimes call JPEG "lossless because it still looks good", which is wrong. It removes data regardless of how good it looks.
✗ Applying lossy compression to scenarios where data integrity is critical. Programs, spreadsheets, medical images, and legal documents must use lossless compression: any data loss is catastrophic.
✗ Using 1000 instead of 1024 for unit conversions. 1 KB = 1024 bytes in Cambridge. 1 MB = 1024 KB. Using 1000 will give a different answer and lose marks.
Cambridge Exam Tip: Compression questions are typically worth 2–4 marks and usually ask you to: (1) define lossy/lossless, (2) give examples of each, (3) state when to use each, and (4) sometimes perform RLE encoding. Learn the format pairs: JPEG=lossy, PNG=lossless, MP3=lossy, FLAC=lossless, ZIP=lossless. For "explain why" questions, always mention whether the original can be recovered: that is the defining difference.
03 · Interactive Tool
RLE Visualiser & Compression Calculator
Type a sequence · Watch RLE compress it live · Then calculate compression ratios
Part 1: Run-Length Encoding in action
Preset sequences: AAAABBBBBCCDDDDD, Sky + ground, Low repetition, All same, RGB strips
Tool output: a visual grid (one cell per character), the runs identified, and the original shown alongside the RLE-compressed sequence.
Part 2: Compare compressed file sizes
Quick examples:
File | Original size | Lossy reduction | Lossless reduction
JPEG photo | 5.93 MB | 85% | 50%
MP3 audio | 30.28 MB | 90% | 55%
PNG image | 2 MB | 0% | 45%
ZIP archive | 1 GB | 0% | 40%
Inputs: Original File Size (MB), Lossy Reduction (%) and Lossless Reduction (%). Enter 0 where a reduction is not applicable. A valid original file size and at least one compression percentage are required.
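The calculation behind a size comparison like this is a single multiplication. The sketch below assumes "reduction" means the percentage of the original size that is removed; `compressed_size` is a hypothetical helper name, and the two presets are taken from the quick examples.

```python
def compressed_size(original_mb, reduction_percent):
    """Size in MB after removing reduction_percent of the original."""
    return original_mb * (1 - reduction_percent / 100)


# Values from the quick examples: (name, original MB, lossy %, lossless %)
presets = [
    ("JPEG photo", 5.93, 85, 50),
    ("MP3 audio", 30.28, 90, 55),
]

for name, size, lossy, lossless in presets:
    print(name,
          f"lossy: {compressed_size(size, lossy):.2f} MB,",
          f"lossless: {compressed_size(size, lossless):.2f} MB")
```

A reduction of 0% leaves the size unchanged, which matches the convention that 0 means the compression type is not applicable.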
04 · Worked Example
RLE and Compression Choices
Two classic Cambridge question types: RLE encoding and choosing the right compression method.
Question: A row of pixels in an image is stored as: B B B B B B G G G G R R B B (14 pixels)
(a) Using run-length encoding, write the compressed representation of this pixel row. [2]
(b) A medical imaging system stores X-ray images. State which type of compression (lossy or lossless) should be used and justify your choice. [2]
1. Part (a): Identify runs
Group the consecutive identical pixels. What are the runs and their lengths?
B B B B B B → 6 consecutive B pixels
G G G G → 4 consecutive G pixels
R R → 2 consecutive R pixels
B B → 2 consecutive B pixels
Runs: 6B, 4G, 2R, 2B
2. Part (a): Write RLE output
Write the encoded sequence. What format does Cambridge expect?
RLE encoded: 6B 4G 2R 2B
Original: 14 values stored
Encoded: 8 tokens stored
Saving: 43% reduction
[2 marks: at least 3 runs correctly identified ✓ | all 4 runs correctly encoded ✓]
Note: Some mark schemes accept "6B,4G,2R,2B" or "B6 G4 R2 B2", that is, either the count-then-value or the value-then-count format. Cambridge accepts either as long as it is consistent.
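One way to check an RLE answer is to expand it back and compare it with the original row. The sketch below assumes count-then-value tokens with single-letter values; `rle_decode` is a hypothetical helper name.

```python
import re


def rle_decode(encoded):
    """Expand count-then-value tokens such as '6B 4G' into a list of values."""
    values = []
    for count, value in re.findall(r"(\d+)([A-Za-z])", encoded):
        values.extend([value] * int(count))
    return values


original = "B B B B B B G G G G R R B B".split()
decoded = rle_decode("6B 4G 2R 2B")

print(len(decoded))            # 14
print(decoded == original)     # True: RLE is lossless
```

Because the pattern only looks for a number followed by a letter, the comma-separated form "6B,4G,2R,2B" decodes the same way.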
3. Part (b): Choose compression type
Should medical X-rays use lossy or lossless compression? Justify with two points.
Lossless compression should be used. [1]
Justification:
• Medical images must retain all detail: a doctor may need to examine fine features that could indicate disease [1]
• Lossy compression permanently removes data; the original image cannot be recovered, which could cause misdiagnosis or missed findings in critical clinical situations
[2 marks: lossless stated ✓ | justified with reference to needing all data / safety-critical / cannot afford data loss ✓]
05 · Exam-Style Questions
Cambridge-Style Practice
For RLE questions, show each run clearly. For definition questions, always mention whether the original is recoverable.
Question 1 (2 marks)
Explain the difference between lossy and lossless compression.
✓ Lossy compression permanently removes some data from the file: the original file cannot be recovered / quality is reduced but file size is significantly smaller [1]
✓ Lossless compression reduces file size without permanently removing any data: the original file can be perfectly restored / no quality is lost [1]
Do not accept: "lossy loses quality" alone; the answer must state that data is permanently lost / unrecoverable. Do not accept: "lossless keeps quality" alone; the answer must state that the original is recoverable.
Question 2 (2 marks)
A row of pixels is stored as: W W W W W R R R R G G W W W
Use run-length encoding to write the compressed version of this row.
✓ Runs correctly identified: 5W, 4R, 2G, 3W [1]
✓ Correct encoded output: 5W 4R 2G 3W (accept any consistent format: W5 R4 G2 W3 also acceptable) [1]
Award both marks if all four runs are correctly encoded. Award [1] if at least three runs are correct.
Question 3 (2 marks)
Give one example of a file format that uses lossy compression and one example that uses lossless compression. For each, state a typical use case.
✓ Lossy: e.g. JPEG (photographs / web images) | MP3/AAC (music files / streaming audio) | MP4/H.264 (video streaming) [1]
✓ Lossless: e.g. PNG (logos / graphics / screenshots) | FLAC (high quality audio archiving) | ZIP (file archiving / document compression) [1]
Credit any valid format and appropriate use case. Award 1 mark for each correct format-use pairing.
Question 4 (2 marks)
State two reasons why file compression is used when sending files over the internet.
✓ Files are smaller so they can be transmitted/downloaded faster / take less time to upload or send [1]
✓ Smaller files use less bandwidth / reduce data transfer costs / more files can be sent in the same time / email size limits are not exceeded [1]
Credit: "uses less storage on the server" / "can store more files" / "enables streaming" as valid alternatives
Question 5 (1 mark)
State why run-length encoding would be ineffective at compressing a photographic image of a forest.
✓ A photograph of a forest has many different pixel colours with very few (if any) long runs of identical consecutive values. RLE can only achieve compression when there are long repeated sequences, so there would be little or no reduction in file size [1]
Do not accept: "it has too many pixels". RLE works on any size image; the issue is the lack of repetition in the data.
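The effect of low repetition can be seen by counting the tokens RLE would store. This is a sketch under the same assumption as the earlier examples (one count plus one value per run); both rows are invented for illustration, and `rle_token_count` is a hypothetical helper name.

```python
from itertools import groupby


def rle_token_count(values):
    """Number of tokens RLE stores: one count plus one value per run."""
    runs = sum(1 for _ in groupby(values))
    return 2 * runs


flat = "WWWWWWWWWWWW"      # hypothetical row of solid colour: one long run
varied = "GBGDGBDGBGDB"    # hypothetical forest-like row: no two neighbours match

print(len(flat), "->", rle_token_count(flat))       # 12 -> 2
print(len(varied), "->", rle_token_count(varied))   # 12 -> 24
```

When every run has length 1, the encoded version is twice the size of the original, so RLE can make a highly varied image larger rather than smaller.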