📘 Section B · Text, Images, Sound & Data
💾 Chapter 1 · Lesson 11 · Paper 1 & 2

Data Storage & Compression

Units, lossy vs lossless, and the techniques that make Netflix, Spotify and WhatsApp possible

🔥 01 · Did You Know?

Without compression, a single minute of 1080p HD video (1920 × 1080 pixels, 24-bit colour, 30 frames per second) would take up about 10 GB of storage, so a 128 GB phone would hold only around 12 minutes of a Netflix show. Uncompressed CD-quality audio needs about 1.4 megabits per second, more than four times the 320 kbps used by high-quality music streams. The entire modern internet (streaming, video calls, social media, cloud storage) runs on one fundamental idea: you can represent the same information using fewer bits. Some methods discard data your brain won't notice anyway. Others find clever patterns to represent data more efficiently without losing a single bit. Understanding these two approaches, lossy and lossless, is one of the most practically important topics in the entire syllabus.

Data Storage Units

Before compression, you need to be confident with storage units. Cambridge uses the binary (1024-based) system throughout:

Unit | Abbreviation | Size | Typical example
Bit | b | 1 binary digit (0 or 1) | Smallest unit
Nibble | (none) | 4 bits | ½ byte · one hex digit
Byte | B | 8 bits | One character (ASCII)
Kilobyte | KB | 1,024 bytes | Short text document
Megabyte | MB | 1,024 KB = 1,048,576 bytes | One compressed photo
Gigabyte | GB | 1,024 MB | Roughly an hour of compressed HD video (varies with quality)
Terabyte | TB | 1,024 GB | Large hard drive
Petabyte | PB | 1,024 TB | Data-centre scale storage
โš ๏ธ Always use 1024 in Cambridge exams โ€” not 1000. 1 KB = 1024 bytes. Using 1000 will lose you marks.

Why Compression Is Needed

Uncompressed files are very large. Compression reduces file size to:

1. Use less storage space: fit more files on a device or server.
2. Transmit faster: smaller files download and stream more quickly over the internet.
3. Reduce bandwidth costs: less data transferred means lower costs for streaming services.
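Just how large are uncompressed files? The standard size formulas for bitmap images and sound make it concrete. A minimal Python sketch, assuming a 1920 × 1080 image with 24-bit colour, a 3-minute CD-quality stereo track (44,100 Hz, 16-bit) and video at 30 frames per second; real files also carry headers, which are ignored here, and the function names are illustrative:

```python
# Sketch: estimating uncompressed file sizes with the standard formulas
#   image bytes = width x height x colour depth (bits) / 8
#   sound bytes = sample rate x bit depth x channels x seconds / 8
# File headers are ignored; function names are illustrative.

MIB = 1024 * 1024  # bytes in 1 MB (1024-based)


def image_size_bytes(width, height, colour_depth_bits):
    """Size of one uncompressed bitmap image."""
    return width * height * colour_depth_bits // 8


def sound_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of uncompressed audio."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


if __name__ == "__main__":
    photo = image_size_bytes(1920, 1080, 24)       # one full-HD image or video frame
    song = sound_size_bytes(44_100, 16, 2, 180)    # 3 minutes of CD-quality stereo
    video_minute = photo * 30 * 60                 # 30 frames per second for 60 s
    print(f"Photo: {photo / MIB:.2f} MB")                            # Photo: 5.93 MB
    print(f"Song:  {song / MIB:.2f} MB")                             # Song:  30.28 MB
    print(f"Video: {video_minute / MIB / 1024:.2f} GB per minute")   # Video: 10.43 GB per minute
```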

Lossy vs Lossless Compression

๐Ÿ—‘๏ธ
Lossy Compression
Permanently removes data from the file. The original cannot be perfectly recovered. The removed data is selected because humans are unlikely to notice its absence.
โœ…
Much smaller file sizes (50โ€“95% reduction)
โœ…
Acceptable quality loss for most uses
โŒ
Cannot restore original file โ€” data is gone
โŒ
Not suitable for text, programs, or medical data
Common formats:
JPEG
MP3
AAC
MP4
H.264
OGG
♻️ Lossless Compression
Finds patterns to represent the same data in fewer bits, so the original can be perfectly recovered. No data is permanently removed.
✅ Original file is perfectly recoverable
✅ Safe for text, code, spreadsheets and executables
❌ Smaller reduction than lossy (typically 20–60%)
❌ Files stay larger than lossy equivalents, so they need more storage and take longer to transmit
Common formats: PNG, FLAC, ZIP, RAR, GIF, BMP (usually stored uncompressed, but can optionally use RLE)
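The defining difference, whether the original can be recovered, is easy to demonstrate. A minimal Python sketch, assuming Python's built-in zlib module as the lossless example (it implements DEFLATE, the method used inside ZIP and PNG) and a toy "quantiser" as the lossy example; the quantiser is a hypothetical illustration, not how JPEG or MP3 actually work:

```python
# Sketch contrasting lossless and lossy compression.
# Lossless: zlib (DEFLATE, as used in ZIP and PNG) restores every byte.
# Lossy: a toy quantiser that keeps only which band of 16 each value is in.
# It is NOT how JPEG or MP3 work, but it shows why removed detail cannot return.

import zlib

original = b"AAAAAAAABBBBBBBBAAAAAAAABBBBBBBB" * 32   # repetitive data compresses well

# Lossless round trip
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(original), "->", len(packed), "bytes; identical:", restored == original)

# Lossy round trip on some audio-like sample values (0-255, 8 bits each)
samples = [100, 103, 98, 250, 7, 130]
STEP = 16
quantised = [s // STEP for s in samples]        # values now fit in 4 bits instead of 8
reconstructed = [q * STEP for q in quantised]   # best guess at the original values
print(samples, "->", reconstructed)
print("identical:", reconstructed == samples)   # False: the detail is gone for good
```

The zlib output always decompresses to exactly the input bytes; the quantised samples decode to [96, 96, 96, 240, 0, 128], and no decoder can work out that 100, 103 and 98 were ever different.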

When to Use Each Type

Scenario | Best choice | Reason
Sharing a holiday photograph | Lossy (JPEG) | Small size needed; minor quality loss is acceptable
Medical X-ray image | Lossless (PNG) | No detail can be lost; diagnostic accuracy is critical
Streaming music on Spotify | Lossy (AAC/MP3) | Massive bandwidth savings; quality remains acceptable
Archiving source code | Lossless (ZIP) | A single changed character can break the program
Sending an email attachment | Lossless (ZIP) | Documents must be reproduced perfectly
Streaming video on Netflix | Lossy (H.264) | Uncompressed video would be far too much data to stream

Run-Length Encoding (RLE)

RLE is a simple lossless compression technique. It works by replacing consecutive runs of identical values with a count and the value. It is particularly effective for images with large areas of uniform colour.

Example: compressing a row of pixels with RLE
Original (18 values stored individually): R R R R W W W W W W B B B G G G G G
RLE encoded (8 tokens): 4R · 6W · 3B · 5G
✅ 18 values → 8 tokens, a 56% reduction
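The encoding step above can be sketched in a few lines. A minimal Python version, assuming each run is stored as a (count, value) pair in the same count-then-value order as 4R 6W 3B 5G; rle_encode and rle_decode are illustrative names:

```python
# Sketch of run-length encoding (RLE) for a row of pixel values.
# Each run is stored as a (count, value) pair, count first.

from itertools import groupby


def rle_encode(pixels):
    """Replace each run of identical consecutive values with (count, value)."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    decoded = []
    for count, value in pairs:
        decoded.extend([value] * count)
    return decoded


if __name__ == "__main__":
    row = "R R R R W W W W W W B B B G G G G G".split()
    encoded = rle_encode(row)
    print(" ".join(f"{c}{v}" for c, v in encoded))   # 4R 6W 3B 5G
    tokens = 2 * len(encoded)                        # one count + one value per run
    print(f"{len(row)} values -> {tokens} tokens")   # 18 values -> 8 tokens
    print(rle_decode(encoded) == row)                # True: lossless round trip
```

Decoding reproduces the row exactly, which is what makes RLE a lossless technique.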

RLE works best when data has long runs of repeated values, such as blank areas of documents or solid-colour regions in logos, icons and diagrams. It works poorly on photographs with lots of variation, which is why photos use JPEG (lossy) rather than RLE. On data with few repeats, RLE can even make the file larger, because every single value becomes a count plus a value.
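Counting tokens shows why repetition matters. A minimal Python sketch, assuming the lesson's convention that each run costs two tokens (one count, one value); rle_token_count is an illustrative name and the three sample rows are made up for the comparison:

```python
# Sketch: how many tokens RLE needs compared with storing every value.
# A "token" is either a count or a value, as in the worked examples.

from itertools import groupby


def rle_token_count(values):
    """Two tokens (count + value) per run of identical consecutive values."""
    runs = sum(1 for _ in groupby(values))
    return 2 * runs


for label, data in [("solid colour", "WWWWWWWWWWWW"),
                    ("long runs", "WWWWWWBBBBBB"),
                    ("no repetition", "RGBWRGBWRGBW")]:
    before, after = len(data), rle_token_count(data)
    print(f"{label:14} {before} values -> {after} tokens")
```

The solid row shrinks from 12 values to 2 tokens and the two-run row to 4, but the row with no repeats doubles to 24 tokens, the same effect that stops RLE helping with a detailed photograph.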

โš ๏ธ Common Exam Mistakes

โŒ

Saying lossy compression "reduces quality" without explaining how. You must explain that data is permanently removed and the original cannot be recovered โ€” not just that it "looks worse".

โŒ

Confusing the file format with the compression type. JPEG = lossy. PNG = lossless. Students sometimes call JPEG "lossless because it still looks good" โ€” wrong. It removes data regardless of how good it looks.

โŒ

Applying lossy compression to scenarios where data integrity is critical. Programs, spreadsheets, medical images, and legal documents must use lossless compression โ€” any data loss is catastrophic.

โŒ

Using 1000 instead of 1024 for unit conversions. 1 KB = 1024 bytes in Cambridge. 1 MB = 1024 KB. Using 1000 will give a different answer and lose marks.

๐Ÿ†
Cambridge Exam Tip: Compression questions are worth 2โ€“4 marks and always test: (1) define lossy/lossless, (2) give examples of each, (3) state when to use each, and (4) sometimes include RLE encoding. Learn the format pairs: JPEG=lossy, PNG=lossless, MP3=lossy, FLAC=lossless, ZIP=lossless. For "explain why" questions, you must always mention whether the original can be recovered โ€” that is the defining difference.
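A percentage reduction turns into a compressed size with one calculation: compressed size = original size × (1 − reduction ÷ 100). A minimal Python sketch; the function names are illustrative, and the 5.93 MB photo with 85% and 50% reductions are example inputs rather than measured results:

```python
# Sketch: compressed size from an original size and a percentage reduction.
#   compressed = original x (1 - reduction / 100)
# A reduction of 0 means that type of compression is not applied.


def compressed_size(original_mb, reduction_percent):
    """Size after removing reduction_percent of the original."""
    if original_mb <= 0 or not 0 <= reduction_percent < 100:
        raise ValueError("need a positive size and a reduction from 0 up to (not including) 100%")
    return original_mb * (1 - reduction_percent / 100)


def compression_ratio(original_mb, new_mb):
    """How many times smaller the new file is, e.g. 4.0 means 4:1."""
    return original_mb / new_mb


if __name__ == "__main__":
    photo_mb = 5.93                              # example uncompressed photo size
    lossy = compressed_size(photo_mb, 85)        # example lossy saving
    lossless = compressed_size(photo_mb, 50)     # example lossless saving
    print(f"Lossy:    {lossy:.2f} MB (ratio {compression_ratio(photo_mb, lossy):.1f}:1)")
    print(f"Lossless: {lossless:.2f} MB (ratio {compression_ratio(photo_mb, lossless):.1f}:1)")
```

Rejecting a 100% reduction avoids a zero-size file, which would make the ratio a division by zero.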

RLE and Compression Choices

Two classic Cambridge question types: RLE encoding and choosing the right compression method.

📋 Question: A row of pixels in an image is stored as: B B B B B B G G G G R R B B (14 pixels)
(a) Using run-length encoding, write the compressed representation of this pixel row.  [2]
(b) A medical imaging system stores X-ray images. State which type of compression (lossy or lossless) should be used and justify your choice.  [2]
Step 1 · Part (a): Identify runs
Group the consecutive identical pixels. What are the runs and their lengths?
B B B B B B → 6 consecutive B pixels
G G G G → 4 consecutive G pixels
R R → 2 consecutive R pixels
B B → 2 consecutive B pixels
Runs: 6B, 4G, 2R, 2B
Step 2 · Part (a): Write the RLE output
Write the encoded sequence. What format does Cambridge expect?
RLE encoded: 6B 4G 2R 2B
Original: 14 values stored
Encoded: 8 tokens stored
Saving: 43% reduction
[2 marks: at least 3 runs correctly identified ✓ | all 4 runs correctly encoded ✓]
Note: mark schemes generally accept count-then-value ("6B 4G 2R 2B" or "6B,4G,2R,2B") or value-then-count ("B6 G4 R2 B2"), as long as one format is used consistently.
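Both accepted orders come from the same runs, so only the formatting step changes. A minimal Python sketch, assuming the runs are held as (value, count) pairs; rle_runs and format_runs are illustrative names, and the last line checks the 43% saving:

```python
# Sketch: writing the same RLE runs in count-then-value (6B)
# or value-then-count (B6) form, always using one format consistently.

from itertools import groupby


def rle_runs(pixels):
    """Return (value, count) for each run of identical consecutive pixels."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def format_runs(runs, count_first=True, sep=" "):
    """Format every run in the same order, joined by sep."""
    parts = [f"{n}{v}" if count_first else f"{v}{n}" for v, n in runs]
    return sep.join(parts)


row = "B B B B B B G G G G R R B B".split()
runs = rle_runs(row)
print(format_runs(runs))                       # 6B 4G 2R 2B
print(format_runs(runs, count_first=False))    # B6 G4 R2 B2
print(format_runs(runs, sep=","))              # 6B,4G,2R,2B
saving = 1 - (2 * len(runs)) / len(row)        # 8 tokens instead of 14 values
print(f"{saving:.0%} reduction")               # 43% reduction
```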
Step 3 · Part (b): Choose the compression type
Should medical X-rays use lossy or lossless compression? Justify with two points.
Lossless compression should be used. [1]
Justification:
- Medical images must retain all detail; a doctor may need to examine fine features that could indicate disease. [1]
- Lossy compression permanently removes data, so the original image cannot be recovered; this could cause misdiagnosis or missed findings in critical clinical situations.
[2 marks: lossless stated ✓ | justified with reference to needing all data / safety-critical / cannot afford data loss ✓]

Cambridge-Style Practice

For RLE questions, show each run clearly. For definition questions, always mention whether the original is recoverable.

Question 1
2 marks
Explain the difference between lossy and lossless compression.
โœ“Lossy compression permanently removes some data from the file โ€” the original file cannot be recovered / quality is reduced but file size is significantly smaller[1]
โœ“Lossless compression reduces file size without permanently removing any data โ€” the original file can be perfectly restored / no quality is lost[1]
Do not accept: "lossy loses quality" alone โ€” must state data is permanently lost / unrecoverable. Do not accept: "lossless keeps quality" alone โ€” must state original is recoverable.
Question 2
2 marks
A row of pixels is stored as: W W W W W R R R R G G W W W
Use run-length encoding to write the compressed version of this row.
โœ“Runs correctly identified: 5W, 4R, 2G, 3W[1]
โœ“Correct encoded output: 5W 4R 2G 3W (accept any consistent format: W5 R4 G2 W3 also acceptable)[1]
Award both marks if all four runs are correctly encoded. Award [1] if at least three runs are correct.
Question 3
2 marks
Give one example of a file format that uses lossy compression and one example that uses lossless compression. For each, state a typical use case.
โœ“Lossy: e.g. JPEG โ€” photographs / web images  |  MP3/AAC โ€” music files / streaming audio  |  MP4/H.264 โ€” video streaming[1]
โœ“Lossless: e.g. PNG โ€” logos / graphics / screenshots  |  FLAC โ€” high quality audio archiving  |  ZIP โ€” file archiving / document compression[1]
Credit any valid format and appropriate use case. Award 1 mark for each correct format-use pairing.
Question 4
2 marks
State two reasons why file compression is used when sending files over the internet.
โœ“Files are smaller so they can be transmitted/downloaded faster / take less time to upload or send[1]
โœ“Smaller files use less bandwidth / reduce data transfer costs / more files can be sent in the same time / email size limits are not exceeded[1]
Credit: "uses less storage on the server" / "can store more files" / "enables streaming" as valid alternatives
Question 5
1 mark
State why run-length encoding would be ineffective at compressing a photographic image of a forest.
โœ“A photograph of a forest has many different pixel colours with very few (if any) long runs of identical consecutive values โ€” RLE can only achieve compression when there are long repeated sequences, so there would be little or no reduction in file size[1]
Do not accept: "it has too many pixels" โ€” RLE works on any size image; the issue is the lack of repetition in the data
