📘 Section B — Text, Images, Sound & Data
📖 Chapter 1 · Lesson 8 · Paper 1 & 2

Text Representation

How every letter, digit, and emoji becomes a number — and then binary

🔥 01 · Did You Know?

Every text message you have ever sent was stored as a sequence of numbers. The word "Hello" — five characters — is stored as five numbers: 72, 101, 108, 108, 111. When engineers at CERN invented the World Wide Web in 1989, the first-ever web page was purely ASCII text — just 128 numbers, each mapped to a character. Today, a single WhatsApp message might span English letters, Arabic script, Chinese hanzi, and a 😊 emoji — all encoded in Unicode, which represents over 140,000 characters from every writing system on Earth. The story of text encoding is the story of computing going global. Before Unicode, Japanese computers used one encoding, Russian computers used another, Arabic computers a third — and sharing files between them was a nightmare. One standard fixed everything.

How Computers Store Text

Computers can only store binary numbers. To store text, every character — letters, digits, punctuation, spaces — is assigned a unique character code: a whole number. That number is then stored in binary. Two major standards define these mappings.

🔑 The key idea: a character encoding is a lookup table. Every character has a unique number. The computer stores the number; when displaying, it looks up the number to find the character to show on screen.
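
The lookup-table idea can be sketched in a few lines of Python. This is a minimal illustration that relies on the built-in `ord` (character to code) and `chr` (code to character); the helper names `to_codes` and `to_binary` are assumptions made for this example, not part of the syllabus.

```python
def to_codes(text):
    """Return the character code of each character in text."""
    return [ord(ch) for ch in text]


def to_binary(text, width=8):
    """Return each character code as a zero-padded binary string."""
    return [format(ord(ch), f"0{width}b") for ch in text]


codes = to_codes("Hello")
print(codes)                            # [72, 101, 108, 108, 111]
print(to_binary("Hi"))                  # ['01001000', '01101001']
print("".join(chr(n) for n in codes))   # Hello
```

Storing text is the `ord` direction; displaying it is the `chr` direction.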

ASCII vs Unicode — Side by Side

Feature | ASCII | Unicode
Full name | American Standard Code for Information Interchange | Universal Character Set (covers all world scripts)
Bits per character | 7-bit original (128 chars) · 8-bit extended (256) | Variable: UTF-8 uses 8–32 bits · UTF-16 uses 16 or 32 bits
Characters supported | 128 standard · 256 extended | Over 140,000 from 150+ writing systems
Languages | English and basic Western European only | Every human writing system including emoji
File size impact | Smaller: fewer bits per character | Larger: more bits needed per character
Backward compatibility | First 128 Unicode values are identical to ASCII

ASCII — The Patterns You Must Know

You do not need to memorise the whole ASCII table. You need these four anchor points and the relationships between them:

A

Capital 'A' = 65. All uppercase letters are sequential: B=66, C=67 … Z=90. If you know one letter's code, count forward or backward to find another.

a

Lowercase 'a' = 97. All lowercase letters are sequential: b=98, c=99 … z=122. The rule: lowercase = uppercase + 32. This is the single most important pattern in ASCII.

0

The digit character '0' = 48. '1'=49, '2'=50 … '9'=57. Critical: the character '5' has code 53, not 5. The digit characters are not the same as the numbers they represent.

·

Space character = 32. A handy way to remember it: 32 is also the gap between an uppercase letter and its lowercase equivalent (97 − 65 = 32).
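
As a rough sketch of how the four anchor points are enough to derive any letter or digit code, the function below counts forward from 65, 97, 48 or 32. The name `code_from_anchor` is a hypothetical helper for illustration; in practice Python's built-in `ord` gives the same answer directly.

```python
import string


def code_from_anchor(ch):
    """Derive an ASCII code by counting forward from the four anchor points."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch in string.ascii_uppercase:
        return 65 + string.ascii_uppercase.index(ch)   # 'A' = 65
    if ch in string.ascii_lowercase:
        return 97 + string.ascii_lowercase.index(ch)   # 'a' = 97
    if ch in string.digits:
        return 48 + string.digits.index(ch)            # '0' = 48
    if ch == " ":
        return 32                                      # space = 32
    raise ValueError("not covered by the four anchor points")


print(code_from_anchor("Z"))   # 90
print(code_from_anchor("z"))   # 122
print(code_from_anchor("5"))   # 53
```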

The +32 Rule — Why It Works in Binary

In binary, the only bit-level difference between a capital letter and its lowercase equivalent is bit 5 (the 32-value column). Setting bit 5 to 1 converts uppercase → lowercase. Clearing it reverses the conversion.

A = 65 in denary = 01000001. Bit 6 set (64), bit 0 set (1). Bit 5 = 0.
a = 97 in denary (65 + 32) = 01100001. Same as A, but bit 5 = 1 (adds 32).
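
The bit-5 relationship can be checked with bitwise operators. This is a small sketch, assuming the input is the code of a letter A–Z or a–z; the names `CASE_BIT`, `to_lower_code` and `to_upper_code` are invented for this example.

```python
CASE_BIT = 0b00100000  # bit 5, the 32-value column


def to_lower_code(code):
    """Set bit 5: uppercase letter code -> lowercase letter code."""
    return code | CASE_BIT


def to_upper_code(code):
    """Clear bit 5: lowercase letter code -> uppercase letter code."""
    return code & ~CASE_BIT


print(to_lower_code(65), format(to_lower_code(65), "08b"))   # 97 01100001
print(to_upper_code(97), format(to_upper_code(97), "08b"))   # 65 01000001
```

The trick is only meaningful for letters: applied to digit or punctuation codes it produces unrelated characters.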

Key ASCII Codes — Commit These to Memory

Character | Denary | Binary
A | 65 | 01000001
B | 66 | 01000010
Z | 90 | 01011010
a | 97 | 01100001
b | 98 | 01100010
z | 122 | 01111010
0 | 48 | 00110000
9 | 57 | 00111001
SPC (space) | 32 | 00100000

Unicode — Why It Was Needed and How It Works

ASCII was designed in the 1960s for English text on American teletype machines. As computers spread globally, the need for Chinese, Arabic, Japanese, Hindi and hundreds of other scripts became critical. ASCII's 128–256 character limit was completely inadequate. Unicode solves this by allocating more bits per character, allowing a vastly larger range of code points.

1

UTF-8 — most common format on the web. Uses 8–32 bits per character. ASCII characters (0–127) still use just 8 bits, so English text files are the same size as ASCII. Non-English characters use 16–32 bits.

2

UTF-16: uses 16 or 32 bits per character. More efficient for East Asian scripts where most characters need 2 bytes. Used internally by Java, JavaScript, and Windows.

3

More bits = larger files. A document in UTF-32 will be roughly four times the size of the equivalent UTF-8 document (for English text). This is the key trade-off: Unicode can represent more characters, but costs more storage.
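
To see the size trade-off concretely, the sketch below measures how many bits each format uses for a few sample characters. It assumes Python's standard codecs `utf-8`, `utf-16-le` and `utf-32-le` (the `-le` variants are chosen so no byte-order mark is added); `bits_per_char` is a hypothetical helper name.

```python
def bits_per_char(ch, encoding):
    """Number of bits the given encoding uses to store one character."""
    return len(ch.encode(encoding)) * 8


# 'A', 'é', '你' and '😊' written as escapes so the code points are unambiguous
samples = ["A", "\u00e9", "\u4f60", "\U0001F60A"]

for ch in samples:
    print(ch,
          bits_per_char(ch, "utf-8"),
          bits_per_char(ch, "utf-16-le"),
          bits_per_char(ch, "utf-32-le"))

# English text: UTF-32 is four times the size of UTF-8
print(len("Hello".encode("utf-8")), len("Hello".encode("utf-32-le")))   # 5 20
```

The loop shows UTF-8 growing from 8 to 32 bits as characters move away from ASCII, while UTF-32 always uses 32.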

⚠️ Common Exam Mistakes

Confusing the character '5' with the number 5. The character '5' has ASCII code 53 (because '0'=48 and 48+5=53). The number 5 stored as an integer would be 00000101. These are completely different binary patterns.

Saying "Unicode uses more memory" without explaining why. You must state it uses more bits per character to represent a larger set of characters. Vague answers like "Unicode is bigger" earn zero marks.

Getting the +32 rule direction wrong. Adding 32 converts UPPER → lower (A→a). Subtracting 32 converts lower → UPPER (a→A). Mixing these up in an exam is a very common error.

Stating ASCII uses 8 bits. Original (standard) ASCII is 7 bits — 128 characters. Extended ASCII is 8 bits — 256 characters. Cambridge papers usually mean 7-bit unless they say "extended ASCII".
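
The first mistake above (the character '5' versus the number 5) is easy to check in code. A minimal sketch, with `digit_value` as an illustrative helper name:

```python
def digit_value(ch):
    """Convert a digit character to the number it represents."""
    return ord(ch) - ord("0")   # subtract 48, the code of '0'


char_pattern = format(ord("5"), "08b")   # the character '5' (code 53)
int_pattern = format(5, "08b")           # the integer 5

print(char_pattern)       # 00110101
print(int_pattern)        # 00000101
print(digit_value("5"))   # 5
```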

🏆
Cambridge Exam Tip: ASCII and Unicode questions appear in nearly every Paper 1. The two guaranteed question types are: (1) a code calculation — find the ASCII code or binary for a letter given another letter's code as a starting point, and (2) state an advantage of Unicode over ASCII. Memorise A=65, a=97, '0'=48, Space=32 and the +32 rule. These five facts can earn you marks in under 30 seconds per question.

ASCII Calculations — The Standard Exam Format

Cambridge almost always gives you one letter's code and asks you to derive another. Work through each part before reading the solution.

📋 Question: The ASCII code for the letter 'M' is 77.
(a) Give the ASCII code for the letter 'P'.  [1]
(b) Give the ASCII code for the lowercase letter 'm'.  [1]
(c) Express the ASCII code for 'M' as an 8-bit binary number.  [1]
1
Part (a) — Find 'P'
M = 77. P is the 16th letter; M is the 13th. Count the gap and add to 77.
P is 3 positions after M in the alphabet (M→N→O→P). Since ASCII letters are sequential: P = 77 + 3 = 80.
ASCII code for 'P' = 80 ✓
2
Part (b) — Find lowercase 'm'
Apply the +32 rule. M (uppercase) = 77. Lowercase always adds 32.
Lowercase = Uppercase + 32
'm' = 'M' + 32 = 77 + 32 = 109
ASCII code for 'm' = 109 ✓
3
Part (c) — Convert 77 to 8-bit binary
Use the place value method. 77 = ? Which columns are active?
Place values: 128 · 64 · 32 · 16 · 8 · 4 · 2 · 1
77 = 64 + 8 + 4 + 1
Bits: 0 1 0 0 1 1 0 1
'M' = 01001101
Verify: 64 + 8 + 4 + 1 = 77 ✓
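
The three parts of this question can be reproduced as a short script. This is a sketch of the place value method only; `to_8bit` is a hypothetical function name, and it assumes a value between 0 and 255.

```python
M_CODE = 77            # given in the question

p_code = M_CODE + 3    # part (a): P is 3 letters after M
m_code = M_CODE + 32   # part (b): lowercase = uppercase + 32


def to_8bit(n):
    """Convert 0..255 to 8-bit binary using the place value method."""
    bits = ""
    for place in [128, 64, 32, 16, 8, 4, 2, 1]:
        if n >= place:       # this column is active
            bits += "1"
            n -= place
        else:
            bits += "0"
    return bits


print(p_code)            # 80
print(m_code)            # 109
print(to_8bit(M_CODE))   # 01001101
```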

Cambridge-Style Practice

Write your answers, then check them against the marking scheme and award yourself marks.

Question 1
3 marks
The ASCII code for 'D' is 68.
(a) Give the ASCII code for 'G'.  [1]
(b) Give the ASCII code for lowercase 'd'.  [1]
(c) Write the ASCII code for 'D' as an 8-bit binary number.  [1]
(a) G is 3 letters after D → 68 + 3 = 71 [1]
(b) lowercase = uppercase + 32 → 68 + 32 = 100 [1]
(c) 68 = 64 + 4 → 01000100 [1]
Verify (c): 64 + 4 = 68 ✓
PV: 128 64 32 16 8 4 2 1
Binary: 0 1 0 0 0 1 0 0
Question 2
4 marks
Explain two differences between ASCII and Unicode.
ASCII uses fewer bits per character (7 or 8 bits) than Unicode (e.g. 16 or 32 bits in UTF-16/UTF-32), so ASCII produces smaller file sizes than Unicode [1+1]
ASCII can only represent 128 (or 256 extended) characters, mainly English and Western European; Unicode can represent over 140,000 characters from every world writing system including emoji [1+1]
Allow: any valid pair — e.g. bits used | characters supported | language coverage | file size | backward compatibility (Unicode first 128 = ASCII)
Do not accept: "Unicode is better" or "Unicode is newer" without a specific technical difference
Question 3
2 marks
A text file contains the word "Cat". The file uses 7-bit ASCII.
(a) How many bits are used to store this word?  [1]
(b) Explain how the computer knows which character each stored value represents.  [1]
(a) 3 characters × 7 bits = 21 bits [1]
(b) The computer uses a character encoding table / ASCII lookup table: each stored binary number is mapped to a unique character; the table defines a one-to-one correspondence between numbers and characters [1]
Allow: "the number is used to look up the character in the ASCII table" as full answer for (b)
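
The storage calculation in part (a) generalises to any text and any fixed number of bits per character. A minimal sketch, with `bits_needed` as an assumed helper name:

```python
def bits_needed(text, bits_per_character):
    """Total bits to store text at a fixed number of bits per character."""
    return len(text) * bits_per_character


print(bits_needed("Cat", 7))   # 21 (7-bit ASCII)
print(bits_needed("Cat", 8))   # 24 (8-bit extended ASCII)
```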
Question 4
2 marks
The binary pattern 01001000 is stored in an ASCII file. The ASCII code for 'A' is 65.
Identify the character this binary pattern represents. Show all working.
01001000 = 64 + 8 = 72 in denary [1]
72 − 65 = 7, so 7 letters after 'A' → character is 'H' [1]
Place values: 128 64 32 16 8 4 2 1
Binary: 0 1 0 0 1 0 0 0
Active: 64 + 8 = 72
72 − 65 = 7 → 7th letter after A → H ✓
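
The working above can be mirrored in code: add up the active place values, then count forward from 'A'. This sketch assumes the pattern is 8 bits long and represents an uppercase letter; `decode_uppercase` is a hypothetical name.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def decode_uppercase(bits):
    """Return (denary value, letter) for an 8-bit pattern of an uppercase letter."""
    value = sum(pv for pv, bit in zip(PLACE_VALUES, bits) if bit == "1")
    offset = value - 65              # how many letters after 'A'
    if not 0 <= offset < 26:
        raise ValueError("pattern is not an uppercase letter")
    return value, ALPHABET[offset]


print(decode_uppercase("01001000"))   # (72, 'H')
```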
Question 5
2 marks
State one advantage and one disadvantage of using Unicode instead of ASCII to store text files.
Advantage: Unicode can represent a much larger range of characters / supports all world languages and scripts / can store emoji and non-Latin characters that ASCII cannot [1]
Disadvantage: Unicode uses more bits per character / Unicode files are larger / requires more storage space than ASCII for the same text [1]

