What is URL encoding?

URL encoding, also known as percent-encoding, converts non-ASCII, unprintable, and reserved characters into a URL-safe format by replacing them with ASCII characters.

  • 1. First, the encoding process converts the characters into 8-bit bytes (usually with UTF-8).
  • 2. Then each byte is converted into two hex digits (for example, B3), and a percent sign (%) is added in front, so the result looks like %B3.
  • How does the encoding process work?

    Let's take the Japanese character 平. First it is converted into bytes. In UTF-8 it takes three bytes, which look like this in binary: 11100101 10111001 10110011. Then each byte is written in hex. Each hex digit represents 4 bits, so 2 hex digits represent 1 byte (8 bits). After encoding, the character looks like this: %E5%B9%B3. We get three encoded hex values because non-ASCII characters take more than one byte in UTF-8, while ASCII characters take only one byte. In short, URL encoding uses a hex scheme with a percent sign (%) in front of each byte. Encoding schemes like hex and Base64 are mainly used where only ASCII characters can be transferred. Hex uses 16 ASCII characters: 0-9 and A-F. Any character can be encoded into hex, but URL encoding converts only some of them, as you can see in the table below.
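    The two steps above can be sketched in a few lines of Python, assuming UTF-8 as the byte encoding. The function name `percent_encode_all` is hypothetical, and unlike a real URL encoder it encodes every byte, including unreserved ASCII characters that would normally be left alone.

```python
import base64


def percent_encode_all(text: str) -> str:
    """Percent-encode every byte of the UTF-8 form of text (no characters are skipped)."""
    # Step 1: convert the characters into 8-bit bytes (UTF-8 assumed).
    data = text.encode("utf-8")
    # Step 2: write each byte as two uppercase hex digits with a '%' in front.
    return "".join(f"%{byte:02X}" for byte in data)


raw = "平".encode("utf-8")
print([f"{byte:08b}" for byte in raw])    # ['11100101', '10111001', '10110011']
print(percent_encode_all("平"))           # %E5%B9%B3

# Each hex digit holds 4 bits: 0xB3 splits into 0xB (1011) and 0x3 (0011).
print(f"{0xB3 >> 4:X}", f"{0xB3 & 0x0F:X}")  # B 3

# For comparison, Base64 packs the same three bytes into four ASCII characters.
print(base64.b64encode(raw).decode("ascii"))  # 5bmz
```

    Hex needs two output characters per byte, while Base64 needs four per three bytes, which is why Base64 is more compact; URL encoding uses the hex form because each byte stays individually readable.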

    Classification          Encoding required?  Characters
    Unreserved characters   No                  Letters (A-Z a-z), digits (0-9), tilde (~), underscore (_), hyphen (-), and dot (.)
    Reserved characters     Yes                 : / ? # [ ] @ * $ & ' ( ) ! + , ; =
    Unsafe characters       Yes                 space < > { } | ` ^ \
    Non-ASCII characters    Yes                 Characters outside the US-ASCII set
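    To make the classification table concrete, here is a small Python sketch that sorts a single character into its category. The helper names `classify` and `needs_encoding` are hypothetical, and the character sets are copied from the table above; control characters, the double quote, and the percent sign are not listed in that table, so the sketch reports them as "unlisted".

```python
import string

# Character sets copied from the classification table.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
RESERVED = frozenset(":/?#[]@*$&'()!+,;=")
UNSAFE = frozenset(" <>{}|`^\\")


def classify(ch: str) -> str:
    """Return the table category for a single character."""
    if ord(ch) > 0x7F:
        return "non-ASCII"
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    if ch in UNSAFE:
        return "unsafe"
    # Control characters, '"' and '%' are not listed in the table.
    return "unlisted"


def needs_encoding(ch: str) -> bool:
    """True if ch must be encoded when it appears as data.

    Reserved characters may stay unencoded when used as delimiters;
    this helper only covers the data case.
    """
    return ch not in UNRESERVED


for ch in ["a", "~", "/", "|", "元"]:
    print(repr(ch), classify(ch), needs_encoding(ch))
```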

    Categories of characters in URL encoding:

    For URL encoding, characters can be divided into four categories:

    1. Unreserved characters: Also known as safe characters, these characters do not require encoding. They include letters (A-Z a-z), digits (0-9), hyphen (-), underscore (_), tilde (~), and dot (.).

    2. Reserved characters: Some characters, like /, have a special meaning in a URL, so you can't put them in a URL as data without encoding them.

    3. Unsafe characters: Characters like | are known as unsafe characters, so they must be encoded before being placed in a URL.

    4. Non-ASCII characters: Characters like 元 require encoding because URLs support only ASCII characters.
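    Python's standard library implements URL encoding in `urllib.parse.quote` and `urllib.parse.unquote`. The sketch below shows how each category is treated. Note that `quote` leaves `/` unencoded by default (its `safe` parameter defaults to `"/"`, which suits URL paths), so `safe=""` is passed when a reserved character is data rather than a delimiter.

```python
from urllib.parse import quote, unquote

# Unreserved characters are left as they are.
print(quote("Az09-._~"))          # Az09-._~
# By default quote() keeps "/" (safe="/"), which suits URL paths...
print(quote("a/b"))               # a/b
# ...so pass safe="" when reserved characters are data, not delimiters.
print(quote("a/b?c", safe=""))    # a%2Fb%3Fc
# Unsafe characters such as space and | are encoded.
print(quote("a b|c"))             # a%20b%7Cc
# Non-ASCII characters are encoded byte by byte from UTF-8 (the default).
print(quote("元"))                 # %E5%85%83
# unquote() reverses the process.
print(unquote("%E5%85%83"))       # 元
```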

    URL encoding table:

    The table below has five columns. The first column (Dec) shows the character's code number in decimal. The second column (Hex) shows the same code in hexadecimal. The third column (Enc) shows the URL-encoded value; it is left blank for characters that do not require encoding. The fourth column (Char) shows the character itself, such as 0-9 or a-z. Control characters have no printable form, so they are shown in caret notation (for example, ^@ for NUL). The fifth column is a description of the character.

    Dec  Hex  Enc  Char  Description
    0    00   %00  ^@    Null (NUL)
    1    01   %01  ^A    Start of heading (SOH)
    2    02   %02  ^B    Start of text (STX)
    3    03   %03  ^C    End of text (ETX)
    4    04   %04  ^D    End of transmission (EOT)
    5    05   %05  ^E    Enquiry (ENQ)
    6    06   %06  ^F    Acknowledge (ACK)
    7    07   %07  ^G    Bell (BEL)
    8    08   %08  ^H    Backspace (BS)
    9    09   %09  ^I    Horizontal tab (HT)
    10   0A   %0A  ^J    Line feed (LF)
    11   0B   %0B  ^K    Vertical tab (VT)
    12   0C   %0C  ^L    New page/form feed (FF)
    13   0D   %0D  ^M    Carriage return (CR)
    14   0E   %0E  ^N    Shift out (SO)
    15   0F   %0F  ^O    Shift in (SI)
    16   10   %10  ^P    Data link escape (DLE)
    17   11   %11  ^Q    Device control 1 (DC1)
    18   12   %12  ^R    Device control 2 (DC2)
    19   13   %13  ^S    Device control 3 (DC3)
    20   14   %14  ^T    Device control 4 (DC4)
    21   15   %15  ^U    Negative acknowledge (NAK)
    22   16   %16  ^V    Synchronous idle (SYN)
    23   17   %17  ^W    End of transmission block (ETB)
    24   18   %18  ^X    Cancel (CAN)
    25   19   %19  ^Y    End of medium (EM)
    26   1A   %1A  ^Z    Substitute (SUB)
    27   1B   %1B  ^[    Escape (ESC)
    28   1C   %1C  ^\    File separator (FS)
    29   1D   %1D  ^]    Group separator (GS)
    30   1E   %1E  ^^    Record separator (RS)
    31   1F   %1F  ^_    Unit separator (US)
    32   20   %20        Space
    33   21   %21  !     Exclamation mark
    34   22   %22  "     Quotation mark/Double quote
    35   23   %23  #     Number sign
    36   24   %24  $     Dollar sign
    37   25   %25  %     Percent sign
    38   26   %26  &     Ampersand
    39   27   %27  '     Apostrophe/Single quote
    40   28   %28  (     Left parenthesis
    41   29   %29  )     Right parenthesis
    42   2A   %2A  *     Asterisk
    43   2B   %2B  +     Plus sign
    44   2C   %2C  ,     Comma
    45   2D        -     Hyphen/Minus
    46   2E        .     Full stop/Period
    47   2F   %2F  /     Solidus/Slash
    48   30        0     Digit zero
    49   31        1     Digit one
    50   32        2     Digit two
    51   33        3     Digit three
    52   34        4     Digit four
    53   35        5     Digit five
    54   36        6     Digit six
    55   37        7     Digit seven
    56   38        8     Digit eight
    57   39        9     Digit nine
    58   3A   %3A  :     Colon
    59   3B   %3B  ;     Semicolon
    60   3C   %3C  <     Less-than sign
    61   3D   %3D  =     Equal/Equality sign
    62   3E   %3E  >     Greater-than sign
    63   3F   %3F  ?     Question mark
    64   40   %40  @     Commercial at/At sign
    65   41        A     Latin capital letter A
    66   42        B     Latin capital letter B
    67   43        C     Latin capital letter C
    68   44        D     Latin capital letter D
    69   45        E     Latin capital letter E
    70   46        F     Latin capital letter F
    71   47        G     Latin capital letter G
    72   48        H     Latin capital letter H
    73   49        I     Latin capital letter I
    74   4A        J     Latin capital letter J
    75   4B        K     Latin capital letter K
    76   4C        L     Latin capital letter L
    77   4D        M     Latin capital letter M
    78   4E        N     Latin capital letter N
    79   4F        O     Latin capital letter O
    80   50        P     Latin capital letter P
    81   51        Q     Latin capital letter Q
    82   52        R     Latin capital letter R
    83   53        S     Latin capital letter S
    84   54        T     Latin capital letter T
    85   55        U     Latin capital letter U
    86   56        V     Latin capital letter V
    87   57        W     Latin capital letter W
    88   58        X     Latin capital letter X
    89   59        Y     Latin capital letter Y
    90   5A        Z     Latin capital letter Z
    91   5B   %5B  [     Left square bracket
    92   5C   %5C  \     Reverse solidus/Backslash
    93   5D   %5D  ]     Right square bracket
    94   5E   %5E  ^     Circumflex accent/Caret
    95   5F        _     Underscore/Low line
    96   60   %60  `     Grave accent
    97   61        a     Latin small letter a
    98   62        b     Latin small letter b
    99   63        c     Latin small letter c
    100  64        d     Latin small letter d
    101  65        e     Latin small letter e
    102  66        f     Latin small letter f
    103  67        g     Latin small letter g
    104  68        h     Latin small letter h
    105  69        i     Latin small letter i
    106  6A        j     Latin small letter j
    107  6B        k     Latin small letter k
    108  6C        l     Latin small letter l
    109  6D        m     Latin small letter m
    110  6E        n     Latin small letter n
    111  6F        o     Latin small letter o
    112  70        p     Latin small letter p
    113  71        q     Latin small letter q
    114  72        r     Latin small letter r
    115  73        s     Latin small letter s
    116  74        t     Latin small letter t
    117  75        u     Latin small letter u
    118  76        v     Latin small letter v
    119  77        w     Latin small letter w
    120  78        x     Latin small letter x
    121  79        y     Latin small letter y
    122  7A        z     Latin small letter z
    123  7B   %7B  {     Left curly bracket
    124  7C   %7C  |     Vertical line/Vertical bar
    125  7D   %7D  }     Right curly bracket
    126  7E        ~     Tilde
    127  7F   %7F  ^?    Delete (DEL)
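    Every row of the table follows the same rules, so the Dec, Hex, Enc, and Char columns can be regenerated with a short Python sketch. The helper names `enc_column` and `char_column` are hypothetical, and the sketch assumes that only unreserved characters are left unencoded, so control characters get %00 to %1F and %7F. Caret notation is formed by flipping bit 0x40 of the code, which turns 0 into @, 27 into [ and 127 into ?.

```python
import string

# Characters left unencoded (assumed: letters, digits, and - . _ ~).
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def enc_column(code: int) -> str:
    """Enc column: '%XX' for characters that need encoding, '' otherwise."""
    ch = chr(code)
    return "" if ch in UNRESERVED else f"%{code:02X}"


def char_column(code: int) -> str:
    """Char column: control characters (0-31 and 127) use caret notation."""
    if code < 32 or code == 127:
        # Caret notation flips bit 0x40: 0 -> '@', 27 -> '[', 127 -> '?'.
        return "^" + chr(code ^ 0x40)
    return chr(code)


# Print Dec, Hex, Enc and Char for all 128 ASCII codes in the table's layout.
for code in range(128):
    print(f"{code:<5}{code:02X}   {enc_column(code):<5}{char_column(code)}")
```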