Character encoding (Generation I)

From Bulbapedia, the community-driven Pokémon encyclopedia.

Revision as of 03:53, 3 April 2019

This article is incomplete.
Please feel free to edit this article to add missing information and complete it.
Reason: French, German, Italian, and Spanish character encodings

The Generation I games use a proprietary character encoding to store text data. Versions of the games in different languages may use different encodings, some of which differ more than others.

Fixed-length user-input strings are terminated with 0x50. If a fixed-length string is terminated before using its full capacity, the contents of the remaining space are not specified.
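A minimal Python sketch (not code from the games) of reading such a field: keep only the bytes before the first 0x50 and discard the unspecified remainder. The name string_bytes is hypothetical, and returning the whole field when no 0x50 is present is an assumption of this sketch.

```python
TERMINATOR = 0x50  # the "end" control code


def string_bytes(field: bytes) -> bytes:
    """Return the bytes of a fixed-length field up to (not including) 0x50.

    Anything after the first terminator is unspecified leftover data and is
    dropped. If no 0x50 is present, the whole field is returned; treating a
    completely full field this way is an assumption of this sketch.
    """
    end = field.find(TERMINATOR)  # bytes.find accepts a single int 0-255
    return field if end == -1 else field[:end]


# "RED" (0x91 0x84 0x83), terminated, followed by leftover garbage bytes.
print(string_bytes(bytes([0x91, 0x84, 0x83, 0x50, 0x00, 0x12])))  # b'\x91\x84\x83'
```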

Character sets

Note that 0x7F is a space (" "), not empty. Every character that is not a control character prints as a single tile.

In some contexts, some characters may display differently than shown below. For example, in the character input table, tile 0xF0 shows ED (the key for finishing input) instead of the Pokémon Dollar symbol, and in the English Pokédex, the feet (') and inches (") marks are tiles 0x60 and 0x61.

English

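Bytes marked "unused" or "Junk" are not normally used in the English games. Characters marked with an asterisk (*) are holdovers from the Japanese games that are not used in the English games.

   | -0 | -1 | -2 | -3 | -4 | -5 | -6 | -7 | -8 | -9 | -A | -B | -C | -D | -E | -F
0- | NULL | (0x01-0x0F unused)
1- | Junk
2- | Junk
3- | Junk
4- | (0x40-0x47 unused) | Control characters (0x48-0x4F)
5- | Control characters (0x50-0x5F)
6- | A* | B* | C* | D* | E* | F* | G* | H* | I* | V* | S* | L* | M* | : | ぃ* | ぅ*
7- | ‘* | ’* | “* | ”* | ・* | …* | ぁ* | ぇ* | ぉ* | [0x79 tile image]* | =* | [0x7B tile image]* | ‖* | [0x7D tile image]* | [0x7E tile image]* | (space)
8- | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P
9- | Q | R | S | T | U | V | W | X | Y | Z | ( | ) | : | ; | [ | ]
A- | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p
B- | q | r | s | t | u | v | w | x | y | z | é | 'd | 'l | 's | 't | 'v
C- | Junk
D- | Junk
E- | ' | PK | MN | - | 'r | 'm | ? | ! | . | ァ* | ゥ* | ェ* | ▷ | ▶ | ▼ | ♂
F- | $ | × | . | / | , | ♀ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9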
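The printable part of this table can be turned into a lookup for decoding English text. The sketch below covers only 0x7F and 0x80-0xFF, since the glyphs for lower codes depend on whatever tiles are currently in VRAM. The names build_english_table and decode_english are hypothetical, and "$" merely stands in for the Pokémon Dollar symbol at 0xF0.

```python
import string
from typing import Dict


def build_english_table() -> Dict[int, str]:
    """Map the printable English character codes (0x7F, 0x80-0xFF) to text."""
    table: Dict[int, str] = {0x7F: " "}  # 0x7F is a space
    table.update(zip(range(0x80, 0x9A), string.ascii_uppercase))  # A-Z
    table.update(zip(range(0x9A, 0xA0), "():;[]"))
    table.update(zip(range(0xA0, 0xBA), string.ascii_lowercase))  # a-z
    table.update(zip(range(0xBA, 0xC0), ["é", "'d", "'l", "'s", "'t", "'v"]))
    table.update(zip(range(0xE0, 0xF0), ["'", "PK", "MN", "-", "'r", "'m", "?", "!",
                                        ".", "ァ", "ゥ", "ェ", "▷", "▶", "▼", "♂"]))
    # 0xF0 is the Pokémon Dollar symbol; "$" is only a stand-in here.
    table.update(zip(range(0xF0, 0xF6), ["$", "×", ".", "/", ",", "♀"]))
    table.update(zip(range(0xF6, 0x100), "0123456789"))
    return table


ENGLISH = build_english_table()


def decode_english(data: bytes) -> str:
    """Decode until 0x50; codes not in the table become <XX> placeholders."""
    out = []
    for code in data:
        if code == 0x50:  # "end"
            break
        out.append(ENGLISH.get(code, f"<{code:02X}>"))
    return "".join(out)


print(decode_english(bytes([0x8F, 0x88, 0x8A, 0x80, 0x82, 0x87, 0x94, 0x50])))  # PIKACHU
```

Codes that depend on VRAM contents (and control codes) are deliberately left as placeholders rather than guessed.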

In the Japanese games (as can be seen below), 0xF2 is distinguishable from 0xE8, with the former meant as a decimal point while the latter is punctuation. Presumably this intention was largely inherited when the English games were made, as most of the game's script uses 0xE8 exclusively; however, 0xF2 appears in the character table for user input, meaning it may appear in user-input names (and, conversely, 0xE8 never should).

The full set of characters available for user input is: A-Z, a-z, space, and the following: ×, (, ), :, ;, [, ], PK, MN, -, ?, !, ♂, ♀, /, the period (0xF2), and the comma.
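Together with the note on 0xF2 and 0xE8 above, this list allows a rough check of whether a stored name could have been typed by the player. In this hypothetical sketch, the byte values come from the English table; treating ":" as 0x9C rather than the bold 0x6D is an assumption.

```python
from typing import Set

# Byte values of the user-input characters listed above.
# Assumptions: ":" is entered as 0x9C (not the bold 0x6D) and "." as 0xF2.
ALLOWED_INPUT: Set[int] = (
    set(range(0x80, 0x9A))                          # A-Z
    | set(range(0xA0, 0xBA))                        # a-z
    | {0x7F}                                        # space
    | {0xF1, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F}    # × ( ) : ; [ ]
    | {0xE1, 0xE2, 0xE3, 0xE6, 0xE7}                # PK MN - ? !
    | {0xEF, 0xF5, 0xF3, 0xF2, 0xF4}                # ♂ ♀ / . ,
)


def could_be_user_input(name: bytes) -> bool:
    """True if every byte before the 0x50 terminator is an input-table character."""
    end = name.find(0x50)
    body = name if end == -1 else name[:end]
    return all(code in ALLOWED_INPUT for code in body)


print(could_be_user_input(bytes([0x91, 0xF2, 0x50])))  # True  ("R." using 0xF2)
print(could_be_user_input(bytes([0x91, 0xE8, 0x50])))  # False (0xE8 is script punctuation)
```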

Tilemap sections

The game sections off various areas of the tilemap loaded into VRAM and each character code directly corresponds to a tile in the tilemap. Not all tiles in the tilemap are accessible via character code, but many are.

  1. VRAM addresses 0x9000 to 0x9480 correspond to a portion of the current tileset of the map. Character codes 0x01 to 0x48 and 0x4D directly correspond to them. For example, while the player is outside, tile #3 is the animated flower so character code 0x03 will place the animated flower in text, but in other locations (such as in battle or in a cave), a completely different tile will be displayed.
    1. Characters 0x49 - 0x5F are also in this same section, but with the exception of 0x4D, they are control characters that link to code rather than the tile they would normally correspond to.
  2. VRAM addresses 0x9600 to 0x97F0 partially correspond to characters 0x60-0x7F. This is where the user interface tiles are stored, such as bold letters and the tiles used to draw the borders of text boxes and menus. The space character is also in this range. These tiles can sometimes change, so characters that reference them may print a different tile image; however, they are far more consistent than the tiles in the 0x9000 to 0x9480 range.
  3. VRAM addresses 0x8800 to 0x8BF0 correspond to characters 0x80-0xBF. This is where the main font is placed when rendering text.
  4. VRAM addresses 0x8C00 to 0x8FF0 are split into two tile sections:
    1. The range 0xC0-0xDF is reserved for screens that need extra tiles. Since it is usually unoccupied, these codes normally print blank tiles. The player info screen is one screen that uses some of this space.
    2. The range 0xE0-0xFF includes numbers, some symbols, and more user interface characters. The player-enterable characters PK, MN, and gender symbols are also stored here.
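The address ranges in this list follow from each tile occupying 16 bytes, assuming the Game Boy's 0x8800 (signed) tile-data addressing mode, in which tile numbers 0x00-0x7F count up from 0x9000 and 0x80-0xFF count up from 0x8800. A small sketch with hypothetical function names:

```python
TILE_SIZE = 16  # bytes per 8x8, 2-bit-per-pixel Game Boy tile


def tile_address(code: int) -> int:
    """Start address in VRAM of the tile that a character code indexes.

    Assumes the 0x8800 ("signed") tile-data addressing mode: codes
    0x00-0x7F count up from 0x9000 and codes 0x80-0xFF from 0x8800.
    """
    if not 0x00 <= code <= 0xFF:
        raise ValueError("character codes are single bytes")
    if code < 0x80:
        return 0x9000 + code * TILE_SIZE
    return 0x8800 + (code - 0x80) * TILE_SIZE


def tilemap_section(code: int) -> str:
    """Name the tilemap section a code falls in, following the list above."""
    if code <= 0x5F:
        return "map tileset"  # 0x49-0x5F are control codes, except 0x4D
    if code <= 0x7F:
        return "user interface"
    if code <= 0xBF:
        return "main font"
    if code <= 0xDF:
        return "overflow"
    return "numbers, symbols and user interface"


for code in (0x03, 0x48, 0x60, 0x80, 0xBF, 0xC0):
    print(f"0x{code:02X} -> 0x{tile_address(code):04X} ({tilemap_section(code)})")
```

For example, tile_address(0x03) gives 0x9030, the tile that holds the animated flower while the player is outside.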

Character codes

Control characters occupy the 0x49-0x5F range, with the exception of 0x4D, which prints tile 0x4D as normal.

Control characters work by intercepting the tile that would normally correspond to the code and instead performing a different action, such as ending the text or printing a longer message.
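A minimal sketch of that interception, assuming a hypothetical handlers table in which each control code either returns replacement tiles or None to stop printing. Ordinary codes pass through as tile numbers. This is a simplified model, not the games' actual text engine.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[], Optional[List[int]]]


def is_control_code(code: int) -> bool:
    """0x49-0x5F are control codes, except 0x4D, which prints tile 0x4D."""
    return 0x49 <= code <= 0x5F and code != 0x4D


def render(data: bytes, handlers: Dict[int, Handler]) -> List[int]:
    """Return the tile numbers a text printer would draw (simplified sketch).

    Ordinary codes are emitted unchanged. A control code is intercepted: its
    handler returns replacement tiles, or None to stop. Control codes without
    a handler are skipped here (hypothetical behaviour).
    """
    tiles: List[int] = []
    for code in data:
        if not is_control_code(code):
            tiles.append(code)
            continue
        handler = handlers.get(code)
        if handler is None:
            continue
        result = handler()
        if result is None:
            break
        tiles.extend(result)
    return tiles


# Example handlers: 0x50 ends the string, 0x5C prints "TM" (tiles 0x93, 0x8C).
handlers: Dict[int, Handler] = {0x50: lambda: None, 0x5C: lambda: [0x93, 0x8C]}
print(render(bytes([0x5C, 0xF7, 0x50, 0x80]), handlers))  # [147, 140, 247]
```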

Dialogue control codes

These control codes control dialogue text placement, paging, etc.

  • 0x49 - "page" - Begins a new Pokédex page
  • 0x4B - "_cont" - Stops and waits for confirmation before scrolling the dialogue down by one line
  • 0x4C - "autocont" - Scrolls the dialogue down by one line without waiting for confirmation
  • 0x4E - "next line" - Moves down one line in the dialogue
  • 0x4F - "bottom line" - Writes on the last line of the dialogue
  • 0x50 - "end" - Marks the end of a string
  • 0x51 - "paragraph" - Begins a new dialogue page after button confirmation
  • 0x55 - "cont" - A variation of 0x4B and 0x4C
  • 0x57 - "done" - Ends the text box
  • 0x58 - "prompt" - Prompts the player, then ends the text box
  • 0x5F - "dex" - Displays a period and ends the Pokédex entry
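To show how these codes shape dialogue, the hypothetical paginate function below splits text into pages of lines. It is a simplified model under stated assumptions: scrolling and waiting are ignored (so 0x4B, 0x4C and 0x55 are treated as plain line breaks), and the period added by 0x5F is assumed to be tile 0xE8.

```python
from typing import List

LINE_BREAKS = {0x4B, 0x4C, 0x4E, 0x4F, 0x55}  # _cont, autocont, next line, bottom line, cont
PAGE_BREAKS = {0x49, 0x51}                    # page (Pokédex), paragraph
TERMINATORS = {0x50, 0x57, 0x58}              # end, done, prompt
DEX_END = 0x5F                                # prints a period, then ends the entry
PERIOD = 0xE8                                 # assumption: the "." punctuation tile


def paginate(data: bytes) -> List[List[bytes]]:
    """Split text into pages of lines (simplified: no scrolling or waiting)."""
    pages: List[List[bytes]] = [[]]
    line = bytearray()
    for code in data:
        if code in LINE_BREAKS:
            pages[-1].append(bytes(line))
            line.clear()
        elif code in PAGE_BREAKS:
            pages[-1].append(bytes(line))
            line.clear()
            pages.append([])
        elif code == DEX_END:
            line.append(PERIOD)
            break
        elif code in TERMINATORS:
            break
        else:
            line.append(code)
    pages[-1].append(bytes(line))
    return pages


print(paginate(bytes([0x80, 0x4F, 0x81, 0x51, 0x82, 0x57])))
```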
Variable control codes

These control codes print text defined elsewhere.

  • 0x52 - "players name" - The player's name
  • 0x53 - "rivals name" - The rival's name
  • 0x59 - "target" - In battle, the target of a move. If the dialogue is referring to the opponent's Pokémon, "Enemy " will be prepended to the Pokémon's name; if referring to the player's Pokémon, it will just display the Pokémon's name. Outside of battle, it retains the last value stored in it during battle.
  • 0x5A - "user" - In battle, the user of a move. Just like "target", "Enemy " will be prepended to the name of opposing Pokémon.
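A sketch of how these variables expand, using a hypothetical TextContext container in place of the values the game actually keeps in memory:

```python
from dataclasses import dataclass


@dataclass
class TextContext:
    """Hypothetical stand-in for the names the game keeps in RAM."""
    player: str
    rival: str
    target: str            # last target of a move (kept after battle ends)
    target_is_enemy: bool
    user: str              # last user of a move
    user_is_enemy: bool


def expand_variable(code: int, ctx: TextContext) -> str:
    """Return the text printed for a variable control code."""
    if code == 0x52:  # player's name
        return ctx.player
    if code == 0x53:  # rival's name
        return ctx.rival
    if code == 0x59:  # target, with "Enemy " if it is the opponent's
        return ("Enemy " if ctx.target_is_enemy else "") + ctx.target
    if code == 0x5A:  # user, with "Enemy " if it is the opponent's
        return ("Enemy " if ctx.user_is_enemy else "") + ctx.user
    raise ValueError(f"0x{code:02X} is not a variable control code")


battle = TextContext("RED", "BLUE", "PIDGEY", True, "PIKACHU", False)
print(expand_variable(0x59, battle))  # Enemy PIDGEY
```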
Text control codes

These control codes print a hardcoded string. They reduce the number of bytes needed to store common strings, while the full string is still rendered on screen.

  • 0x4A - "pkmn" - Prints "PKMN"
  • 0x54 - "poke" - Prints "Poké"
  • 0x56 - "......" - Prints two ellipses, "……"
  • 0x5B - "pc" - Prints "PC"
  • 0x5C - "tm" - Prints "TM"
  • 0x5D - "trainer" - Prints "TRAINER"
  • 0x5E - "rocket" - Prints "ROCKET"
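Because each of these codes is a single byte, a common word costs one byte instead of one per letter. The sketch below expands them, decoding only A-Z and space for brevity; expand_text_codes is a hypothetical name.

```python
TEXT_CODES = {
    0x4A: "PKMN", 0x54: "Poké", 0x56: "……", 0x5B: "PC",
    0x5C: "TM", 0x5D: "TRAINER", 0x5E: "ROCKET",
}


def expand_text_codes(data: bytes) -> str:
    """Expand hardcoded-string codes; only A-Z and space are decoded otherwise."""
    out = []
    for code in data:
        if code == 0x50:                # "end"
            break
        if code in TEXT_CODES:
            out.append(TEXT_CODES[code])
        elif 0x80 <= code <= 0x99:      # A-Z
            out.append(chr(ord("A") + code - 0x80))
        elif code == 0x7F:              # space
            out.append(" ")
        else:
            out.append(f"<{code:02X}>")
    return "".join(out)


team_rocket = bytes([0x93, 0x84, 0x80, 0x8C, 0x7F, 0x5E, 0x50])
print(expand_text_codes(team_rocket))  # TEAM ROCKET
```

Here the 6 bytes before the terminator render as the 11 characters of "TEAM ROCKET".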

Japanese

Technically all characters under 0x60 are control characters, the majority of which have the behavior of causing a specific character from the main font (0x80-0xFF) to be printed with a diacritic in the space above it. Those characters that have different, more complicated functions are detailed below.

-0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -A -B -C -D -E -F
0- NULL イ゛ エ゛ オ゛
1- ナ゛ ニ゛ ヌ゛ ネ゛ ノ゛ マ゛ ミ゛ ム゛
2- ィ゛ あ゛ い゛ え゛ お゛
3- な゛ に゛ ぬ゛ ね゛ の゛ ま゛
4- ま゜ Control も゜ Control
5- Control characters
6- A B C D E F G H I V S L M
7- Character 0x79 i.png = Character 0x7B i.png || Character 0x7D i.png Character 0x7E i.png  
8-
9-
A-
B-
C-
D-
E- ? !
F- × . / 0 1 2 3 4 5 6 7 8 9

0xE4 and 0xE5 are diacritic marks (the dakuten and handakuten); each causes the following character to be printed with that diacritic above it.
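A sketch of this prefix behaviour using Unicode combining marks, which compose with the following kana under NFC normalization. It assumes 0xE4 is the dakuten and 0xE5 the handakuten, and the two-entry font is a placeholder, not the real Japanese character codes.

```python
import unicodedata
from typing import Dict

# Assumption: 0xE4 is the dakuten and 0xE5 the handakuten.
PREFIX_DIACRITICS = {0xE4: "\u3099", 0xE5: "\u309A"}  # combining (semi-)voiced sound marks


def decode_with_prefixes(data: bytes, font: Dict[int, str]) -> str:
    """Attach a prefix diacritic to the next character, then compose with NFC."""
    out = []
    pending = ""
    for code in data:
        if code in PREFIX_DIACRITICS:
            pending = PREFIX_DIACRITICS[code]
            continue
        out.append(font.get(code, "?") + pending)
        pending = ""
    return unicodedata.normalize("NFC", "".join(out))


# Hypothetical two-entry font; these are NOT the real Japanese codes for カ and ハ.
demo_font = {0x80: "カ", 0x81: "ハ"}
print(decode_with_prefixes(bytes([0xE4, 0x80, 0xE5, 0x81, 0x80]), demo_font))  # ガパカ
```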

Japanese control characters

This section is incomplete.
Please feel free to edit this section to add missing information and complete it.
Reason: Incomplete or missing functions for control bytes. Alternate defaults in different games/other languages
  • 0x4A: Prints
  • 0x52: Prints the player's name.
  • 0x53: Prints the rival's name.
  • 0x54: Prints ポケモン in Japanese games.
  • 0x59: Prints the inactive Pokémon's name in battle. (In specific circumstances, the game may "pretend" that the inactive Pokémon is actually active and vice versa.)
    • When the Pokémon is the opponent's, てきの ("enemy's") is prepended in Japanese games.
  • 0x5A: Prints the active Pokémon's name in battle. The default value is empty. (In specific circumstances, the game may "pretend" that the active Pokémon is actually inactive and vice versa.)
  • 0x5B: Prints パソコン in Japanese games.
  • 0x5C: Prints わざマシン in Japanese games.
  • 0x5D: Prints トレーナー in Japanese games.
  • 0x5E: Prints ロケットだん in Japanese games.




This data structure article is part of Project Games, a Bulbapedia project that aims to write comprehensive articles on the Pokémon games.