
5.1 Bits, Bytes, Numbers, and Structures

5.1.1 Bits

In computers today (and for quite some time) the fundamental unit of information is the bit. Historically, there have been analog computers and a few experimental computers that used "trits" (three-state digits) and the ternary number system. These will NOT be dealt with here.

The bit is a very small amount of information. A typical example is a light switch: it is on or it is off. It is safe to think of all information in a computer as stored and transmitted as bits. The manner in which this is done varies. One bit may be a group of photons in a fiber optic cable, another a magnetic domain on a disk drive, or a small amount of electric charge in a DRAM (Dynamic Random Access Memory) storage cell. These bits can be short-lived or very permanent, depending on the use and the storage method. Their organization can be serial (one bit in time after another, as in EIA RS-232 and Ethernet), parallel (such as central memory in a computer), or block structured (as on a disk drive).

Most computer systems have been programmed or hardwired to shift from one form to another.

A single bit can be very important (or totally unused). It is a matter of interpretation. A bit can affect a data transmission rate, change a "B" to a "C" (in ASCII), change the register used by an instruction, change the location used for an instruction, change the address used by an instruction into the data used by an instruction, blow up a bomb, or misroute a phone call.

Bits are often used on an individual basis. For example, the result of the comparison A&lt;B is a bit. Are you male or female? Is a door locked or unlocked? Is your computer system using its paging system or not? (There is one bit that is set to turn it on.)

There are various logical functions that can be and are performed on bits: AND, OR, XOR (exclusive OR), and NOT. Computer systems will usually have instructions for this type of operation.
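A quick sketch of these four operations using Python's bitwise operators on single-bit values (this example is mine, not from the course materials):

```python
# The four basic logical functions on single bits (values 0 or 1).
a, b = 1, 0

and_result = a & b   # AND: 1 only if both bits are 1
or_result  = a | b   # OR:  1 if either bit is 1
xor_result = a ^ b   # XOR: 1 if the bits differ
not_a      = a ^ 1   # NOT: x ^ 1 flips a single bit

print(and_result, or_result, xor_result, not_a)  # → 0 1 1 0
```

Real hardware performs the same operations across a whole word of bits at once.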

Bits are more often thought of as part of a larger group. Small collections of bits are nibbles (4 bits), bytes (8 bits - sometimes 7 bits and sometimes 9 bits), words (12, 16, 18, 32, 40, 48, 64, 80, 128, 256, or 512 bits [and other values too!], depending on the content and the system), blocks (groups of bytes or words), and clusters (groups of blocks).

One bit is not much information. It only has two possible states. It is when you have multiple bits that things get interesting! Two bits can have four different states: 00, 01, 10, and 11. Another way to look at it is 2*2, each bit having two values. Three bits yield eight possibilities. Eight bits have 256. Ten bits have 1024, and 256 bits have 115792089237316195423570985008687907853269984665640564039457584007913129639936. For 512 bits you will need to square that number. For a disk sector on an IBM PC with 512 bytes (4096 bits) you will need to square that result three more times, or

1044388881413152506691752710716624382579964249047383780384
2334832839539079715574568488268119349975583408901067144392
6283798757343818579360726323608785136527794595697654370999
8340361590134383718314428070011855946226376318839397712745
6723346843445866174968079087058037040712840487401186091144
6797778359802900668693897688178778594690563019026094059957
9453432823469303026696443059025015972399867714215541693835
5598852914863182379144344967340878118726394964751001890413
4900841706167509366833385055103297208826955076998361636941
1933015213796825837188091833656751221318492846368125550225
9983004123447848625956744921946170238065059132456108257318
3538008760862210283427019769820231316901767800667519548507
9921636419370285375124784014907159135459982790513399611551
7942711068311340905842728842797915548497829543235345170652
2326906139490598769300212296339568778287894844061600741294
5674919823050571642377154816321380631045902916136926708342
8564407304478999719017814657634732238502672530598997959960
9079946920177462481771844986745565925017832907047311943316
5550807568221846571746373296884912819520317457002440926616
9108741483850784119298045229818573389776481031260859030013
0241346718972667321649151113160292078173803343609024380470
8340403154190336
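These counts can be checked directly, since Python integers have arbitrary precision (a sketch of mine, not part of the original notes):

```python
# Number of distinct states for n bits is 2**n.
for n in (1, 2, 3, 8, 10):
    print(n, "bits ->", 2 ** n, "states")

# A 512-byte disk sector holds 512 * 8 = 4096 bits:
sector_states = 2 ** (512 * 8)
print(len(str(sector_states)), "decimal digits")  # → 1234 digits
```

The 4096-bit count printed here is exactly the large number quoted above.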
Each of these possibilities could be used for something different. If you use more bits, it costs more. You could use 32 bits for a character set instead of 7 or 8, but your data files would be four times larger and transmission would take four times longer. It is a trade-off that has been made. (Early TTYs, and most current TTYs used by deaf people to communicate, use Baudot code with only 5 bits.)

The same trade-off occurs with numbers. What numbers are you really going to use? Are two digits enough for a year? Are you ever going to have more than 31 days in a month? How big will your account get? Remember, you might want Bill Gates as a customer.

Some of the simplest uses of a group of bits are fixed point numbers and character data. I will look at character representations first.

5.1.2 Characters

Now we will look at characters. Many standards exist for representing characters on computer systems. You could invent your own, until computers started being networked. The most common standard in the US is ASCII, the American Standard Code for Information Interchange. (See: Nance & Naps Chapter 3 page 73, page 83, and Appendix 4 page A.16. Please note that Nance & Naps leave out part of the ASCII character set. They also fail to cover extensions. This includes all control characters and the IBM graphics characters.)

There exist many other standards, including international character sets, 16-bit character encodings, Kanji, EBCDIC (See: Nance & Naps Chapter 3 page 73 and Appendix 4 page A.17), and Hollerith code.

Character sets are remarkably arbitrary. Someone (or a committee) chooses what set of ones and zeros represents what character. As long as everyone agrees, it works great. In ASCII, the character "A" is represented by the 7 bits 100 0001. ASCII works well in the United States. England has problems. We use a dollar sign "$". England uses a pound sign "£". Other symbols that are not in the ASCII character set include Å, å, Æ, æ, Ø, ø, Œ, œ, ß, ¿, ¡, ©, ¶, and §.
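You can inspect the agreed-upon ASCII codes directly (my illustration, using Python's built-in ord and chr):

```python
# ord() gives a character's numeric code; format(..., '07b') shows
# the 7-bit ASCII pattern; chr() goes back from code to character.
print(ord('A'), format(ord('A'), '07b'))   # → 65 1000001
print(ord('B'), format(ord('B'), '07b'))   # → 66 1000010

# Flipping one low-order bit changes "B" to "C", as mentioned earlier:
print(chr(ord('B') ^ 0b1))                 # → C
```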

There are other character sets that solve some of the problems. But, as yet, no character set solves all of the problems. (For example, Kanji and Sanskrit.)

5.1.3 Enumerated Data

The total number of possibilities is determined by the number of bits used to store the number. There are variable-size schemes, but I will not address them now. If you have 4 bits you can store 16 values. Period. If you have a plan, they can be ANY 16 values, but only 16. More common sizes are 8, 16, 32, and 64 bit numbers. This gives the following table of possible combinations.
Powers of 2

 n    2^n
 8    256
16    65536
32    4294967296
64    18446744073709551616
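The "ANY 16 values" point can be sketched with an ordinary mapping; the month names chosen here are just an illustrative assignment of mine, not a standard encoding:

```python
# Any 16 values can be assigned to the 16 possible 4-bit codes.
# Here 12 codes carry month names and 4 are left unused.
MONTHS = dict(enumerate(
    ["(unused)", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
     "(unused)", "(unused)", "(unused)"]))

assert len(MONTHS) == 16      # 4 bits give exactly 16 codes, no more
print(MONTHS[0b0011])         # → Mar
```

A month fits comfortably in 4 bits; a day-of-month (1 to 31) would need 5.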

There are some VERY common schemes for dealing with fixed point numbers. These allow for the use of specific instructions in a machine, such as add and subtract. You can STILL do this with scaled numbers if you are careful.

5.1.4 Fixed Point Numbers

Numbers can be represented in many ways. The representation of a number is NOT the number. This means that you can do your arithmetic in any representation and still get the same results (if your math is right).

This is of fundamental importance. If you produced different results in binary or in decimal who would want to use computers?

Computers for the most part use binary numbers. Binary numbers are long and difficult for people to deal with. In fact, most people have been dealing with decimal numbers most of their lives, and probably don't want to switch. As an example of what I mean by difficult, let's look at a large binary number:

1010100111111110101010000101011011100101010
in decimal it is 5,840,975,345,450. Yep, a big number. Which one would be easier to copy correctly? Remember, one bit can make a big difference.

If you wanted the decimal number in binary (changing the representation) you would probably want to use a computer. To do it by hand or by computer you divide the number by the base you are converting to, over and over again. Each remainder gives you one digit.
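The repeated-division method looks like this in Python (a sketch of mine; the remainders come out low digit first, so they are reversed at the end):

```python
# Convert a non-negative integer to a string of digits in the given
# base by repeated division; each remainder is one digit.
def to_base(value, base):
    digits = "0123456789ABCDEF"
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, rem = divmod(value, base)   # quotient and remainder
        out.append(digits[rem])
    return "".join(reversed(out))

print(to_base(5840975345450, 8))   # → 124776520533452
print(to_base(5840975345450, 16))  # → 54FF542B72A
```

These match the octal and hexadecimal forms of the big number shown later in this section.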

If you need to know which bits are one and which are zero internally in a machine, binary helps. But it is very error prone. To get around this problem people have used octal (base 8) and hexadecimal (base 16) numbers. For hexadecimal numbers we need some new characters to represent the values past 9. These are shown in the following table:

Hexadecimal Character Values

Character   Decimal Value   Binary Value
    0             0             0000
    1             1             0001
    2             2             0010
    3             3             0011
    4             4             0100
    5             5             0101
    6             6             0110
    7             7             0111
    8             8             1000
    9             9             1001
    A            10             1010
    B            11             1011
    C            12             1100
    D            13             1101
    E            14             1110
    F            15             1111

So for the big binary number above:

Base  Number  
   2  1010100111111110101010000101011011100101010
   8  124776520533452
  10  5840975345450
  16  54FF542B72A
Note that as more information is packed into each character position, you need fewer characters to represent a number.
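Python's format built-in makes the same point directly (my example): the higher the base, the fewer characters needed.

```python
n = 5840975345450
b, o, x = format(n, 'b'), format(n, 'o'), format(n, 'X')
print(b)                              # binary
print(o)                              # octal
print(x)                              # hexadecimal
print(len(b), len(o), len(x))         # → 43 15 11 characters
```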

If you are dealing with unsigned integers, the usual representation is binary. Each bit has twice the weight of the preceding bit. When all bits are zero, the value of the number is zero. So a one bit binary number has a minimum value of zero and a maximum value of 1. For 8 bits, the minimum is 0 and the maximum is 255. A table follows for some common unsigned integer sizes.

Minimum and Maximum Values for Unsigned Numbers

Bits   Minimum   Maximum (2^n - 1)
 8        0      255
16        0      65535
32        0      4294967295
64        0      18446744073709551615

For 2's complement signed integers the most significant bit takes on the value -2^(n-1). So for an 8 bit number the MSB (or Most Significant Bit) has a numeric value of -128. This gives us a range of values of -128 to 127. For 2's complement numbers our table looks like this:

Minimum and Maximum Values for Signed Numbers

Bits   Minimum (-2^(n-1))     Maximum (2^(n-1) - 1)
 8     -128                   127
16     -32768                 32767
32     -2147483648            2147483647
64     -9223372036854775808   9223372036854775807

Please note that this representation for signed numbers has one more negative number than positive numbers. This can cause problems.
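A small sketch (mine) that reinterprets an 8-bit pattern as a 2's complement value, showing the asymmetric range:

```python
# In 2's complement, the MSB of an 8-bit number carries weight -128;
# the other seven bits carry their usual positive weights.
def as_signed8(byte):
    return byte - 256 if byte & 0x80 else byte

print(as_signed8(0b01111111))  # → 127   (the maximum)
print(as_signed8(0b10000000))  # → -128  (the extra negative value)
print(as_signed8(0b11111111))  # → -1
```

The pattern 1000 0000 has no positive counterpart: negating -128 in 8 bits overflows back to -128, which is one way the extra negative number causes trouble.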

5.1.5 Binary Coded Decimal -- BCD

Binary Coded Decimal (BCD) is an example of an enumerated data type. It uses 4 bits per digit. The first 10 codes take on the familiar meanings.

BCD Character Values

Binary   Decimal Value
0000     0
0001     1
0010     2
0011     3
0100     4
0101     5
0110     6
0111     7
1000     8
1001     9

This leaves 6 other bit combinations. Typically at least two will be assigned meanings: one for a negative sign, the other for the end of the number. (When we get to character strings, this works the same way.) The reason for using this scheme is FAST input/output conversion. In a business setting where the amount of arithmetic that is done is small, sometimes the computation to convert the number to/from binary/decimal exceeds the computation done with the number. Some systems (IBM & 8080) support some form of BCD arithmetic. I expect that some implementations of COBOL will store certain types of numbers in a BCD format.
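A sketch (mine) of the basic packing idea, two BCD digits per byte; the sign and end-of-number codes mentioned above vary by machine, so this handles unsigned digits only:

```python
# Pack one 4-bit code per decimal digit, two digits per byte.
def bcd_encode(number):
    digits = [int(d) for d in str(number)]
    if len(digits) % 2:               # pad to a whole number of bytes
        digits.insert(0, 0)
    return bytes(hi << 4 | lo for hi, lo in zip(digits[::2], digits[1::2]))

def bcd_decode(data):
    return int("".join(f"{b >> 4}{b & 0xF}" for b in data))

packed = bcd_encode(1997)
print(packed.hex())        # → 1997 (each hex digit IS one decimal digit)
print(bcd_decode(packed))  # → 1997
```

Notice why conversion is fast: the hex dump of the stored bytes reads directly as the decimal number, with no division needed.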

5.1.6 Offset numbers

If the range of numbers that you are using is well determined and storage space is at a premium, it is possible to offset the numbers. This, today, is often a poor idea; memory prices have dropped. The best example of this is dates. The year 1997 might be represented by using the number 97 with an implied offset of 1900. Some companies are having problems with their programs as a result of this. It has been called the "Year 2000" problem. The original program designers failed to account for the real range of the numbers that the programs would encounter.
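The scheme and its failure mode are easy to sketch (my example):

```python
# Offset representation for years: store year - 1900, intended to fit
# in two decimal digits.
OFFSET = 1900

def store_year(year):
    return year - OFFSET

def load_year(stored):
    return stored + OFFSET

print(load_year(store_year(1997)))   # → 1997: works fine
print(store_year(2000))              # → 100: no longer two digits
```

A program that truncates the stored value to two digits would read the year 2000 back as 1900.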

5.1.7 Scaled Numbers

Fixed point numbers can also be scaled. That is, multiplied by a fixed value. The numbers can be represented as unsigned, signed, offset, and BCD numbers. An easy example of this is doing accounting in pennies instead of dollars. All of the internal math can be kept in cents. This, by the way, keeps the accountant happy. As we will see when we get to floating point numbers, the values don't always add up!

Addition and subtraction can be done without any adjustment. Multiplication and division need to be adjusted. Some hardware designs account for this; in other cases the programmer or the compiler needs to make the adjustment.
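A sketch (mine) of accounting in cents, showing where the multiplication adjustment comes in:

```python
# Scale factor: amounts are stored as integer cents (dollars * 100).
SCALE = 100

a = 1050                     # $10.50
b = 325                      # $3.25
print(a + b)                 # → 1375 ($13.75): addition needs no adjustment

# 5% interest, with the rate also scaled by 100:
rate = 5
interest = a * rate // SCALE # product is double-scaled, so divide once
print(interest)              # → 52 cents (the exact $0.525 is truncated)
```

Multiplying two scaled values gives a result scaled by SCALE squared, which is why the product must be divided back down by SCALE.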

5.1.8 Floating Point Numbers

Floating point numbers are a combination of what we have covered so far. It includes:
  1. Scaled Numbers (for the fractional part)
  2. Signed Integers (for the exponent part)
  3. Enumerated parts (for things that are not numbers)
  4. Offset Numbers (to save a bit on normalized numbers)
The numbers need to represent a wide range of values with a fixed precision. These numbers can be very small to very large--a wide range. The fixed precision means that you will always have (about) the same number of digits that are accurate. Many measurements of real world quantities need this type of number. For example:
  1. How thick is a piece of paper?
  2. How far is it home?
  3. How much area is in Australia?
  4. How far is it to the moon?
  5. How much does an iron atom weigh?
  6. What is the voltage in your PC?
  7. What is your gas mileage?
For these kinds of numbers, floating point values are used. See: Nance & Naps Chapter 2 page 39 and Chapter 3 page 57. A floating point value is of the form mantissa * 2^exponent. (Actually, the "2" is replaced by other values in some implementations.) Both the mantissa and the exponent are "normally" signed binary numbers. The mantissa is usually a scaled binary number so that it has a range from -1.000000... to +.999999... The sign bits are usually NOT stored next to the number. With some cleverness, it is possible to come up with a format that allows the comparison of floating point numbers with the same hardware as integers. (This is a lot like scientific notation if you are familiar with it.) Different implementations will have a different number of bits for the mantissa and the exponent. The sum of the bits will usually add up to 32 or 64. Intel 80x86 microprocessors use 80 bits for some operations. A mantissa of n bits gives about log10(2)*n decimal digits, or about .301 digits per bit.
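Here is a sketch (mine) that pulls apart the bit fields of a 64-bit IEEE 754 double, the common modern mantissa-and-exponent layout: 1 sign bit, 11 exponent bits (stored with an offset of 1023), and 52 mantissa bits.

```python
import math
import struct

def fields(x):
    """Return (sign, unbiased exponent, mantissa bits) of a 64-bit float."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign     = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # offset (biased) by 1023
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent - 1023, mantissa

print(fields(1.0))    # → (0, 0, 0): 1.0 = +1.0 * 2**0
print(fields(-2.0))   # → (1, 1, 0): -2.0 = -1.0 * 2**1

# Each mantissa bit is worth log10(2) ≈ .301 decimal digits:
print(52 * math.log10(2))   # about 15.65 digits of precision
```

The leading 1 of a normalized mantissa is not stored at all, which is the "save a bit on normalized numbers" trick listed above.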

Floating point hardware will often support a few enumerated items, usually referred to as NaNs, or Not a Number. Intel supports several NaNs, both signaling (creates an exception) and quiet. The NaNs are indicated by a specific exponent; in Intel's case all of the exponent bits are set to one. The same all-ones exponent, with a zero mantissa, is also used to represent positive and negative infinity.
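You can see the all-ones exponent and the odd behavior of NaNs from Python (my example):

```python
import math
import struct

# Infinity uses the all-ones exponent field (11 bits -> 2047):
bits_inf = struct.unpack('>Q', struct.pack('>d', math.inf))[0]
print((bits_inf >> 52) & 0x7FF)      # → 2047

# Quiet NaNs propagate through comparisons in a surprising way:
nan = float('nan')
print(nan == nan)                    # → False: a NaN is not equal to itself
print(math.isnan(nan))               # → True
```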

Most computer systems support floating point in hardware today. Small embedded computers may not; hardware simply does it faster, and it can be done in software. Supercomputer systems are often measured by how many floating point operations they can perform in a second. The Intel ASCI Red computer is rated at over 1,000,000,000,000 floating point operations per second. It is composed of over 9,000 Pentium Pro processors.

5.1.9 Addresses and Pointers

Several constructs in computer systems are addresses. Computer addresses are almost always unsigned positive numbers. (See: Nance & Naps Chapter 18 page 845 for examples of pointers.) For efficiency, these values will sometimes be of limited precision. An example of this might be the cluster address in a FAT file system; the FAT file system used/uses various cluster sizes and pointer sizes to address the clusters.

5.1.10 Bit fields

Many I/O devices are controlled by special purpose registers. Today, many CPU functions are controlled the same way. For an example, look at the Pentium processor System Architecture Overview, chapter 10 of the Pentium Processor Family Developer's Manual, Volume 3: Architecture and Programming Manual.
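Bit fields are extracted with shifts and masks. Here is a sketch (mine) using a made-up 8-bit control register layout, not any real device's: bit 7 is an enable flag, bits 6-4 a mode, and bits 3-0 a divisor.

```python
# Decode a hypothetical 8-bit control register into its packed fields.
def decode_control(reg):
    enable  = (reg >> 7) & 0b1      # bit 7
    mode    = (reg >> 4) & 0b111    # bits 6-4
    divisor =  reg       & 0b1111   # bits 3-0
    return enable, mode, divisor

reg = 0b1_101_0011                  # enable=1, mode=5, divisor=3
print(decode_control(reg))          # → (1, 5, 3)
```

Writing a field works the same way in reverse: mask out the old bits, then OR in the new value shifted into position.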

5.1.11 Structures

Structures are a combination of the above constructs. See: Nance & Naps Chapter 12 Records page 547. Some fields can be of variable length. The structure can be dealt with as a single construct. Structures can include other structures and/or pointers to structures.

It is also easy to think of instructions as a packed structure. In the structure you will find various fields. Depending on the computer's instruction set, the fields can contain items such as an operation code, register selectors, addressing mode bits, and immediate data or addresses.
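A record can be packed into and unpacked from a fixed byte layout; this sketch (mine) uses Python's struct module with a made-up layout: a 16-bit id, an 8-byte fixed-width name, and a 32-bit balance in cents.

```python
import struct

# '<' means little-endian with no padding: ushort (2) + 8 bytes + int (4).
RECORD = struct.Struct('<H8si')

packed = RECORD.pack(42, b'SMITH   ', 1375)
print(RECORD.size)                   # → 14 bytes per record

rec_id, name, balance = RECORD.unpack(packed)
print(rec_id, name.decode().rstrip(), balance)   # → 42 SMITH 1375
```

The same fixed layout lets a whole file of such records be read as one block and indexed by record number, which is exactly how many block-structured files work.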


Instructor: ltaber@pima.edu
