Warning
This chapter is still in development
Warning
This chapter hasn’t been written yet, but in the meantime there is a lesson plan that covers the material available from the NZACDITT site. The content of this chapter will mainly support this lesson plan.
Note
For teachers
This chapter supports the coding part of the NZ achievement standard AS91074 (2.44).
Note
For teachers
This section provides an introduction to the idea of compression, using a very simple form of an algorithm called run length coding. It is recommended that all students working on 2.44 work through the material in this section in order to satisfy the achieved requirements for compression (part of the encoding bullet point). To satisfy the full requirements for the encoding achieved bullet point, they should include simple examples of encryption and error control as well (check the relevant chapters for ideas).
[Need to mention that it does go a little beyond the achieved requirements, but that we still recommend working through it all, except for the advanced bit at the end].
This section intentionally stays away from the concept of bits, instead representing data using ASCII characters. While computers would actually use bits, and the use of ASCII characters in practice, like what is used in this section, is actually quite inefficient, we don’t want to put off students who are confused by bits, but could have demonstrated good understanding of the concepts behind compression. They have already showed what they did (or didn’t!) understand about bits in the first bullet points.
Not all the material is intended to be included in the students report; just the personalised project near the end should be used as evidence for their report. The activities before that are simply for learning about and understanding the concepts.
Please note that this section is not suitable for the merit/ excellence requirements for encoding. For these requirements students need to do a more advanced project on either compression, error control coding, or encryption that discusses and evaluates how a widely used technology i enabled by techniques from at least one of those 3 topics, whereas this section just focuses on a simplified version of an algorithm used in compression. If they choose compression to focus on for merit/ excellence, then later sections give ideas on how they could go about this. There are many different ways that compression systems function, and the examples in this section are intentionally simple ones to give the idea of how compression might be possible (the main method discussed is used for fax machines, which are becoming less common). Later sections will cover some other widely used methods.
One very simple way a computer can store this image is by using a format where 0 means white and 1 means black.. The above image would be represented in the following way
011000010000110 100000111000001 000001111100000 000011111110000 000111111111000 001111101111100 011111000111110 111110000011111 011111000111110 001111101111100 000111111111000 000011111110000 000001111100000 100000111000001 011000010000110
P1
15 15
011000010000110
100000111000001
000001111100000
000011111110000
000111111111000
001111101111100
011111000111110
111110000011111
011111000111110
001111101111100
000111111111000
000011111110000
000001111100000
100000111000001
011000010000110
Those first 2 lines are the header. The first specifies the format of the file, as there are several variations. When using ASCII 0’s and 1’s it is always “P1”. The second line specifies the width and then the height of the image. This allows the computer to know the size and dimensions of the image, even if the newline characters separating the rows in the file were missing. The rest of the data is the image, just like above. If you wanted to, you could copy paste this representation (including the header) into a text file, and save it with the file extension .pbm. If you have a program on your computer able to open PBM files, you could then view the image with it.
There is a similar version that uses actual bits rather than ASCII 0’s and 1’s to represent the image. The plus side of this is that the image is ⅛ of the size (remember from representations using bits that there are 8 bits in an ASCII character!), but the down side is that it is no longer human readable like the ASCII format you have been using is.
There are also similar formats that can be used for grey scale images, and colour images. Check the following wikipedia page if you are curious. http://en.wikipedia.org/wiki/Netpbm_format
You may have realised that there are 15 (rows) x 25 (columns) = 225 characters representing this image. Can we represent the same image using fewer characters, in a way that a computer would still be able to understand it?
The answer is yes, we can, using a technique known as run length encoding. In run length encoding, we replace each row with numbers that say how many blacks (1’s), and how many whites (0’s) are in a row, always starting with the number of whites. The first row in the image contains 1 white, 2 blacks, 4 whites, 1 black, 4 whites, 2 blacks, 1 white. This could be represented as;
1,2,4,1,4,2,1
The second row contains 1 black, 5 whites, 3 blacks, 5 whites, then 1 black. Because we need to say what the number of white’s is before we say the number of blacks, we need to explicitly say there are 0 whites at the start of the row. This would give
0,1,5,3,5,1
And the third row contains 5 whites, 5 blacks, 5 whites. This would give
5,5,5
So, we have determined that the first 3 rows of the file represented using this new representation would be;
1,2,4,1,4,2,1
0,1,5,3,5,1
5,5,5
Work out what the other rows would be, and write them out as well.
Which representation takes less space to store? In the original representation, 225 characters were required to represent the image. Count up the number of commas and digits (not spaces or newlines, ignore those) in the new representation. This is the number of characters required to represent the image with the new representation (to ensure you are on the right track, the first 3 rows that were given to you contain 29 characters)
Assuming you got the new image representation correct, and counted correctly, you should have found there are 119 characters in the new image (double check if your number differs!)
This means that the new representation only requires around 53% as many characters to represent! (119/225 100). This is a significant reduction in the amount of space required to store the image. The new representation is a *compressed form of the old one.
This representation is understandable by a computer, as it knows that it should draw each row with the specific numbers of whites and blacks in a row. It knows that the first number in each row represents the number of whites it should draw, and then the next is the number of blacks, and alternating. ..sentence doesn’t seem complete
Just to ensure that we can “undo” the compressed, what is the original representation (0’s and 1’s) of this (compressed) image?
4, 11, 3
4, 9, 2, 1, 2
4, 9, 2, 1, 2
4, 11, 3
4, 9, 5
4, 9, 5
5, 7, 6
0, 17, 1
1, 15, 2
What is the image of? How good was the compression on this image? (Look back at the calculation above for the amount of compression)
As the compressed representation of the image can be converted back to the original representation, and both the original representation and the compressed representation would give the same image when read by a computer, this means that the algorithm is lossless, i.e. none of the data was lost from compressing the image, and as a result the compression could be undone exactly.
Not all compression algorithms are lossless. In some types of files, in particular photos, sound, and videos, we are willing to sacrifice a little bit of the quality (i.e. lose a little of the data representing the image) if it allows us to make the file size a lot smaller. For downloading very large files such as movies, this can be essential to ensure the file size is not so big that it is infeasible to download! These compression methods are called lossy. If some of the data is lost, it is impossible to convert the file back to the exactly the original form when lossy compression was used. Later in this chapter, we will investigate the effects some lossy compression algorithms have on images and sound.
Now that you know how run length encoding works, you can come up with and compress your own black and white image, as well as uncompress an image that somebody else has given you.
Start by making your own picture with 1’s and 0’s. (Make sure it is rectangular - all the rows should have the same length.) You can either draw this on paper or do it on a computer (using a fixed width font, otherwise it can become really frustrating and confusing!). In order to make it easier, you could start by working out what you want your image to be on grid paper (such as that from a math exercise book) by shading in squares to represent the black ones, and leaving them blank to represent the white ones. Once you have done that, you could then write out the 0’s and 1’s for the image.
Work out the compressed representation of your image using run length coding, i.e. the run lengths separated by commas form that was explained above.
Now, give a friend a copy of the compressed representation (NOT the original uncompressed representation), and get them to give you a copy of theirs. You should each uncompress the other person’s image, to get back to the original uncompressed representations. Check to make sure the conversions back to the uncompressed representations was done correctly by making sure the images are the same.
Imagining that you and your friend are both computers, by doing this you have shown that images using these systems of representations can be compressed on one computer, and decompressed on another. It is very important for compression algorithms to have this property in order to be useful. It wouldn’t be very good if a friend gave you a song they’d compressed on their computer, but then your computer was unable to make sense of the representation the compressed song was using!
Note that the rest of this section is intended for those who are really keen. It is fine to skip over it and look at the next sections instead
What is the image with the best compression (i.e. an image that has a size that is a very small percentage of the original) that you can come up with? This is the best case performance for this compression algorithm.
What about the worst compression? Can you find an image that actually has a larger compressed representation? (Don’t forget the commas!). This is the worst case performance for this compression algorithm.
..A: describing the concept of encoding information using compression coding, error control coding, and encryption; and typical uses of encoded information .. M: discussing how a widely used technology is enabled by one or more of compression coding, error control coding, and encryption .. E: evaluating a widely used system for compression coding, error control coding, or encryption
[an area that is worth knowing about, including activities/exercises to explore it, and guidance for teachers (possibly to be separated) on how to help students use this topic for A/M/E
[explain where the material above has oversimplified things, and if there are any well-known concepts or techniques that have been deliberately left out because they are too complex for this age group. This may include things that require advanced maths, advanced programming, or things where students have seen the problem but not a thorough solution]
- Images, run-length-coding http://csunplugged.org/image-representation This is also relevant to binary representations in general, although is probably best used in the compression section.
- Text compression http://csunplugged.org/text-compression
- Information theory http://csunplugged.org/sites/default/files/activity_pdfs_full/unplugged-05-information_theory Vaguely relevant, not likely to use as for this level, it goes too far past what we would want them to know about compression