Tuesday, August 4, 2015

How does the Internet work?

Explaining the Internet

[Image: Connecting computers worldwide]
The Internet is everywhere nowadays. It seems that virtually every electronic device is connected to the net in some way, and it's easy to just accept without questioning that clicking on a link brings up a new page of information - but if you want to understand a bit about how that actually happens, then this article will give you some insight into how the Internet really works.

The Internet is a system connecting a vast number of computers and other electronic devices in such a way that messages can be sent between any two connected devices. How those messages are passed from one location to another varies - perhaps using radio waves in a home wi-fi connection, or copper cables, or glass fibres, or microwaves, or cellphone signals - because the Internet does not force any particular type of link. As long as there is some way to get a message from one computer to another, the Internet will work with it.

The Internet will Find a Way

The connections can be pictured like roads on a map. Some are slow and narrow and can't handle much traffic, whereas others are wide and fast and allow many vehicles to travel long distances. If we imagine the Internet as a road network, then in place of cars we have little packets of digital information travelling from one place to another. At each junction on this network there is a device called a router, which steers every incoming packet of information onto the best connection to help it get to its destination. Eventually the packet arrives at a router which can send it directly to its destination computer, and it will then have been transmitted across the Internet.
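To make the road-map picture concrete, here is a minimal sketch of how a router might work out which neighbour is the best first step towards every destination. The four-router network, the link costs and the function name `next_hops` are all made up for illustration, and Dijkstra's shortest-path algorithm is used as one plausible way to pick "the best connection"; real routers rely on dedicated routing protocols.

```python
import heapq

# Hypothetical network: routers A-D, with a cost per link (lower means faster).
LINKS = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}


def next_hops(source, links):
    """Return {destination: neighbour to forward to} for the cheapest routes."""
    best_cost = {source: 0}
    first_hop = {}
    queue = [(0, source, None)]  # (cost so far, router, first neighbour used)
    while queue:
        cost, node, hop = heapq.heappop(queue)
        if cost > best_cost.get(node, float("inf")):
            continue  # a cheaper route to this router was already found
        if hop is not None and node not in first_hop:
            first_hop[node] = hop
        for neighbour, link_cost in links[node].items():
            new_cost = cost + link_cost
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                # Leaving the source, the first hop is the neighbour itself;
                # further along, the first hop is inherited from the route so far.
                heapq.heappush(
                    queue,
                    (new_cost, neighbour, neighbour if hop is None else hop),
                )
    return first_hop


if __name__ == "__main__":
    print(next_hops("A", LINKS))
```

In this made-up network, router A sends everything via B, because the direct A-C link is slower than going A-B-C.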

[Image: Avoiding congestion]
So how does a router know where the packet is headed? Well, rather like houses on the road map, every device connected to the Internet has a kind of digital address, in this case in the form of a number known as an IP (Internet Protocol) address. When the Internet was being developed, the standard for IP addresses was set as a sequence of four numbers, each in the range 0 to 255. This kind of IP address is normally written with a dot between each of the numbers, e.g. 64.233.160.17.
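Because each of the four numbers fits in one byte, a dotted address is really just a convenient way of writing a single 32-bit number. The sketch below shows the conversion in both directions; the function names `ipv4_to_int` and `int_to_ipv4` are invented for this example, and Python's standard `ipaddress` module is used only to cross-check the result.

```python
import ipaddress


def ipv4_to_int(text):
    """Convert a dotted address such as '64.233.160.17' to a single integer."""
    parts = [int(part) for part in text.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid IPv4 address: {text!r}")
    value = 0
    for part in parts:
        value = value * 256 + part  # shift left by one byte, then add the next number
    return value


def int_to_ipv4(value):
    """Convert a 32-bit integer back to dotted notation."""
    parts = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(part) for part in parts)


if __name__ == "__main__":
    number = ipv4_to_int("64.233.160.17")
    print(number, int_to_ipv4(number))
    # The standard library agrees with the hand-written conversion.
    print(number == int(ipaddress.IPv4Address("64.233.160.17")))
```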

Every packet of information being sent over the Internet is accompanied by the IP address of its destination, and that of the sending computer. So when a router receives a packet, it can examine the destination IP address and make a decision on where to send it for the next leg of its journey. That decision can be influenced by various factors in addition to the router's knowledge of how the network is laid out - such as how congested a particular route is, whether the router at the other end is prepared to handle packets of a certain size, and so on - so packets may not always follow the same route to get to a particular destination, or take the same amount of time to get there. In fact, if there is a communication problem then it's possible that a packet might not arrive at all. For example, if packets are arriving at a particular router faster than they can be forwarded, then the router simply has to throw the excess away.
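The forwarding decision and the "throw it away when overloaded" behaviour can be sketched in a few lines. Everything here is an assumption made for illustration: the `Router` class, the route table, the link names and the tiny queue limit are invented, and the rule of choosing the most specific matching address range is one common way routers pick a link, not a claim about any particular device.

```python
import ipaddress
from collections import deque

# Hypothetical route table: address range -> outgoing link name.
ROUTES = {
    "0.0.0.0/0": "uplink",         # default route: everything else
    "64.233.160.0/19": "link-1",
    "64.233.160.0/24": "link-2",   # more specific range inside the one above
}


class Router:
    def __init__(self, routes, queue_limit=3):
        self.routes = [
            (ipaddress.ip_network(prefix), link) for prefix, link in routes.items()
        ]
        self.queue = deque()
        self.queue_limit = queue_limit
        self.dropped = 0

    def choose_link(self, destination):
        """Pick the link whose address range matches the destination most specifically."""
        address = ipaddress.ip_address(destination)
        matches = [(net, link) for net, link in self.routes if address in net]
        if not matches:
            return None
        _, link = max(matches, key=lambda match: match[0].prefixlen)
        return link

    def receive(self, packet):
        """Queue a packet for forwarding, or drop it if the queue is already full."""
        if len(self.queue) >= self.queue_limit:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def forward_one(self):
        """Take the oldest queued packet and return (link, packet)."""
        if not self.queue:
            return None
        packet = self.queue.popleft()
        return self.choose_link(packet["dst"]), packet
```

A packet here is just a dictionary with `src`, `dst` and `payload` keys, mirroring the idea that every packet carries both the sender's and the destination's address.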

Piecing Things Together

[Image: Sequencing chunks]
Vanishing packets of information may sound like a recipe for disaster on a computer network. In fact, there are certain cases, such as video streaming or voice transmission, where timing and speed are more important than accuracy, and it may be preferable to have a small glitch in the data rather than hold everything up while the problem is resolved. But normally if something is sent over the Internet, we want to be sure it arrives at the other end exactly as we sent it. This problem is solved by a messaging system known as TCP (Transmission Control Protocol).

Suppose we want to send a document from computer A to computer B. Under TCP, the first thing that happens is that A sends B a single-packet message asking to strike up a conversation. Because the packet also contains the sender's IP address, B can send a reply back to A to say that it is ready to start the conversation. A then confirms receipt of B's reply, just to keep things working smoothly.

Now A needs to send the document - but there is a limit on the amount of information that can be put into a single packet. The document has to be divided into small packet-sized pieces and sent one chunk at a time. Each chunk is numbered, so that B can reassemble the chunks in the correct sequence - remember that packets may not take the same route from A to B, so they may not arrive in the order they were sent! B acknowledges the chunks it receives, and if A does not get an acknowledgement for a particular chunk within a certain time, it transmits that chunk again. Eventually all the data has been collected at B and can be re-combined in the right order to form a copy of the original document. Once everything has been acknowledged, the conversation can be closed. In this way, TCP allows information to be sent reliably over the Internet even if there are problems or errors along the journey.
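The chunk-numbering idea can be tried out with a toy simulation. This is not real TCP (which numbers bytes rather than chunks and relies on acknowledgements and timers); it is a simplified sketch in which the chunk size, the loss rate and all function names are invented. The pretend network loses and reorders chunks, and anything still missing at the receiving end is sent again until the document is complete.

```python
import random

CHUNK_SIZE = 8  # hypothetical limit on how many bytes fit in one packet


def split_into_chunks(data, size=CHUNK_SIZE):
    """Cut the data into numbered pieces: [(0, b'...'), (1, b'...'), ...]."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def lossy_network(packets, rng, loss_rate=0.3):
    """Pretend network: randomly loses some packets and reorders the rest."""
    survivors = [packet for packet in packets if rng.random() >= loss_rate]
    rng.shuffle(survivors)
    return survivors


def transfer(data, rng):
    """Send data from A to B, resending missing chunks until B has them all."""
    chunks = split_into_chunks(data)
    received = {}            # B's view: sequence number -> chunk
    outstanding = chunks     # chunks A still needs to get across
    while outstanding:
        for seq, payload in lossy_network(outstanding, rng):
            received[seq] = payload
        # Anything B has not collected yet is transmitted again.
        outstanding = [chunk for chunk in chunks if chunk[0] not in received]
    # Reassemble in numbered order, whatever order the chunks arrived in.
    return b"".join(received[seq] for seq in range(len(chunks)))


if __name__ == "__main__":
    document = b"The quick brown fox jumps over the lazy dog."
    copy = transfer(document, random.Random(2015))
    print(copy == document)
```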

Clicking Links, Serving Names

[Image: DNS lookup, storing of response]
When you click on a link in a web site, your browser uses TCP to retrieve the information for the requested page. But the link itself doesn't mention IP addresses; it just looks something like 'www.google.com' - so how does the browser know where to get the information from? This problem is solved by something known as DNS (Domain Name System), which is effectively a giant database, spread over many computers, allowing computers to look up the numerical IP address corresponding to the name of any website.

Each computer that wants to use DNS knows the IP address of one or two name server computers, and when a web address needs to be translated to an IP address, the computer simply sends it to the name server. If the name server knows the answer, it will immediately return it to the calling computer, which can then go on to request the webpage using the provided IP address. If, however, the name server doesn't know the IP address, then it asks a higher-level name server, for example one which is designated as knowing all about .com addresses. That server points the way to the name server responsible for 'google', which in turn returns the specific IP address for 'www.google.com'.

The reply goes back to the original computer, and the name server also stores the result for a while, so that if it is asked again in the near future it will be able to reply immediately. Once our browser knows the IP address to use for 'www.google.com', it can go on to ask for the relevant web page to be returned from that site.
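The "store the result for a while" step is the part of DNS that is easiest to show in code. The sketch below is a toy model, not a real DNS client: the `CachingNameServer` class, the `RECORDS` table, the name `www.example.com` and the address `192.0.2.10` are all made up for illustration, and the higher-level name servers are collapsed into one simple lookup function.

```python
import time

# Hypothetical records held by the higher-level name servers.
RECORDS = {"www.example.com": "192.0.2.10"}


def ask_higher_level(name):
    """Stand-in for the slower lookup via higher-level name servers."""
    return RECORDS[name]


class CachingNameServer:
    def __init__(self, upstream, ttl_seconds=300, clock=time.monotonic):
        self.upstream = upstream      # function: name -> IP address
        self.ttl = ttl_seconds        # how long a stored answer stays valid
        self.clock = clock
        self.cache = {}               # name -> (address, expiry time)
        self.upstream_queries = 0

    def lookup(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]           # answered immediately from the cache
        # Not known (or the stored answer has expired): ask further up.
        self.upstream_queries += 1
        address = self.upstream(name)
        self.cache[name] = (address, now + self.ttl)
        return address


if __name__ == "__main__":
    server = CachingNameServer(ask_higher_level)
    print(server.lookup("www.example.com"))
    print(server.lookup("www.example.com"))
    print("lookups passed upstream:", server.upstream_queries)
```

In an ordinary Python program you would not write any of this yourself; asking the operating system via `socket.getaddrinfo` performs a real lookup.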

Displaying the Page

The web page itself is a text document containing specially coded instructions in a language called HTML (HyperText Markup Language) that the browser interprets, allowing it to lay out the page as designed by its author. For example, sections of text can be marked as headings, normal text, bold or italic fonts, tables, etc. The page may also contain the web addresses of image files, which will be retrieved using further DNS and TCP requests so that they can be displayed in the page. It may also contain more links, which if clicked on will start the whole process off again.
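As a rough illustration of what a browser has to pick out of a page, the following sketch uses Python's standard `html.parser` module to collect headings, image addresses and links from a small piece of HTML. The sample page, its example.com addresses and the `PageScanner` class are invented for this example; a real browser does far more, including laying out and drawing the page.

```python
from html.parser import HTMLParser

# A made-up page, just big enough to contain each kind of element.
SAMPLE_PAGE = """
<h1>How does the Internet work?</h1>
<p>Some <b>bold</b> and <i>italic</i> text.</p>
<img src="http://example.com/map.png">
<a href="http://example.com/next.html">Next page</a>
"""

HEADING_TAGS = ("h1", "h2", "h3")


class PageScanner(HTMLParser):
    """Collects the headings, image addresses and link addresses in a page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self.images = []
        self.links = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])   # needs a further request
        elif tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])   # followed only if clicked
        elif tag in HEADING_TAGS:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())


def scan_page(html):
    scanner = PageScanner()
    scanner.feed(html)
    return scanner


if __name__ == "__main__":
    page = scan_page(SAMPLE_PAGE)
    print(page.headings, page.images, page.links)
```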

A Technological Marvel

This is a fairly simple explanation of what goes on under the surface when you are browsing the Internet. Given the amount of back-and-forth messaging, file dicing-and-splicing, and error correction that goes on behind the scenes, it is sometimes astonishing that the Internet and World Wide Web work as seamlessly and efficiently as they do!

A Note on IP Addresses

As mentioned above, IP addresses are commonly specified as a sequence of four numbers in the range 0 to 255. This system, known as IPv4, allows for around four billion different addresses, which was originally seen as more than sufficient. However, because of the huge number of devices currently connected to the Internet, past inefficiencies in allocating the numbers, and the rise of the so-called 'Internet of Things' (in which everything from light bulbs to washing machines will be connected to the net), that four billion is running out. A new system known as IPv6 is being introduced in which IP addresses consist of eight numbers in the range 0 to 65535, giving a total of around 340 billion billion billion billion different addresses. It is thought that this will be enough! Unfortunately, the IPv4 system is still used by a large portion of the Internet's hardware, so the transition to IPv6 is not straightforward, with various schemes for translation between the two formats being required as an interim measure.
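The two totals follow directly from the sizes of the numbers involved: four numbers of 8 bits each give 32 bits, and eight numbers of 16 bits each give 128 bits. A short check of the arithmetic, with variable names chosen just for this example and the standard `ipaddress` module used as a cross-check:

```python
import ipaddress

IPV4_BITS = 4 * 8    # four numbers, each 0 to 255 (8 bits)
IPV6_BITS = 8 * 16   # eight numbers, each 0 to 65535 (16 bits)

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

# The standard library counts the same number of addresses in each system.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total

if __name__ == "__main__":
    print(f"IPv4: {ipv4_total:,} addresses")
    print(f"IPv6: {ipv6_total:,} addresses")
    print(f"IPv6 addresses per IPv4 address: {ipv6_total // ipv4_total:,}")
```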


Further Reading

If you would like to find out more, these Wikipedia articles may be of interest:
IP Addresses
Internet Protocol (TCP/IP)
Domain Name System
HTML

