C—Web science


Computer Science Wiki - Web_Science computersciencewiki.org Web_Science page Excellent browsing site for web science topics
WikiBooks web science wikibook_IBCS_Web_Science Information about CORE (C1-4) web science topics
WAMP tutorial WAMP tutorial Windows, Apache, MySQL, PHP are often used in combination by web developers.
The CS classroom web science video tutorial (2hrs20) CS_Classroon_Web_Science_YouTube In-depth information about all web science topics
What is a search engine https://www.techtarget.com/whatis/definition/search-engine search engine page on Techtarget definitions database.
How does a Google search work https://support.google.com/webmasters/answer/9128586?hl=en The basics from google support page
Web Application Architecture https://stackify.com/web-application-architecture/ Stackify guide to: How It Works, Trends, Best Practices and More
Tutorchase https://www.tutorchase.com/notes/ib/computer-science Contains a good Web Science section - some direct links in the topic sections below...

C.1 Creating the web (8 hours)

Students will be expected to have completed practical activities linked to developing different types of web
pages and be able to evaluate when a particular type of web page is most appropriate.

Click for C.1 Links page containing the information you need

C.1.1 Distinguish between the Internet and the World Wide Web (web).

The Internet is the interconnected network of networks and devices that uses IP addresses to identify them and low-level network protocols (IP) to transfer data.

The World Wide Web is the network of resources held on the computers of the Internet. It is also the way of transferring those resources over the Internet using high-level protocols such as HTTP.

Explanation from webopedia

C.1.2 Describe how the web is constantly evolving.

Students will be expected to be aware of the major differences between the early forms of the web, Web 2.0, the semantic web and later developments.

Develop an appreciation of the possibilities and limitations associated with the evolution of the web.

Introduction to the semantic web [Cambridge semantics]

W3C What is the semantic web?

C.1.3 Identify the characteristics of the following:

Item Characteristic
hypertext transfer protocol (HTTP) The underlying protocol used by the World Wide Web which defines how messages are formatted and transmitted using Client actions and Server responses. HTTP is a stateless application layer protocol. HTTP uses Request Methods (POST, GET, [PUT, DELETE]) for data handling. Text based protocol (not secure as unencrypted).
hypertext transfer protocol secure (HTTPS) HTTP that uses the Transport Layer Security (TLS) protocol to provide authentication and encryption. Older technology uses Secure Sockets Layer (SSL).
hypertext mark-up language (HTML) HTML is the standard markup language for creating Web pages HP html guide
uniform resource locator (URL) The global address of documents and other resources on the World Wide Web consisting of protocol, domain name (or IP address, file
extensible mark-up language (XML) eXtensible Markup Language was designed to store and transport data and to be both human- and machine-readable.
extensible stylesheet language transformations (XSLT) Transformations which typically use XSL to transform XML documents into other formats (like HTML)
JavaScript (JS) JS is the most widely used client-side scripting language. A programming language that can be interpreted by the browser.
cascading style sheet (CSS) CSS describes how HTML elements are to be displayed. External stylesheets are stored in CSS files and can control the layout of multiple web pages all at once

C.1.4 Identify the characteristics of the following:

URI uniform resource identifier

URL uniform resource locator

URLs and URNs are special forms of URIs.
A URI that identifies a mechanism by which a resource may be accessed is usually referred to as a URL.
HTTP URIs are examples of URLs.
Some URIs are not URLs. For example, a URN provides a globally unique name for a resource. If the URI has urn as its scheme and adheres to the requirements of RFC 2141 and RFC 2611, it is a URN. The ISBN of the book REST in Practice by J. Webber, S. Parastatidis and I. Robinson, from which this paragraph is partly taken, is ISBN-13: 978-0596805821. It identifies the book uniformly (same format as for other books), but the book could be located in more than one place, so it is not a URL.
ref: [https://stackoverflow.com/questions/42534419/examples-of-uri-url-and-urn]

C.1.5 Describe the purpose of a URL.

A URL is a way of uniquely locating a resource, most commonly using HTTP, for example:
protocol://domain (or IP address)/path[?query#fragment]
http://hockerillct.com/16/CT/ib/web_science.html
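
As a rough illustration of the pattern above, the sketch below splits a URL into its parts using Python's standard urllib.parse module. The query string and fragment appended to the example URL are hypothetical and are there only to show where those optional parts go.

```python
from urllib.parse import urlsplit

# The URL from the notes, plus a hypothetical query and fragment (assumptions
# added purely to show the optional parts of the pattern).
url = "http://hockerillct.com/16/CT/ib/web_science.html?topic=C1#C.1.5"

parts = urlsplit(url)

print("protocol:", parts.scheme)    # the scheme before ://
print("domain:  ", parts.netloc)    # domain name (or IP address)
print("path:    ", parts.path)      # location of the resource on the server
print("query:   ", parts.query)     # optional, after ?
print("fragment:", parts.fragment)  # optional, after #
```

Only the protocol, domain and path come from the original example; everything after the question mark is illustrative.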

C.1.6 Describe how a domain name server functions.

The Domain Name System (DNS) comprises many servers which maintain and distribute domain names and their corresponding IP addresses. An HTTP request will typically contain a domain name, but what is needed for communication to take place is the IP address. When a new HTTP request is made using a domain name, the DNS finds a server which holds the IP address of the domain and sends this to the requesting computer, so that it can make the page request using the IP address of the machine where the domain is hosted.
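
The two steps described above (look up the name, then contact the address) can be sketched with a toy model. This is not how a real resolver is implemented: the RECORDS table, the function names and the values example.com and 192.0.2.10 are all hypothetical placeholders, and no network traffic takes place.

```python
# Hypothetical name-to-address table standing in for the records held by DNS
# servers. example.com and 192.0.2.10 are reserved documentation values.
RECORDS = {"example.com": "192.0.2.10"}


def resolve(domain, records):
    """Return the IP address stored for a domain name, as a DNS lookup would."""
    if domain not in records:
        raise LookupError(f"no address found for {domain}")
    return records[domain]


def plan_request(domain, path, records):
    """Describe the two steps: resolve the name, then contact that address."""
    ip_address = resolve(domain, records)
    return {"connect_to": ip_address, "request": f"GET {path}", "host": domain}


plan = plan_request("example.com", "/index.html", RECORDS)
print(plan)
```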

C.1.7 Identify the characteristics of:

Item Characteristic
internet protocol (IP) takes the network packets from the transport layer and sends them to the proper destinations based on their IP addresses
transmission control protocol (TCP) creates and delivers the data packets passed on from the application layer to the appropriate host devices by adding source and destination port numbers and maintaining the end-to-end network connections.
file transfer protocol (FTP) FTP is an application layer protocol that runs on top of TCP. When an FTP client connects to an FTP server, a TCP connection is established, conventionally using port 21 for control commands and port 20 for data transfer. FTP relies on TCP to ensure all the packets of data are sent correctly and to the proper destination.

information taken from: CEBERUS: How Does TCP/IP Relate to FTP? and cellbiol: The TCP/IP family of Internet protocols


C.1.8 Outline the different components of a web page.

To include features such as metatags, title, etc.

Simple html page ==> [w3schools how to write a website]
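
To make the components concrete, the sketch below holds a very small page as a string and uses Python's standard html.parser module to pick out the title and the metatags. The page content and the ComponentFinder class are made up for illustration and are not taken from the linked guide.

```python
from html.parser import HTMLParser

# A made-up minimal page: the head holds the title and metatags,
# the body holds the visible content.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Web science notes</title>
  <meta name="description" content="Notes for IB Option C">
  <meta name="keywords" content="web, science">
</head>
<body>
  <h1>Option C</h1>
  <p>Body content goes here.</p>
</body>
</html>"""


class ComponentFinder(HTMLParser):
    """Collects the page title and any named metatags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


finder = ComponentFinder()
finder.feed(PAGE)
print("title:", finder.title)
print("metatags:", finder.meta)
```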

C.1.9 Explain the importance of protocols and standards on the web.

Protocols enable compatibility through a common "language" internationally.

C.1.10 Describe the different types of web page.

This should include examples such as personal pages, blogs, search engine pages, wikis, forums.

C.1.11 Explain the differences between a static web page and a dynamic web page.

To include analysis of static HTML web pages and dynamic web pages, e.g. PHP, ASP.NET, Java Servlets, Ajax.
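
One way to picture the difference is the sketch below, written in Python rather than any of the technologies listed above. The static page is a fixed piece of HTML returned unchanged to every visitor, while the dynamic page is built at request time from inputs. The function names and page text are hypothetical.

```python
from datetime import datetime
from html import escape

# A static page: the same stored HTML is sent to every visitor.
STATIC_PAGE = "<html><body><p>Welcome to the site.</p></body></html>"


def serve_static():
    """Return the stored page exactly as it is."""
    return STATIC_PAGE


def serve_dynamic(user, now):
    """Build the page on the server for this particular request."""
    greeting = "Good morning" if now.hour < 12 else "Good afternoon"
    return (
        f"<html><body><p>{greeting}, {escape(user)}. "
        f"It is {now:%H:%M}.</p></body></html>"
    )


morning = serve_dynamic("Ada", datetime(2024, 1, 1, 9, 30))
afternoon = serve_dynamic("Ada", datetime(2024, 1, 1, 15, 0))
print(serve_static())
print(morning)
print(afternoon)
```

The static function always gives the same output, whereas the dynamic one depends on who is asking and when.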

Best languages for server side programming (wpwebinfotech.com)

C.1.12 Explain the functions of a browser.

A browser is an application (software) that allows users to access and view web pages. It can make HTTP requests and render the resulting HTML code, including links and references to multimedia, styling, scripting etc.

C.1.13 Evaluate the use of client-side scripting and server-side scripting in web pages.

Serverside first steps (mozilla.org)

C.1.14 Describe how web pages can be connected to underlying data sources.

Students will not be expected to write code (MySQL for example) to indicate how the connection is made, but should understand the principles of connecting to an underlying data source.
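
Although writing the code is not required, a small sketch may help make the principle visible: server-side code opens a connection to a data source, runs a query, and turns the rows into HTML. The example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a database server such as MySQL. The table name, column names and function name are hypothetical.

```python
import sqlite3
from html import escape

# An in-memory database standing in for the site's real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (code TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO topics VALUES (?, ?)",
    [("C.1", "Creating the web"), ("C.2", "Searching the web")],
)


def render_topic_list(connection):
    """Query the data source and build an HTML list from the rows."""
    rows = connection.execute(
        "SELECT code, title FROM topics ORDER BY code"
    ).fetchall()
    items = "".join(
        f"<li>{escape(code)}: {escape(title)}</li>" for code, title in rows
    )
    return f"<ul>{items}</ul>"


page = render_topic_list(conn)
print(page)
```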

C.1.15 Describe the function of the common gateway interface (CGI).

C.1.16 Evaluate the structure of different types of web pages.

C.2 Searching the web (6 hours)

C2 Information page and links               C2 Questions and tasks

C.2.1 Define the term search engine.

C.2.2 Distinguish between the surface web and the deep web.

TOK Data is always accessible.

C.2.3 Outline the principles of searching algorithms used by search engines.

Students will be expected to understand only the principles of the PageRank and HITS algorithms.
General principles of computational thinking, connecting computational thinking and program design.

PageRank (searchenginejournal.com)
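
The principle behind PageRank (a page is important if important pages link to it) can be sketched on a tiny invented web of four pages. This is a simplified textbook-style iteration, not the algorithm as any search engine actually runs it. The damping factor of 0.85, the number of iterations and the LINKS data are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly share each page's rank among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}  # start with equal rank

    for _ in range(iterations):
        # Every page receives a small base amount regardless of links.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: each key links to the pages in its list.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, page C is linked to by three pages and ends up with the highest rank, while page D has no incoming links and ends up with the lowest.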

C.2.4 Describe how a web crawler functions.

Teachers should be aware of the range of terms that can be associated with web crawlers such as bots, web spiders, web robots.
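
A crawler's basic loop can be sketched as follows: start from a seed page, record it, follow its links, and repeat for every page not yet seen. The sketch runs on an invented in-memory web (TOY_WEB) instead of fetching real pages, and the function and page names are hypothetical.

```python
from collections import deque

# A made-up web: each page maps to the links found on it.
TOY_WEB = {
    "/home": ["/about", "/blog"],
    "/about": ["/home"],
    "/blog": ["/blog/post-1", "/home"],
    "/blog/post-1": [],
    "/orphan": ["/home"],  # nothing links here, so a crawler never finds it
}


def crawl(seed, fetch_links):
    """Visit pages breadth-first, following links from the seed page."""
    visited = []
    seen = {seed}
    frontier = deque([seed])
    while frontier:
        page = frontier.popleft()
        visited.append(page)  # a real crawler would index the page here
        for link in fetch_links(page):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("/home", lambda page: TOY_WEB.get(page, []))
print(order)
```

The page with no incoming links is never reached, which is one reason some content stays out of a search engine's index.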

C.2.5 Discuss the relationship between data in a meta-tag and how it is accessed by a web crawler.

Key learning

Evolution of metatags and why they are not so important now (except the title?)

Students should be aware that this is not always a transitive relationship. If website A is related to website B and website B is related to website C, this does not mean that website C is related to website A.

TOK Data may not always have the intended meaning.

C.2.6 Discuss the use of parallel web crawling.

Web crawler parallelization policy (Wikipedia)

Distributed Web crawling (Wikipedia)

C.2.7 Outline the purpose of web-indexing in search engines.

C.2.8 Suggest how web developers can create pages that appear more prominently in search engine results.

C.2.9 Describe the different metrics used by search engines.

Students will be expected to test specific data in a range of search engines, for example examining time taken, number of hits, quality of returns.
An understanding of search engine metrics could lead to exploitation.

C.2.10 Explain why the effectiveness of a search engine is determined by the assumptions made when developing it.

Students will be expected to understand that the ability of the search engine to produce the required results is based primarily on the assumptions used when developing the algorithms that underpin it.

LINK Connecting computational thinking and program design.

C.2.11 Discuss the use of white hat and black hat search engine optimization.

AIM 8 Developers of search engines should have a moral responsibility to produce an objective page ranking.

C.2.12 Outline future challenges to search engines as the web continues to grow.

Issues such as error management and lack of quality assurance of information uploaded.
AIM 9 Develop an appreciation that search engines will need to evolve to remain effective as the web grows.

C.3 Distributed approaches to the web (6 hours)

C.3.1 Define the terms: mobile computing, ubiquitous computing, peer-2-peer network, grid computing.

C.3.2 Compare the major features of:

• mobile computing
• ubiquitous computing
• peer-2-peer network
• grid computing.

LINK Networks.

C.3.3 Distinguish between interoperability and open standards.

C.3.4 Describe the range of hardware used by distributed networks.

Students should be aware of developments in mobile technology that have facilitated the growth of distributed networks.

C.3.5 Explain why distributed systems may act as a catalyst to a greater decentralization of the web.

INT Decentralization has increased international-mindedness.

C.3.6 Distinguish between lossless and lossy compression.

Students will not be required to study the detailed compression algorithms.

C.3.7 Evaluate the use of decompression software in the transfer of information.

Students can test different compression methods to evaluate their effectiveness.
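
One possible way to run such a test is sketched below, using three lossless compression modules from Python's standard library (zlib, bz2 and lzma). The sample text is invented and highly repetitive, so the sizes it produces say little about real files. Lossy methods, such as those used for images or audio, are not covered here.

```python
import bz2
import lzma
import zlib

# Invented, highly repetitive sample data (an assumption for the demo).
original = ("Web science is part of IB Computer Science. " * 200).encode("utf-8")

codecs = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(original)
    # Lossless: decompressing must give back exactly the original bytes.
    restored = decompress(packed)
    results[name] = {"size": len(packed), "lossless": restored == original}

print("original size:", len(original))
for name, info in results.items():
    print(name, info)
```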

C.4 The evolving web (10 hours)

C.4.1 Discuss how the web has supported new methods of online interaction such as social networking.

C4.1 summary and questions

Students should be aware of issues linked to the growth of new internet technologies such as Web 2.0 and how they have shaped interactions between different stakeholders of the web.

E, AIM 8 Emerging technologies are modifying users' behaviour.

C4 tutorchase notes (Part 1: Online Interaction and Social Networking)

C.4.2 Describe how cloud computing is different from a client-server architecture.

Students should address the major differences only.

LINK Networks.

C4 tutorchase notes (Part 2: Cloud vs Traditional Client-Server)

C.4.3 Discuss the effects of the use of cloud computing for specified organizations.

To include public and private clouds.
AIM 8 Cloud computing could potentially conflict with privacy.

C.4.4 Discuss the management of issues such as copyright and intellectual property on the web.

Students should investigate sites such as TurnItIn and Creative Commons. https://www.tutorchase.com/notes/ib/computer-science/c-4-3-intellectual-property-and-privacy

C4 tutorchase article (primary concerns of intellectual property)

C.4.5 Describe the interrelationship between privacy, identification and authentication.

C4 tutorchase notes (Part 3: Intellectual property and privacy)

C.4.6 Describe the role of network architecture, protocols and standards in the future development of the web.

C4 tutorchase notes (Part 4: Future web development)

AIM 9 Develop an appreciation that the future development of the web will have an effect on the rules and structures that support it.

C.4.7 Explain why the web may be creating unregulated monopolies.

INT, S/E, AIM 8 The web is creating new multinational online oligarchies.

C.4.8 Discuss the effects of a decentralized and democratic web.

C4 tutorchase article (impact of decentralization on web performance)

S/E, INT The web has changed users' behaviours and "removed" international boundaries.

HL Extension C.5 Analysing the web (5 hours)

Introduction document Introduction to web science (a bit outdated now but worth a read) PDF presentation of key ideas for C5

C.5.1 Describe how the web can be represented as a directed graph.

The vertices (nodes) represent web pages and the edges represent hyperlinks.
It is not a complete graph. The directed graph formed by the web is known as the web graph.

http://en.wikipedia.org/wiki/Directed_graph
http://mathinsight.org/network_introduction
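
A directed graph of this kind can be sketched as a dictionary in which each page maps to the set of pages it links to. The four-page WEB_GRAPH below is invented, and the helper names are hypothetical.

```python
# Each key is a page (vertex); each value is the set of pages it links to
# (directed edges). The data is made up for illustration.
WEB_GRAPH = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"A"},
    "D": {"C"},
}


def out_degree(graph, page):
    """Number of hyperlinks leaving the page."""
    return len(graph[page])


def in_degree(graph, page):
    """Number of hyperlinks pointing at the page."""
    return sum(page in targets for targets in graph.values())


def is_complete(graph):
    """True only if every page links to every other page."""
    return all(graph[page] == set(graph) - {page} for page in graph)


print("A links to B:", "B" in WEB_GRAPH["A"])
print("B links to A:", "A" in WEB_GRAPH["B"])  # direction matters
print("in-degree of C:", in_degree(WEB_GRAPH, "C"))
print("complete graph:", is_complete(WEB_GRAPH))
```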

C.5.2 Outline the difference between the web graph and sub-graphs.

A sub-graph will be assumed to be a set of pages linked to one specific topic.

C.5.3 Describe the main features of the web graph such as bowtie structure, strongly connected core (SCC), diameter.

Students must be aware the web has a structure that has emerged from the behaviour of web users.

C.5.4 Explain the role of graph theory in determining the connectivity of the web.

LINK Mathematics: graph theory.

The eccentricity of a vertex is the greatest shortest-path distance between it and any other vertex. It can be thought of as how far a node is from the node most distant from it in the graph.

The diameter of a graph is the maximum eccentricity of any vertex in the graph, that is, the greatest distance between any pair of vertices. To find it, find the shortest path between each pair of vertices; the greatest length of any of these paths is the diameter of the graph.
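
Both definitions can be checked on a small example. The sketch below uses breadth-first search to measure shortest-path distances in an invented four-vertex chain A-B-C-D in which every link works in both directions. That is a simplifying assumption, because in the real web graph links are one-way and some pages cannot be reached from others.

```python
from collections import deque

# Invented graph: a chain A - B - C - D with links in both directions.
GRAPH = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}


def distances_from(graph, start):
    """Shortest-path length (in edges) from start to each reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def eccentricity(graph, vertex):
    """Greatest shortest-path distance from vertex to any other vertex."""
    dist = distances_from(graph, vertex)
    if len(dist) < len(graph):
        raise ValueError("some vertex is unreachable from " + vertex)
    return max(dist.values())


def diameter(graph):
    """Maximum eccentricity over all vertices."""
    return max(eccentricity(graph, vertex) for vertex in graph)


for vertex in GRAPH:
    print(vertex, "eccentricity:", eccentricity(GRAPH, vertex))
print("diameter:", diameter(GRAPH))
```

The end vertices A and D are three steps apart, so each has eccentricity 3 and the diameter is 3; the middle vertices have eccentricity 2.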

C5 Web graphs

C.5.5 Explain that search engines and web crawling use the web graph to access information.

Students should be aware of the PageRank algorithm and explain how it works. No calculations are required.

C.5.6 Discuss whether power laws are appropriate to predict the development of the web.

Cambridge power law and web graph lecture notes

HL Extension C.6 The intelligent web (10 hours)

Introduction document Tutorchase: Introduction to semantic web foundations HTML presentation of key ideas for C6

C.6.1 Define the term semantic web.

The semantic web is an abstract web concept – currently in development.
The aim is to have web pages organised and labelled in a better way to aid organisation and searching.
http://www.w3.org/standards/semanticweb/

"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners." [W3C]

Metadata added to web pages could make web pages easier for computers to read and organise.
To paraphrase Tim Berners-Lee, inventor of the World Wide Web, these tools will let the Web, currently similar to a giant book, become a giant database.
XML and RDF are the "official languages" of the Semantic Web, but by themselves they're not enough to make the entire Web accessible to a computer.

http://computer.howstuffworks.com/semantic-web.htm

C.6.2 Distinguish between the text-web and the multimedia-web.

The traditional web is seen as being text based, whereas the semantic web is multimedia based.
AIM 9 Develop an appreciation of the possibilities and limitations associated with the continuing evolution of the web.

C.6.3 Describe the aims of the semantic web.

Google Bard answer to aims of semantic web

C.6.4 Distinguish between an ontology and folksonomy.

The core meaning of ontology within computer science is a model for describing the world that consists of a set of types, properties, and relationship types. An ontology is a vocabulary that describes objects and how they relate to one another.

The Dublin Core was the first metadata standard for describing web content. The resources described using the Dublin Core may be digital resources (video, images, web pages, etc.) as well as physical resources such as books or works of art. Dublin Core metadata may be used for multiple purposes, from simple resource description to combining metadata vocabularies of different metadata standards, to providing interoperability for metadata vocabularies in the linked data cloud and Semantic Web implementations. Wikipedia - Dublin core

https://www.dublincore.org/

Web Ontology Language (OWL) - OWL, the most complex layer, formalizes ontologies, describes relationships between classes and uses logic to make deductions. It can also construct new classes based on existing information. OWL is available in three levels of complexity: Lite, Description Logic (DL) and Full.
http://en.wikipedia.org/wiki/Ontology_(information_science)

Folksonomy – social tagging
The tagging is done by users, often simultaneously.
The practice of generating electronic tags or keywords by users rather than specialists as a way to classify and describe online content.
There are two types: broad and narrow. A broad folksonomy is one in which multiple users tag particular content with a variety of terms from a variety of vocabularies, thus creating a greater amount of metadata for that content. A narrow folksonomy, on the other hand, occurs when a few users, primarily the content creator, tag an object with a limited number of terms.

http://en.wikipedia.org/wiki/Folksonomy

C.6.5 Describe how folksonomies and emergent social structures are changing the web.

AIM 8 Emerging technologies are modifying users' behaviour.

C.6.6 Explain why there needs to be a balance between expressivity and usability on the semantic web.

AIM 8 Emerging technologies are modifying users' behaviour.

C.6.7 Evaluate methods of searching for information on the web.

Teachers must address issues relating to searching for non-text based files/multimedia files such as using feature analysis.

C.6.8 Distinguish between ambient intelligence and collective intelligence.

C.6.9 Discuss how ambient intelligence can be used to support people.

Students will be expected to have researched examples such as biometrics, nanotechnologies.
AIM 9 Develop an appreciation of the possibilities that ambient intelligence provides in supporting people when carrying out routine tasks.

Ambient Intelligence describes an environment which is sensitive and responsive to human presence.
http://en.wikipedia.org/wiki/Ambient_intelligence

Biometrics

Identification of human characteristics by computer systems.
(Fingerprint, eye details, voice recognition, facial recognition)
http://www.webopedia.com/TERM/B/biometrics.html

Nano-technology

One nanometer is a billionth of a meter (1 x 10^-9 m). This is the scale of nanotechnology.

Examples include
The manufacture of computer chips, which has the potential to launch a new generation of electronic devices that run faster, while using less energy, than those made from silicon chips.

Nanotechnology engineers build first carbon nanotube computer [nanowerk]
More recent: physicists-build-nanomaterial-microchip-using-graphene [nanomagazine]
Also: http://en.wikipedia.org/wiki/Industrial_applications_of_nanotechnology#Consumer_goods

C.6.10 Explain how collective intelligence can be applied to complex issues.

MIT Center for Collective Intelligence

Students will be expected to have researched examples such as climate change, social bookmarking and stock market fluctuations.

The above are quite old links, so do your own research as well.

AIM 5 Engender an awareness that effective collaboration and communication can resolve complex problems.
S/E, AIM 8 Emerging technologies are modifying users' behaviour.
TOK It is possible to have a collective intelligence greater than the sum of the contributors.