CISC 322 Assignment 3

Architecture of a Search Engine: Architectural Analysis

Due Monday, November 14, 2005 by 3:00 PM

Synopsis

In this assignment, you will extend the architecture of your search engine to include a web-based user interface. You will analyze, implement and test this design. This assignment will give you experience with web-based architectures and with analysis of architectural properties. The assignment will also expose you to the typical problem of requirements change in an ongoing project.

Background

In the last assignment, you designed the core Query Engine component of your search engine. In this assignment, you will complete your design of the search engine and implement it.

The search engine is to present a search page allowing the user to enter a query in the syntax specified in assignment 2. The search page should include a field where the user can enter a query, and a button allowing them to launch the search.
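
As a rough illustration of the search page described above, the following sketch builds a minimal HTML page with one query field and one button. The function name `render_search_page`, the form action `/search`, and the field name `q` are assumptions made for this example only; the assignment does not prescribe them, and your own design may differ.

```python
def render_search_page(action="/search", field="q"):
    """Return a minimal HTML search page as a string.

    `action` and `field` are hypothetical defaults chosen for this sketch.
    """
    return (
        "<html><body>\n"
        '<form action="{0}" method="get">\n'
        '  <input type="text" name="{1}">\n'       # field for the query
        '  <input type="submit" value="Search">\n'  # button that launches the search
        "</form>\n"
        "</body></html>"
    ).format(action, field)
```

How the page is served (and by which web server) is left open here, since that is part of the architecture you are asked to design.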

The result of the query is a result page, which should list up to ten web pages which matched the given query. Each returned entry should show the URL of the matched page; the URL should be a clickable link to the page itself.
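
One possible way to meet the result-page requirements (at most ten entries, each URL shown as a clickable link) is sketched below. The function name `render_result_page` and the assumption that matches arrive as a list of URL strings are illustrative only; they are not part of the assignment.

```python
from html import escape

MAX_RESULTS = 10  # the result page lists up to ten matching pages


def render_result_page(urls):
    """Return an HTML result page listing at most MAX_RESULTS URLs as links.

    `urls` is assumed to be a list of URL strings produced by the query engine.
    """
    items = "\n".join(
        # Escape the URL so special characters cannot break the markup.
        '<li><a href="{0}">{0}</a></li>'.format(escape(url, quote=True))
        for url in urls[:MAX_RESULTS]
    )
    return "<html><body><ol>\n" + items + "\n</ol></body></html>"
```

Truncating with a slice keeps the limit in one place; an empty list simply yields an empty list element, which you may prefer to replace with a "no results" message.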

You are otherwise completely free in the graphic design and wording of your search and result pages. I'll look forward to seeing what you produce!

Requirements Change

In real software projects, it is typical for the project's requirements to change as the project is underway. Our search engine project is no different, and so assignments two through four will each introduce a change in requirements. This assignment's change is to require the search engine to ignore very common words such as "the" and "and". Such words appear so frequently (on almost every web page!) that they do not contribute usefully to the web search.

The list of words to ignore is: a, and, the, in, of, to, it.
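
The filtering step itself can be sketched in a few lines. The stop-word list below is taken from the assignment; the function name `remove_stop_words`, the case-insensitive comparison, and the assumption that the input has already been split into words are choices made for this example only. Where the filter belongs in your architecture (Spider Engine, Search Engine, or both) is the design question Part 1 asks you to answer.

```python
# Words to ignore, as listed in the assignment.
STOP_WORDS = frozenset({"a", "and", "the", "in", "of", "to", "it"})


def remove_stop_words(words):
    """Return the words that are not stop words, preserving their order.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    return [w for w in words if w.lower() not in STOP_WORDS]
```

For example, `remove_stop_words(["The", "architecture", "of", "a", "search", "engine"])` returns `["architecture", "search", "engine"]`.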

To Do

Part 1: Modify the architecture of your system to satisfy the new requirement of ignoring common words. Using whatever supporting documentation you choose, explain what you changed and why. Argue why the changes you chose to make are the best way of addressing the new requirement in the architecture. Changes may be in the Spider Engine, the Search Engine, or both.

Part 2: Create a calls-perspective architectural diagram for your Search Engine. Be sure to include nodes. Your diagram does not need to contain the Spider Engine. Your architecture should include the Word Database component from assignment 1 and the Query Engine component from assignment 2.

Part 3: Create tabular specifications of all components in your architecture. You do not need to create a specification for the Persistent Dictionary component.

Part 4: Implement and test the Search Engine based on your architecture. Your Search Engine should be compatible with the Spider Engine you completed in assignment 2.

Part 5 (approximately 1 page): Identify the "off-the-shelf" components that you have used in your architecture, such as web servers and web browsers. Discuss the pros and cons of structuring this application around these off-the-shelf components. (Hint: you should treat the Persistent Dictionary component as off-the-shelf; you do not need to consider the MySQL database, since you are not using it directly.)

Part 6 (approximately 2-3 pages): Provide three plausible modification scenarios for the spider/search engines. Evaluate the impact of these changes on your architecture. Provide appropriate documentation (e.g., architectural diagrams) to structure your discussion.

Part 7 (approximately 1-2 pages): Discuss how you would rearchitect the Search Engine to improve its availability. Provide an updated architectural diagram showing your new, high-availability design. You do not need to implement this new design.

To Hand In

Submit a report including parts 1 through 7. The report should have a table of contents, and be divided into one section per part, with appropriate section headings. The front page of the report must be the assignment 3 cover sheet, available here. (Two marks will be deducted for not providing this cover sheet.) Be sure to read the cover sheet for a breakdown of how the assignment will be graded.

In addition to what is specified above, be sure to include code for the Search Engine, and test runs showing its correct operation. You do not need to submit the code for the Spider Engine.

This assignment is to be submitted to the CISC 322 drop box on the second floor of Goodwin Hall by 3:00 PM on Monday, November 14, 2005. Printers tend to be busy shortly before assignment deadlines, so it is wise to avoid last-minute submission. No late assignments will be accepted.