CISC 322 Assignment 4

Architecture of a Search Engine: Architectural Styles

Due Friday, December 2, 2005 by 3:00 PM

Synopsis

In this assignment, you will modify the architecture of your search engine to apply the pipes and filters architectural style, and you will analyze the existing architecture to find examples of design patterns. This assignment will give you experience with applying architectural styles and design patterns. This assignment requires no implementation work.

Background

In the last assignment, you implemented and tested your search engine, giving you a complete spider and search engine. In this assignment, you will modify and analyze the design of the spider engine.

Requirements Change

In real software projects, it is typical for the project's requirements to change as the project is underway. Our search engine project is no different, and so assignments two through four will each introduce a change in requirements. This assignment's change is to require the spider engine to be based on the pipes and filters architecture.

Pipes and Filters

We will implement pipes via a Pipe component, described below:

Pipe Interface

Operation: write
  Input Parameters: val : String
  Output Parameters: -
  Pre-Condition: -
  Post-Condition: The string val is enqueued on the pipe, following any data already in the pipe.
  Exceptions: PipeIOException: error writing to pipe
  Description: The given string is added to the end of the pipe.

Operation: connect
  Input Parameters: -
  Output Parameters: connect : int
  Pre-Condition: -
  Post-Condition: The filter is granted permission to read from the pipe, using the returned id.
  Exceptions: PipeConnectionException: error establishing a connection to the pipe
  Description: A filter registers its desire to read from the pipe. An id is returned that is to be used to identify this filter when reading.

Operation: read
  Input Parameters: id : int
  Output Parameters: read : string
  Pre-Condition: The id must have been granted in a call to connect.
  Post-Condition: The first string in the pipe not yet seen by this filter is returned, OR if there are no characters as yet unseen by this filter, null is returned.
  Exceptions: PipeIOException: error reading from pipe; IllegalFilterIdException: this id was not granted in a call to connect
  Description: The first character from the pipe is dequeued and returned; if the pipe is empty, null is returned.

As shown in this description, pipes convey data character by character, with each character represented as a string of length 1. Data is buffered within the pipe so that it can be read asynchronously. The two operations write and read allow data to be written to and read from the pipe. Before reading from a pipe, a filter must first connect to the pipe. The filter receives a unique id, which it must use in all calls to the read operation. Because the pipe tracks what has been read separately for each id, every connected filter sees all data flowing through the pipe. Filters are simply components that are connected via pipes.
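The Pipe operations described above could be sketched in Java roughly as follows. This is only an illustrative in-memory version, not a required implementation. It assumes that a newly connected reader starts at the oldest buffered item, that the exception is unchecked, and that the pipe is used from a single thread. An in-memory pipe has no I/O or connection failures, so PipeIOException and PipeConnectionException are omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Thrown when read() is given an id that connect() never returned. */
class IllegalFilterIdException extends RuntimeException {
    IllegalFilterIdException(int id) {
        super("id " + id + " was not granted by connect()");
    }
}

/** In-memory sketch of the Pipe component; not thread-safe. */
class Pipe {
    // Everything ever written, in arrival order.
    private final List<String> buffer = new ArrayList<>();
    // cursors.get(id) is the index of the next item that reader `id` has not yet seen.
    private final List<Integer> cursors = new ArrayList<>();

    /** Appends val after any data already in the pipe. */
    void write(String val) {
        buffer.add(val);
    }

    /** Registers a reader and returns the id it must pass to read(). */
    int connect() {
        cursors.add(0); // assumption: a new reader starts at the oldest item
        return cursors.size() - 1;
    }

    /** Returns the next item this reader has not seen, or null if there is none. */
    String read(int id) {
        if (id < 0 || id >= cursors.size()) {
            throw new IllegalFilterIdException(id);
        }
        int next = cursors.get(id);
        if (next >= buffer.size()) {
            return null;
        }
        cursors.set(id, next + 1);
        return buffer.get(next);
    }
}
```

Keeping one cursor per id is what lets two filters read the same pipe without stealing data from each other. The sketch never discards buffered items; a fuller version might drop items once every connected reader has passed them.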

Using Pipes and Filters in the Spider Engine

The Page Info component is to be rearchitected as a subsystem whose internal components are organized using the pipes and filters architectural style. Filters should be used for the following tasks:

  • Strip the HTML tags from the given HTML document
  • Split the document into words
  • Remove all duplicate words
  • Remove all common words
  • Strip everything from the document other than anchor (A) tags
  • Extract the target URLs from the anchor tags
  • Remove duplicate anchor tags
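As a rough illustration of the word-side filters in the list above, the following Java sketch chains three of them: strip tags, split into words, remove duplicates. Several things here are assumptions made for brevity: the Filter interface and class names are hypothetical, a java.util.Queue stands in for the Pipe component, each filter runs to completion before the next starts, and the tag stripper is deliberately naive (it ignores comments, scripts and entities).

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** A filter drains its input pipe and writes its results to its output pipe. */
interface Filter {
    void run(Queue<String> in, Queue<String> out);
}

/** Drops everything between '<' and '>'; each tag is replaced by one space. */
class StripTags implements Filter {
    public void run(Queue<String> in, Queue<String> out) {
        boolean inTag = false;
        for (String c; (c = in.poll()) != null; ) {
            if (c.equals("<")) {
                inTag = true;
            } else if (c.equals(">")) {
                inTag = false;
                out.add(" "); // keeps words on either side of a tag apart
            } else if (!inTag) {
                out.add(c);
            }
        }
    }
}

/** Groups single-character strings into words of letters and digits. */
class SplitWords implements Filter {
    public void run(Queue<String> in, Queue<String> out) {
        StringBuilder word = new StringBuilder();
        for (String c; (c = in.poll()) != null; ) {
            if (Character.isLetterOrDigit(c.charAt(0))) {
                word.append(c);
            } else if (word.length() > 0) {
                out.add(word.toString());
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            out.add(word.toString()); // flush the last word
        }
    }
}

/** Passes each distinct word through once, in first-seen order. */
class RemoveDuplicates implements Filter {
    public void run(Queue<String> in, Queue<String> out) {
        Set<String> seen = new HashSet<>();
        for (String w; (w = in.poll()) != null; ) {
            if (seen.add(w)) {
                out.add(w);
            }
        }
    }
}

class WordPipelineDemo {
    /** Feeds html through the three filters, one character per pipe entry. */
    static Queue<String> words(String html) {
        Queue<String> pipe = new ArrayDeque<>();
        for (char ch : html.toCharArray()) {
            pipe.add(String.valueOf(ch));
        }
        Filter[] chain = { new StripTags(), new SplitWords(), new RemoveDuplicates() };
        for (Filter f : chain) {
            Queue<String> next = new ArrayDeque<>();
            f.run(pipe, next);
            pipe = next;
        }
        return pipe;
    }

    public static void main(String[] args) {
        System.out.println(words("<b>hello</b> hello")); // prints [hello]
    }
}
```

Because every filter shares the same narrow interface, a further stage such as common-word removal could be added to the chain without touching the existing filters.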

The interface to the Page Info component should be a single method that takes a SafeURL as its parameter and returns two sets, one of words and the other of URLs.
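Since a Java method returns a single value, one possible shape for this interface is a method that returns a small holder for the two sets. The sketch below is only one option. The method name getInfo is taken from the To Do section; PageInfoResult is a hypothetical type, representing URLs as strings is an assumption, and the SafeURL class shown is a placeholder standing in for the project's real type.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Placeholder for the project's SafeURL type; the real class is not shown here. */
class SafeURL {
    final String address;

    SafeURL(String address) {
        this.address = address;
    }
}

/** Hypothetical holder for the two sets returned by Page Info. */
final class PageInfoResult {
    final Set<String> words;
    final Set<String> urls; // assumption: URLs are represented as strings

    PageInfoResult(Set<String> words, Set<String> urls) {
        // Defensive copies so callers cannot alter the result afterwards.
        this.words = Collections.unmodifiableSet(new HashSet<>(words));
        this.urls = Collections.unmodifiableSet(new HashSet<>(urls));
    }
}

/** The single public entry point to the Page Info subsystem. */
interface PageInfo {
    PageInfoResult getInfo(SafeURL url);
}
```

Callers see only this one method; the pipes and filters behind it remain internal to the subsystem.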

To Do

Part 1: Provide an architecture diagram showing the new Page Info subsystem, rearchitected to use pipes and filters. Support your architectural diagram with English text to explain its purpose and organization. Compare the new organization to your previous one in terms of its maintainability. (Provide an architectural diagram of your previous Page Info component for comparison.)

Provide a sequence diagram showing the operation of your subsystem when your getInfo method is called with a URL pointing to a page containing the following text:

<b>hello</b> hello

Part 2: In both your spider engine and query engine, your Word Database component calls the Persistent Database component. Explain how this is an example of the adapter design pattern. Illustrate your explanation with a class diagram. Be sure to clarify whether the class or object adapter pattern is being used.

Part 3: In your spider engine, the Page Info component is implemented via a public interface as well as a number of internal classes. Explain how the Page Info component exemplifies the facade design pattern. Illustrate your explanation with a class diagram.

Part 4: Provide a logical specification of the interface of the Word Database component. Assume that the component's interface consists of the two methods:

void addAssociation( String word, String url )
Set<String> getURLs( String word )

Be sure to model the state of your component using mathematical types only. Do not use Java types. Do not use the Persistent Dictionary component as part of your specification; instead assume the word/url associations are stored within Word Database.

The Java Set type is used as the return type of getURLs. You may model this using the mathematical set P(String).

To Hand In

Submit a report including parts 1 through 4. The report should have a table of contents, and be divided into one section per part, with appropriate section headings. The front page of the report must be the assignment 4 cover page, available here. (Two marks will be deducted for not providing this cover sheet.) Be sure to read this cover sheet for a breakdown of how the assignment will be graded. In your report, include any supporting documentation necessary to explain your designs. All architectural diagrams must be typeset using Poseidon.

This assignment is to be submitted to the CISC 322 drop box on the second floor of Goodwin Hall by 3:00 PM on Friday, December 2, 2005. Printers tend to be busy shortly before assignment deadlines, so it is wise to avoid last-minute submission. No late assignments will be accepted.