Thursday, November 6, 2008

Week 10

Muddiest Point
- What exactly makes XML better? I see why we have CSS to take over presentation from HTML, but what benefits does XML bring to the table? Is it easier to debug? Is it just the newest thing that we're all supposed to like because it's new and shiny?

Web Search Engines: Part I
- Looks at how search engine bots (spiders) crawl the internet, particularly crawler etiquette and the vast scope of the information spiders look for. Topics include ignoring duplicate material, how a site is chosen to be crawled and how it gets to the head of the line, and cloaking (providing different info to bots than to people visiting the page).
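
A rough sketch of the "head of the line" and duplicate-skipping ideas, just to make them concrete. This is a toy, not how any real crawler is built: the `CrawlFrontier` class, its method names, and the numeric priority are all made up for illustration, and real spiders use much smarter duplicate detection than an exact hash of the page text.

```python
import hashlib
import heapq


class CrawlFrontier:
    """Toy crawl queue: higher-priority URLs come out first."""

    def __init__(self):
        self._heap = []             # entries are (-priority, order, url)
        self._queued = set()        # URLs already queued, so none is added twice
        self._seen_content = set()  # fingerprints of page bodies already seen
        self._order = 0             # tie-breaker keeps insertion order stable

    def add(self, url, priority):
        """Queue a URL; return False if it was already queued."""
        if url in self._queued:
            return False
        self._queued.add(url)
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._order, url))
        self._order += 1
        return True

    def next_url(self):
        """Return the highest-priority URL, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def is_duplicate(self, body):
        """Return True if this exact page text has been seen before."""
        fingerprint = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if fingerprint in self._seen_content:
            return True
        self._seen_content.add(fingerprint)
        return False
```

Adding two URLs with priorities 1 and 5 and then calling `next_url()` hands back the priority-5 URL first, which is the "head of the line" behavior in miniature.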

Web Search Engines: Part II
- This article looks at how search engines index what they've found. There are millions of words that the indexers have to go through. It also explains how search terms are matched to the pages that are returned, and in what order.
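
A minimal sketch of the indexing idea, assuming the simplest possible setup: an inverted index that maps each word to the pages containing it, with results ordered by how often the query words appear. The function names and the counting-based ranking rule are assumptions made for illustration; real engines weigh many more signals than raw word counts.

```python
from collections import Counter, defaultdict


def build_index(pages):
    """Map each word to a Counter of {page_id: occurrences}."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word][page_id] += 1
    return index


def search(index, query):
    """Return ids of pages containing every query word, best score first."""
    words = query.lower().split()
    if not words:
        return []
    # Only pages that contain all of the query words qualify.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    # Score = total occurrences of the query words (an assumed ranking rule).
    scores = {p: sum(index[w][p] for w in words) for p in candidates}
    # Highest score first; ties are broken alphabetically by page id.
    return sorted(scores, key=lambda p: (-scores[p], p))
```

The point of the inverted index is that a query only touches the entries for its own words, instead of rescanning every page each time someone searches.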

The Deep Web
- Apparently this is actually advertising material...interesting. But basically the "deep web" is all those webpages which are dynamically created (as a result of a search) and therefore cannot be indexed by bots. When websites were just files it was easier, but newer technology has changed that. The deep web's info greatly exceeds the surface web's, but much of that data comes from places like NOAA, NASA, and the like, which is the type of information a normal search is not looking for.

Current Developments...
- Looks at the Open Archives Initiative, which attempts to share metadata from a variety of sources (sheet music was one project; another looked at resources regarding the American South).
