A Self-Organizing Cyberspace Using Human-Generated Hop Data

David Wallace Croft
Senior Intelligent Systems Engineer
Special Projects Division, Information Technology
Analytic Services, Inc. (ANSER)
croftd@nexos.anser.org

1997-08-31


Task Objective

The objective of this proposal is to create a 3-D visual map of Internet resources, a cyberspace, that is self-organized by the traffic patterns of users over time to reveal the natural relationships between those data sources. We propose this technology as an innovative alternative to the common methods of data retrieval from web sites, which generally rely upon the use of software agents to seek, load, parse, and score textual data sources. By simply sampling the HyperText Transfer Protocol (HTTP) [1] client-server request header information [2] generated by web browsers over time, we can portray an intuitive conceptual mapping of related Internet data sources as evaluated by the most advanced form of neural network technology available today, human wetware.

cyberspace [3]

This term frequently means "everyone and everything on the Internet." Among the term's suggested connotations are that the Internet is a new and revolutionary kind of community, that it is spatial in a new and extendible sense, and (perhaps a little less consciously) that Internet participants are on some kind of journey into the unknown. The term was coined by William Gibson in his novel of 1984, Neuromancer [4].

We envision our prototype cyberspace as a 3-D space populated by spherical nodes representing web sites, web pages, and other Internet resources accessible by HTTP. Lines connecting nodes will indicate that unidirectional or bi-directional traffic commonly exists between the data sources. Note that these lines may indicate not only the presence of a HyperText Markup Language (HTML) hyperlink [5], but also other information present in a web page that would suggest to a human researcher that the destination is a viable source of desired information.
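
The node-and-line structure described above could be held in a small directed traffic table. The following is only a sketch in present-day Java, not the proposal's design: the class name HopGraph, its method names, and the idea of a numeric threshold for deciding that traffic "commonly exists" are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: counts directed hops between Internet resources. */
final class HopGraph {
    // Hop counts keyed by source URL, then by destination URL.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one observed hop from one resource to another. */
    void recordHop(String from, String to) {
        counts.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1, Integer::sum);
    }

    /** Returns how many hops have been recorded in the given direction. */
    int hopCount(String from, String to) {
        Map<String, Integer> row = counts.get(from);
        if (row == null) {
            return 0;
        }
        Integer n = row.get(to);
        return n == null ? 0 : n;
    }

    /** A line is drawn when traffic in at least one direction reaches the threshold (an assumed rule). */
    boolean hasLine(String a, String b, int threshold) {
        return hopCount(a, b) >= threshold || hopCount(b, a) >= threshold;
    }

    /** The line is bi-directional when traffic in both directions reaches the threshold. */
    boolean isBidirectional(String a, String b, int threshold) {
        return hopCount(a, b) >= threshold && hopCount(b, a) >= threshold;
    }

    public static void main(String[] args) {
        HopGraph graph = new HopGraph();
        graph.recordHop("http://a.example/", "http://b.example/");
        graph.recordHop("http://a.example/", "http://b.example/");
        graph.recordHop("http://b.example/", "http://a.example/");
        System.out.println(graph.hasLine("http://a.example/", "http://b.example/", 2));
        System.out.println(graph.isBidirectional("http://a.example/", "http://b.example/", 2));
    }
}
```

With a threshold of 2, the example pair is connected by a line but counts as unidirectional, because only one direction has reached the threshold.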

The distance between the spherical nodes will rapidly convey a natural, intuitive, and visual representation of the relationships between the Internet resources. Data clustering [6] is a well-known technology used in statistics and neural networks to find non-obvious patterns and relationships in data. In our proposed cyberspace, we envision that the user will be able to identify sources of interest by simply seeing and gravitating toward a dense cluster of related nodes. As the density mapping is generated by human traffic, it is our expectation that non-obvious and hidden relationships between Internet resources will be revealed that would normally evade artificially intelligent agents. This would include resources where no reliable textual information is available to the general public, such as web sites with a misleading facade or password protection, and information presented in media that necessitate the advanced pattern recognition and comprehension capabilities of human intelligence, such as images, audio, video, and query-response interactivity on the level of a Turing Test.

The proposed technology will provide a highly efficient alternative to the current methods of seeking, loading, parsing, and scoring of information for human consumption. At this time, a vast number of web spiders, crawlers, bots, and intelligent software agents are busily consuming Internet bandwidth and computational resources on our behalf [7]. Our proposed method obviates the loading and processing of the complete contents of all reachable Internet resources by simply sampling the request headers generated by humans visiting active sites.

The application of the samples of movements borrows a principle from the research of biologically-inspired unsupervised neural network learning algorithms: an overall self-organizing form and structure arises from many local interactions over time [8]. With each hop of a user from web site to web site, an opportunity exists to collect a datum, apply it so that the nodes slide toward and away from each other incrementally, discard the datum, and see hitherto unknown relationships between the resources reveal themselves over time on the macro and micro levels. The presentation of these relationships in this form may prove to be more natural and intuitive to humans than current methods of textual parsing, indexing, and sorting by artificial means. If this is the case, the demonstration of a functioning prototype may induce developers to move away from the release of hosts of content-hungry software agents upon the Internet.
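
The collect, apply, and discard cycle described above might look roughly like the following Java sketch. The proposal does not specify an update rule, so everything here is an assumption: the class name HopLayout, the fixed learning rate, and the choice to move both endpoints of a hop toward each other by a fraction of their separation. The complementary motion of nodes sliding away from each other is deliberately left out, since the text gives no rule for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: 3-D node positions nudged together by each observed hop. */
final class HopLayout {
    private final Map<String, double[]> positions = new HashMap<>();
    private final double rate; // assumed learning rate, expected to lie between 0 and 0.5

    HopLayout(double rate) {
        this.rate = rate;
    }

    /** Gives a resource its initial position in the 3-D space. */
    void place(String url, double x, double y, double z) {
        positions.put(url, new double[] {x, y, z});
    }

    /** Euclidean distance between two placed resources. */
    double distance(String a, String b) {
        double[] p = Objects.requireNonNull(positions.get(a), "unplaced node");
        double[] q = Objects.requireNonNull(positions.get(b), "unplaced node");
        double sum = 0.0;
        for (int i = 0; i < 3; i++) {
            double d = p[i] - q[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Applies one hop: each endpoint moves toward the other by rate times their separation. */
    void applyHop(String from, String to) {
        double[] p = Objects.requireNonNull(positions.get(from), "unplaced node");
        double[] q = Objects.requireNonNull(positions.get(to), "unplaced node");
        for (int i = 0; i < 3; i++) {
            double delta = rate * (q[i] - p[i]);
            p[i] += delta;
            q[i] -= delta;
        }
        // Nothing about the hop is stored; the datum is discarded once applied.
    }

    public static void main(String[] args) {
        HopLayout layout = new HopLayout(0.1);
        layout.place("http://a.example/", 0.0, 0.0, 0.0);
        layout.place("http://b.example/", 10.0, 0.0, 0.0);
        layout.applyHop("http://a.example/", "http://b.example/");
        System.out.println(layout.distance("http://a.example/", "http://b.example/"));
    }
}
```

Under this assumed rule each hop shrinks the separation of its two endpoints by a factor of (1 - 2 * rate), so in the example a distance of 10 becomes 8 after one hop. Without some opposing motion all frequently visited nodes would eventually collapse onto one point, which is why a working prototype would need the "away" half of the rule as well.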

Technical Summary

The cyberspace concept has generated legendary excitement among researchers and science-fiction fans for over a decade. It is the convergence of a number of developing technologies that makes its application feasible today for practical use in research. These technologies include neural networks, the Internet, virtual reality, intelligent software agents, the World-Wide Web (WWW), and the Java programming language. The last technology mentioned, the Java programming language, is the capstone.

The richness of the software libraries included as standard components of the Java programming language brings development of long-term research and development projects into the realm of the immediately viable. This infrastructure technology includes application programming interfaces (APIs) that facilitate Internet communications, virtual reality representation, high-performance vector processing, and agent technology support. The makers of Java further provide a web server API and an extensible Java web browser [9]. It is our expectation that the bulk of our efforts will not be in raw research and infrastructure generation but rather in the expert integration and application of the diverse and comprehensive Java APIs and tools.

While there have been a number of other research efforts into the mapping of a self-organizing cyberspace since the boom of the WWW, the approach of using human-generated "hop data" is unique and innovative. Prior, current, and emerging state-of-the-art techniques in cyberspace mapping include using the geographical (land-based) point of origin of request data [10], following web pages to find hyperlink references and then pre-loading referenced links [11], machine parsing of textual content for associative clustering [12, 13], and the use of additional embedded descriptive HTML tags ("meta-content") [14]. Our approach is unique in that it clusters related information sources over the entire span of multimedia using a metric derived from, and intuitive to, human users. The data that is sampled and utilized, Uniform Resource Locator (URL) source and destination pairs or "hop data", is a readily-accessible by-product generated by HTTP request packets. Without the need to download, parse, or analyze URL content, which includes text, images, audio, video, and interactive forms, non-obvious or hidden relationships between information sources become readily apparent for consumption by researchers, analysts, and intelligencers.
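
As an illustration of how a source and destination pair might be read from a single HTTP request without touching any page content, the following Java sketch parses the request line and the Referer header [2]. The class name HopSampler, the nested Hop type, and the decision to rebuild a relative request target from a Host header are assumptions made for this example; the proposal does not say where or how the sampling would be performed.

```java
import java.util.Optional;

/** Hypothetical sketch: derives one hop (source URL, destination URL) from an HTTP request head. */
final class HopSampler {

    /** A source and destination URL pair. */
    static final class Hop {
        final String source;
        final String destination;

        Hop(String source, String destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    /** Returns the hop described by the request head, or empty if no Referer or destination is available. */
    static Optional<Hop> parse(String requestHead) {
        String[] lines = requestHead.split("\r?\n");
        String[] requestLine = lines[0].trim().split("\\s+");
        if (requestLine.length < 2) {
            return Optional.empty(); // not a usable request line
        }
        String target = requestLine[1];
        String referer = null;
        String host = null;
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the headers
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip malformed header lines
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (name.equalsIgnoreCase("Referer")) {
                referer = value;
            } else if (name.equalsIgnoreCase("Host")) {
                host = value;
            }
        }
        if (referer == null) {
            return Optional.empty(); // no source page was reported
        }
        if (target.startsWith("/")) {
            if (host == null) {
                return Optional.empty(); // relative target and no Host header to complete it
            }
            target = "http://" + host + target; // assumes plain HTTP
        }
        return Optional.of(new Hop(referer, target));
    }

    public static void main(String[] args) {
        String head = "GET /research/ HTTP/1.0\r\n"
                + "Host: www.example.org\r\n"
                + "Referer: http://www.example.com/links.html\r\n"
                + "\r\n";
        parse(head).ifPresent(hop -> System.out.println(hop.source + " -> " + hop.destination));
    }
}
```

Only the request head is examined, which matches the point made above that the content behind either URL never needs to be downloaded or parsed. Requests that carry no Referer header yield no hop and would simply be skipped.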

Analytic Services Inc. (ANSER) is currently pursuing a related IR&D effort to develop intelligent software agents (ISAs) for applications in national intelligence and law enforcement. To date, prototype Internet agents have been demonstrated with capabilities including web search, information filtering, off-line delivery, and notification [15]. The expertise of ANSER includes graphics, client-server communications, mobile agents, neural networks, fuzzy logic, genetic algorithms, and expert systems, technologies that we have primarily prototyped using Java. We are currently developing a reusable object-oriented Java codebase for agent technology and communications with standard Internet protocols including HTTP (web), SMTP and POP3 (e-mail), NNTP (newsgroups), FTP (file transfers), JDBC (SQL database connectivity), and popular mailing list servers (majordomo, listserv) [16]. ANSER regularly contracts with government agencies to provide analytic services in areas vital to national interests. It is our intent to provide the most advanced information management tools possible to the analysts of ANSER and its customers to be employed in settings that require maximum confidentiality and control.

References

  1. "IETF Hypertext Transfer Protocol (HTTP) Working Group"
    http://www.ics.uci.edu/pub/ietf/http/
  2. "Hypertext Transfer Protocol - HTTP/1.0", section 10.13 "Referer"
    http://www.ics.uci.edu/pub/ietf/http/rfc1945.html#Referer
  3. whatis.com Inc., 1997, "What Is...cyberspace (a definition)"
    http://www.whatis.com/cyberspa.htm
  4. Gibson, W., Neuromancer, Ace Books, 1984.
    http://www.putnam.com/putnam/books/neuromancer/book.html
  5. "The World Wide Web Consortium"
    http://www.w3.org/
  6. Fayyad, U., et al., eds., Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996.
    http://www.aaai.org/Press/Books/Fayyad/fayyad.html
  7. Caglayan, A. and Harrison, C., Agent Sourcebook: A Complete Guide to Desktop, Internet, and Intranet Agents, Wiley Computer Publishing, 1997, Ch. 3 "Internet Agents".
    http://www.opensesame.com/agents/index.html
  8. Haykin, S., Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Co., 1994, Ch. 9 "Self-Organizing Systems I: Hebbian Learning".
    http://sparky.mcmaster.ca/People/Faculty/Haykin/books.html
  9. Sun Microsystems Inc., [Java] "Products & APIs"
    http://www.javasoft.com/products/index.html
  10. Lamm, S. and Reed, D., "Real-Time Geographic Visualization of World Wide Web Traffic", 1995
    http://www-pablo.cs.uiuc.edu/Projects/Mosaic/WWW3/#avatar
  11. Advanced Interaction Group, University of Birmingham, "Hyperspace", 1995-1996
    http://www.cs.bham.ac.uk/~amw/hyperspace/
  12. Neural Networks Research Centre, Helsinki University of Technology, "WEBSOM: A novel SOM-based [Self-Organizing Map] approach to free-text mining", 1996-1997
    http://websom.hut.fi/websom/
  13. Honkela, T., et al., "Self-Organizing Maps of Document Collections", ALMA, Issue 2, 1996
    http://www.diemme.it/~luigi/websom.html
  14. Perspecta Inc., "Perspecta Announces Breakthrough Way to Experience and Understand Information on Intranets and the Web: Revolutionary Visual Experience Gives Users Insight Into Information", 1997
    http://www.perspecta.com/whatsnew/releases/pr_6_30_97.html
  15. Analytic Services Inc., "ANSER Software Agents"
    http://nexos.anser.org:8080/java/app/
  16. Analytic Services Inc., "ANSER Java Codebase"
    http://nexos.anser.org:8080/java/


http://www.alumni.caltech.edu/~croft/research/internet/cyberspace/
David Wallace Croft, croftd@nexos.anser.org
© 1997 Analytic Services, Inc. (ANSER)
Transcribed 1997-10-14.