Data mining and surveillance in the post-9.11 environment

IAMCR Data Mining: Gandy 7/11/2002

Oscar H. Gandy, Jr.
Herbert I. Schiller Professor
Annenberg School for Communication
University of Pennsylvania

For presentation to the Political Economy Section, IAMCR
Barcelona, July 2002

Introduction

In his wildly successful book on the future of cyberspace,¹ Lawrence Lessig responded to a general challenge to privacy activists: tell us what is different about surveillance in the computer age. Lessig suggests that the difference lies in the ease with which the data generated by the routine monitoring of our behavior can be stored and then searched at some point in the future. That is, because more and more of our daily life involves interactions and transactions that generate electronic records, our lives become fixed in media that can be examined and reviewed at will.

Lessig and others who are concerned about threats to privacy² have identified the countless ways in which our behavior in public places, as well as in the privacy of our homes, generates records that reside in the computers of corporations and government agencies. A sales clerk in the local store might take note of your interest in different pieces of jewelry or clothing as you make your way from counter to counter, but that monitoring does not generate a searchable record of each of your visits to the store. Indeed, unless they are security guards and you look particularly suspicious that day, store staff don’t follow you from floor to floor. It is only when you purchase those socks or gloves that a searchable record is made. However, the generation of transaction records in cyberspace is many times more extensive than it is in the world of bricks and mortar. Web servers generate a record each time a visitor clicks on a banner ad, or follows a link in order to learn more about some commodity or service.
In addition, because Web technology facilitates the linkage of records, the click streams, or “mouse droppings,” that you leave behind as you browse around much of the Web make it easy for marketing service providers like DoubleClick to develop a cumulative record.³ Because DoubleClick manages the serving of ads for several thousand publishers on the Web, your profile may contain information about a broad range of goods and services in which you have revealed some interest.

Because the cost of storing data in electronic form continues to drop (one theoretical estimate, based on what engineers refer to as Moore’s law, suggests that the cost of storage drops by 50% every 18 months),⁴ organizations have less of an incentive to discard any transaction-generated information. The only problem that businesses and government agencies face is how to make sense of these growing mountains of data.⁵ Enter the mathematical wizards who brought us both the bell curve and the ballistic missile, and voilà! We have the science of data mining, or, as the specialists would prefer, the science of Knowledge Discovery in Databases (KDD).

Data mining, as a tool for the discovery of meaningful patterns in data, is the product of some rapidly developing techniques in the field of applied statistical analysis. Of particular importance for those of us who are concerned about the implications of data mining for individual and collective privacy is the fact that data mining software, products, and services are being introduced into the marketplace by a number of competing vendors. The increasing sophistication of the software packages, and the rapidly declining prices for custom and off-the-shelf data mining products, mean that the techniques will soon be in widespread use.
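The linkage of click streams into a cumulative record can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the ad server recognizes the same browser by a single identifier (for example, a third-party cookie) whenever a page on any participating publisher's site requests an ad. The identifiers, site names, and page topics are hypothetical, and the sketch is not a description of any real ad network's system.

```python
from collections import defaultdict

# Hypothetical ad-server log. Each entry is (browser identifier, publisher
# site, topic of the page on which the ad was served). The identifier is
# assumed to be the same on every participating publisher's site.
ad_requests = [
    ("cookie-17", "news-site.example", "mortgages"),
    ("cookie-17", "travel-site.example", "cruises"),
    ("cookie-42", "news-site.example", "sports"),
    ("cookie-17", "health-site.example", "diabetes"),
    ("cookie-42", "auto-site.example", "pickup trucks"),
]


def build_profiles(requests):
    """Link per-site records into one cumulative profile per identifier."""
    profiles = defaultdict(lambda: {"sites": set(), "interests": set()})
    for cookie_id, site, topic in requests:
        profiles[cookie_id]["sites"].add(site)
        profiles[cookie_id]["interests"].add(topic)
    return dict(profiles)


profiles = build_profiles(ad_requests)
for cookie_id, profile in sorted(profiles.items()):
    print(cookie_id, sorted(profile["sites"]), sorted(profile["interests"]))
```

No single publisher sees more than its own slice of a visitor's interests. The profile becomes broad only because one intermediary observes all of the ad requests and keys them to the same identifier.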
In addition, the government’s heightened concern with security following the events of 9.11 means that an infusion of tax dollars for research and development is likely to attract a swarm of competitors. We can expect this activity to support an even more rapid expansion in the capacity of these systems to produce strategic “intelligence” from what would ordinarily be meaningless bits of data stored in computers all around the globe. In this paper I would like to provide a thumbnail sketch of data mining as a technology, identify some of the leading firms and the nature of their data mining products, and then identify some of the things that trouble privacy advocates and others who are concerned about civil liberties.

Data Mining

As I have suggested, data mining is an applied statistical technique. The goal of any data mining exercise is the extraction of meaningful intelligence, or knowledge, from the patterns that emerge within a database after it has been cleaned, sorted and processed. The routines that are part of the data mining effort are in some ways similar to the techniques that are used to extract precious minerals from the soil. However, whereas the extraction of precious metals is often labor intensive and poses risks to both workers and the environment, the extraction of intelligence from computer databases is increasingly being automated, in ways that reduce the direct risks to labor while amplifying the risks to society in general. Indeed, as I hope to demonstrate, the long-run impact of this technology on the social environment may be as destructive as that of strip mining. Imagine, if you can, the mountains of transactional data that are generated each time each of us purchases commodities marked with universal product codes.
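The sequence described above, in which a database is cleaned, sorted and processed until patterns emerge, can be sketched in Python under some simplifying assumptions. The record layout (card ID, timestamp, product code), the sample values, and the "pattern" extracted here (product codes bought by at least two card holders) are hypothetical choices made for illustration. They do not describe any vendor's data mining product.

```python
from collections import Counter, defaultdict

# Hypothetical raw point-of-sale records: (card_id, timestamp, upc).
raw_records = [
    ("card-9", "2002-07-01T10:05", "036000291452"),
    ("card-9", "2002-07-01T10:05", "036000291452"),  # duplicate scan
    ("card-3", "2002-07-01T11:20", "036000291452"),
    (None, "2002-07-01T11:21", "012345678905"),      # no card presented
    ("card-9", "2002-07-03T17:40", "012345678905"),
    ("card-3", "2002-07-02T09:15", ""),              # unreadable code
]


def clean(records):
    """Keep records with a card ID, timestamp and product code; drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if all(rec) and rec not in seen:
            seen.add(rec)
            cleaned.append(rec)
    return cleaned


def sort_records(records):
    """Sort by card holder, then chronologically (ISO timestamps sort as strings)."""
    return sorted(records, key=lambda rec: (rec[0], rec[1]))


def build_histories(records):
    """Process: group sorted records into a purchase history per card holder."""
    histories = defaultdict(list)
    for card_id, timestamp, upc in records:
        histories[card_id].append((timestamp, upc))
    return dict(histories)


def popular_products(histories, min_buyers=2):
    """A simple 'pattern': product codes bought by at least min_buyers card holders."""
    buyers = Counter()
    for purchases in histories.values():
        for upc in {upc for _, upc in purchases}:
            buyers[upc] += 1
    return sorted(upc for upc, n in buyers.items() if n >= min_buyers)


histories = build_histories(sort_records(clean(raw_records)))
print(histories)
print(popular_products(histories))
```

In this sketch, records that cannot be tied to a card holder are discarded at the cleaning stage. That reflects the assumption that the analyst cares only about transactions linked to identifiable individuals.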
When we use credit or check verification cards, or any of a number of retail vendors’ discount cards, individually identifiable information is captured and linked with those purchases. It is little wonder that global retail chains like Wal-Mart have invested substantial resources in the development of data warehouses to manage the details and extract the value in the terabits of data being generated each day throughout their networks.⁶ Our interactions with government agencies, as well as with the component parts of the massive health care system, also generate detailed records. However, because these data are not gathered in standard forms with classification schemes akin to the UPC code, there are tremendous pressures within these industries to move toward greater standardization and comparability across transactions.⁷

Although progress in translating voice messages into text for automated processing has been somewhat slower, no such barriers exist for classifying e-mail text or the posts that are made to newsgroups. The textual components of web pages are also relatively easy to classify and describe, although the graphics on those pages still represent something of a problem for developers. Even more problematic, in terms of the need to develop common codes and classification standards, is the digitized output of surveillance cameras. However, it seems likely that the rate of success in developing classification techniques will increase substantially as a result of research and development initiatives rushed through the legislature in response to the events of 9.11.

The goals of data mining

In general, data mining efforts are directed toward the generation of rules for the classification of objects.
These objects might be people who are assigned to particular classes or categories, such as “that group of folks who tend to make impulse buys from those displays near the checkout counters at the supermarket.” The generation of rules may also be focused on discriminating, or distinguishing, between two related but meaningfully distinct classes, such as “those folks who nearly always use coupons” and “those who tend to pay full price.” Among the most common forms of analysis are those that seek to discover the associative rules that further differentiate between clients and customers. For example, video rental stores seem to be interested in discovering what sorts of movies tend to be rented together, and what sorts of movies tend to be associated with the sale of microwave popcorn or candy. Designers are especially interested in being able to predict the response of individuals to different offers or appeals. In an attempt to develop reliable sorting tools, data miners seek to discover patterns of association between demographic characteristics and commercial behaviors. Discriminant analyses are especially