Internet Business Daily


Who’s Mining Your Data?

Posted by Marshall Dunn on November 21, 2006

Advances in computer hardware and software have greatly simplified data collection, and one result is the rapid accumulation of massive amounts of information. But databases full of idle information are useless unless that information can be exploited in some way. What commonly happens is that the data gets collected, yet the business analyst still reaches for the familiar, well-trodden solutions despite sitting on piles of alternatives.

That’s probably where the stock retort for any well-documented yet still unresolved problem came from: “Drowning in data but starving for information.” The phrase itself is a good example of something so familiar and overused that no alternatives are even attempted. Business solutions can be like that too: hackneyed. Solutions learned in seminars on any of the dozens of problem-solving methodologies out there may not cater to your biggest asset, your very own data. The problem is there’s just too freaking much of it. And organizing it is worse than organizing the garage. It’s more like organizing the garage and being expected to make a profit from what you find in there. One solution for dealing with this kind of information overload is called data mining.

Data mining involves the computer-assisted analysis of huge amounts of data, using advanced software tools, for the purpose of extracting hidden, qualitative relationships. Sounds like just querying the database? Not quite, even though the first step in a data mining process is collecting the data in an organized manner. A query just lets you ask the database for ‘such-and-such’ and have it send you the matching records. Querying is effective at retrieving information that fits expected outcomes, whereas data mining techniques focus on uncovering hidden patterns and associations not previously known to the analyst. It’s the difference between asking for a list of everyone who lives on Main Street and requesting the address of anyone, anywhere, who has a good chance of buying what you’re selling.
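To make the distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the customer records, the field names, and the two helper functions. The first function is a plain query that answers a question you already knew to ask. The second counts which products turn up together in the same basket, a very simplified stand-in for the association analysis a real data mining package would perform.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer records, invented purely for illustration.
customers = [
    {"name": "Ann", "street": "Main Street", "basket": {"widgets", "batteries"}},
    {"name": "Bob", "street": "Oak Avenue", "basket": {"widgets", "batteries", "tape"}},
    {"name": "Cal", "street": "Main Street", "basket": {"tape"}},
    {"name": "Dee", "street": "Elm Road", "basket": {"widgets", "batteries"}},
    {"name": "Eve", "street": "Oak Avenue", "basket": {"tape", "glue"}},
]


def query_by_street(records, street):
    """A query: return the names of everyone who lives on a given street."""
    return [r["name"] for r in records if r["street"] == street]


def frequent_pairs(records, min_support):
    """A toy mining step: find product pairs bought together in at least
    `min_support` (a fraction between 0 and 1) of all baskets."""
    pair_counts = Counter()
    for r in records:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(r["basket"]), 2):
            pair_counts[pair] += 1
    total = len(records)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


main_street = query_by_street(customers, "Main Street")
pairs = frequent_pairs(customers, min_support=0.5)
print(main_street)
print(pairs)
```

The query hands back exactly what was asked for. The pair count surfaces something nobody asked about: in this made-up data, widgets and batteries tend to leave the store together.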

In a nutshell, data mining uses mathematical techniques to sift methodically through warehoused information and helps in recognizing significant trends or patterns that might not otherwise be apparent. Great…but if it’s so hidden, how can you be sure it’s even relevant? Well, that’s a whole other matter and up to you to decide. But you get the point: the process goes way beyond just the software.

And data mining is not new. Military and intelligence agencies have used it, and techniques long used in statistical analysis have been incorporated in recent years into a focused methodology featuring software with more user-friendly interfaces. Data mining has been used in diverse commercial applications, such as detecting credit card fraud, defining shopping patterns, analyzing equipment failure and profiling criminals.

It’s no secret that the new U.S. intelligence czar is further developing data-mining capability for gathering huge amounts of information in an effort to discern patterns that look like terrorist planning. The system will supposedly take all the data mining and modeling work done by various U.S. intelligence agencies and develop tools and algorithms to detect terrorist activities. Privacy advocates are currently criticizing a program called Tangram, which is being developed for the Office of the Director of National Intelligence. It is supposedly being tested without using any data about Americans…well, “OK, if you say so.”

The problem with data mining is that it’s not as much of a black-box, low-interaction tool as many would want. With the diverse kinds of modeling you can use (neural networks, classification trees, link and nodal analysis, etc.), there are decisions to be made before you even get started, so that you don’t just get garbage out from all the garbage you put in. How do you decide? It depends… just what do you want to know? Tangram uses custom algorithms and a technique called link analysis, a method for linking entities with no overt association. Apparently, that’s for scouting terrorists, but what if you just want to know who buys widgets?
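For a rough feel of what link analysis means, here is a small Python sketch. It is not Tangram’s algorithm, which isn’t public; it simply assumes that two entities are linked whenever they share an attribute such as a phone number or an address, and then searches for the shortest chain of shared attributes between two of them. The records and the function names `build_graph` and `find_link` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical (entity, attribute) records, invented for illustration.
records = [
    ("Alice", "phone:555-0101"),
    ("Bob", "phone:555-0101"),
    ("Bob", "address:12 Elm Road"),
    ("Carol", "address:12 Elm Road"),
    ("Dave", "phone:555-0199"),
]


def build_graph(pairs):
    """Connect every entity to each attribute it is recorded with."""
    graph = defaultdict(set)
    for entity, attribute in pairs:
        graph[entity].add(attribute)
        graph[attribute].add(entity)
    return graph


def find_link(graph, start, goal):
    """Breadth-first search for the shortest chain from start to goal.
    Returns the chain as a list, or None if the two are unconnected."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(records)
print(find_link(graph, "Alice", "Carol"))
print(find_link(graph, "Alice", "Dave"))
```

Alice and Carol never appear in the same record, yet the search connects them through Bob, who shares a phone number with one and an address with the other. Dave shares nothing with anyone, so no link is found.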

Data mining is a flexible tool, so business-oriented software may have quite a different focus that requires other methods. Still, in the end, many end users may not feel confident navigating all the options of such a package. And that is assuming the term ‘data mining’ hasn’t become as nebulous as the term ‘organic’ has in the produce aisle. Seeing ‘data mining’ in the product description doesn’t guarantee you are getting more than a supercharged query tool that simply adds data to your warehouse faster. Sure, tools that collect data efficiently are good too, as long as you know what you’re getting. A good rule of thumb before buying a data mining package: know what data you have, know what data you need, and know what it is you want to find out. A little research will tell you which models are commonly associated with solving those types of problems, so make sure the package you’re buying at least offers them as an analysis option. Or else take a shot in the dark and just use what the leaders in your own market are using.

Data mining has even found use in the area of matrimonial actions. E-mails are often automatically archived and can be easily retrieved by software without data mining. Since they can also contain very personal and relevant information, revealing affairs and hidden assets, that knowledge is desirable. Even illegal activity can be retrieved, investigated further and presented during the trial. Consequently, the side with the most data about the other party has the advantage. But the side that has the data and the ability to mine it for hidden associations, which can then be substantiated through further investigation, has a bigger advantage. The cost of retaining a data recovery specialist is supposedly less than the retainer for custody experts and forensic accountants. But here’s the rub: data recovery experts may be retained for longer periods, depending on the complexity of the data mining operation.

Of course, the technique is not without its controversy. The Center for Digital Democracy, a nonprofit group involved in diversity in media, suggests that some large Web sites did not protect personal data from disclosure under certain circumstances. It’s not a revelation that many Internet companies collect data from every click their visitors make in order to track what they’re doing online. When properly analyzed via any number of data mining techniques, this data can be used for Web site design, targeted advertising campaigns and product introductions. Microsoft’s adCenter advertising platform contains Web analytics, behavioral targeting, audience segmentation and data mining functions.

When a user clicks an ad delivered by Microsoft adCenter, a cookie is placed on the user’s computer for 30 minutes, which enables Microsoft and the advertiser to determine whether a visitor clicked an ad delivered by Microsoft adCenter or visited the advertiser’s website within the 30-minute period. The user’s identity remains anonymous, and no personal information is collected or stored by Microsoft. Reputedly, the ad performance data helps determine the effectiveness of the ad, but the question remains: what useful information can be mined from this data, anonymous or not? How exactly do you make such technological advances profitable? And how do consumers, who may be largely unaware of the extent to which they are tracked online, feel when they discover it?
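The 30-minute window is easy to picture in code. The Python sketch below is not Microsoft’s implementation; it only assumes that the click time is recorded when the cookie is set and that a later site visit counts toward the ad if it falls within the cookie’s lifetime. The name `is_attributed` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# The 30-minute cookie lifetime described above.
COOKIE_LIFETIME = timedelta(minutes=30)


def is_attributed(click_time, visit_time, lifetime=COOKIE_LIFETIME):
    """Return True if the visit happened while the click cookie was still valid."""
    elapsed = visit_time - click_time
    return timedelta(0) <= elapsed <= lifetime


# Hypothetical timestamps, invented for illustration.
click = datetime(2006, 11, 21, 9, 0)
print(is_attributed(click, datetime(2006, 11, 21, 9, 20)))  # 20 minutes after the click
print(is_attributed(click, datetime(2006, 11, 21, 9, 45)))  # 45 minutes after the click
```

Note that nothing here needs to know who the visitor is, only when the click and the visit happened, which is the sense in which such performance data can stay anonymous.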

The right balance between a retailer’s need to know and a consumer’s desire for privacy is a big issue. Just how much do I want the grocer to know about me? Or, if I’m the grocer…is there any market for salmon in this demographic? While anything has the potential for misuse, simply having data mining ability doesn’t automatically mean the proper techniques are being used to analyze the collected data. Frequently, all the end user ends up with is a mass of mush…or, at the other extreme, output that just restates the obvious.
