The essential point of data analysis is just data reduction -- gleaning some useful information out of a vast sea of data, most of which, as Prabhakar said, is just "junk", where junk is defined as data that is not relevant to my immediate query or interest.
As the volume of data continues to grow exponentially, the need for ever more advanced algorithms for data mining and analysis increases. The amount of useful information is increasing, yes, but I think the amount of junk data is increasing much more rapidly.
While I "kind of" agree with your statement, things can be made to appear to make sense to us even though they are simply clever algorithms. This is the "artificial" in artificial intelligence.
Kurzweil had some interesting comments tangential to this recently:
-- "As we go through this decade, search engines aren't going to wait to be asked. They'll be listening [to humans] in the background. And [the search results] will just pop up."
Algorithms will never 'make sense'. That's because making sense is a creative act, and algorithms only process information; they don't create meaning. So far, only biological systems (which are by nature self-organizing, not programmed) are making any sense.
What is wrong if the way one looks at the data is biased? That is the real purpose. If I have a purpose in mind for which I am analyzing some data, then I am looking at that data with reference to my query only. I really do not want to see other patterns that are in no way related to the query at hand.
Data mining for nothing may be a pastime for some researchers, like those who are trying to see some sense in obscure signals coming from outer space, hoping to find some logic or some pattern.
In my opinion, the more data in those data centers, the more of it is likely to be junk. Even the creators of the data will seldom bother to have a second look at it, while the server software will make every attempt to preserve it, secure it, and mirror it for years for the sake of nothing.
The problem we really have to solve here is how to detect that a particular piece of data is defunct and hence can be safely destroyed, freeing up that valuable space, the effort required to preserve it, and so on.
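One crude way to approach that detection problem is to flag files that nobody has read or modified for a long time. The sketch below is only an illustration under stated assumptions: the function name `find_stale_files` and the `max_idle_days` threshold are hypothetical, idle time is merely a proxy for "defunct", and access times can be unreliable on filesystems mounted with options such as `noatime`.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days, now=None):
    """Return files under root not read or modified in max_idle_days days.

    Hypothetical helper: "stale" here only means "idle for a long time",
    which is an assumption, not proof that the data is safe to delete.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Take the most recent of last access and last modification.
            last_touched = max(info.st_atime, info.st_mtime)
            if last_touched < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling `find_stale_files("/some/archive", 365)` would list candidates untouched for a year; a human (or a retention policy) would still have to decide whether they can really be destroyed.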
I found very little useful here. I am reminded of two things. The first is that a fundamental problem of Artificial Intelligence is that the Universe of Discourse has to be constrained or the so-called AI is essentially useless.
The second thing comes from a newsletter from the president of the World Future Society:
"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
(Attributed to Nietzsche).
The Pres of WFS then continued:
"Where is the information we have lost in data?"
"Where is the data we have lost in computers?"
(Please stop and chortle here if you wish. Oh, yes, Chortle is kind of LOLS: Laugh Out Loud, Suppressed.)
I found that the 4 questions went from the profundity of what the lady is hoping for to the absurdity of what most people such as her have actually done.
Of particular note: In AI and Computability theory, we get into deep trouble whenever we work with self-referential systems. It has been ever thus since Russell pointed out the problem with the Set of All Sets That Do Not Contain Themselves As Members.
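For anyone who has not seen it written out, Russell's problem can be stated in two lines. This is the standard textbook formulation, with $R$ as my own label for the troublesome set:

```latex
R = \{\, x \mid x \notin x \,\}
\qquad\Longrightarrow\qquad
R \in R \iff R \notin R
```

Asking whether $R$ is a member of itself gives a contradiction either way, which is the kind of trouble self-reference causes.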
In some sense, I hope I am wrong and the lady and the folks at the Singularity Institute can get ahead of the computers before they get us.
Great interview. Lots of interesting things to think about here. Shades of the self-aware computer from the "Ender's Game" series, Jane, here. Maybe when super-advanced algorithms for self-organizing data come about in earnest...a self-aware entity will emerge and we can name it "Genevieve." I can see a day coming when the conclusions reached by advanced algorithms, relative to the vast amounts of data being analyzed, cannot be understood in human terms and are only useful to our silicon children - ostensibly for human good. It's kind of an eerie thought.