On Saturday I had lunch with a friend of mine who works in software R&D.
One of his current projects is a system that allows computers to analyse blogs. In breaking down the blog posts and their attendant comment threads, the software will eventually be able to accomplish some very impressive feats.
For example, by looking at how often a commenter comments, and on what topics, and how their comments relate to the posts and the other comments, the software will be able to guess whether or not the blogger and the commenter know each other in real life, as well as online. That may not sound particularly useful, but once it identifies the difference, the software will be able to map your physical and electronic relationships and their overlaps. Unlike MySpace, which can only tell that you have 600 friends, this software will be able to analyse how many of those 600 you actually have a real relationship with, and even on what basis (electronic, physical, professional) that relationship exists.
The software does this by building an extraordinarily detailed image of each individual, based on language use and topics of interest. In terms of language, do I use emoticons? Do I use words like “basically” or “literally” a lot? Do I rely a little too heavily on ellipses, or do I always misspell certain words? In terms of topics, do my posts tend to contain certain words, like “scooter”, and never others, like “highchair”? Does my username frequently appear in comment threads that contain multiple instances of words like “jihad” or “lolcats” or “MST3K”?
Once the software has built an image of my language style and my buzz topics, it can compare that image to any given piece of text and calculate the likelihood of me being the author. A post of short, badly punctuated sentences bemoaning the lack of affordable childcare will score low. A post of long sentences full of parentheses about 'Zontar: The Thing From Venus' will score high.
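To make the idea concrete, here is a toy sketch in Python. It is not my friend's software, and nothing above says how the real system computes its scores. The feature set (word counts plus tallies of ellipses, parentheses and emoticons), the cosine-similarity comparison, and the names `features`, `build_profile` and `authorship_score` are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def features(text):
    """Count lowercase word tokens plus a few punctuation habits (assumed features)."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    counts["<ellipsis>"] = len(re.findall(r"\.\.\.|…", text))
    counts["<paren>"] = text.count("(")
    counts["<emoticon>"] = len(re.findall(r"[:;]-?[)(DP]", text))
    return counts


def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(texts):
    """Merge the features of everything a writer is known to have written."""
    profile = Counter()
    for text in texts:
        profile.update(features(text))
    return profile


def authorship_score(profile, text):
    """Crude stand-in for 'likelihood of authorship': similarity to the profile."""
    return cosine(profile, features(text))


# Hypothetical sample data, invented for this example.
known_posts = [
    "I rode my scooter (the red one) to see Zontar... basically a terrible film (but fun).",
    "Basically the scooter broke down (again)... so I watched MST3K instead.",
]
profile = build_profile(known_posts)

similar = "My scooter (still red) is basically fine... time for more MST3K."
different = "childcare is too expensive. cant afford it. need a highchair."

print(authorship_score(profile, similar))
print(authorship_score(profile, different))
```

The first score should come out higher than the second, because the first candidate shares both vocabulary and punctuation habits with the known posts. A real system would presumably need far more text per author and a proper statistical model rather than raw counts.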
There are two practical upshots.
Firstly, the software will be able to identify clusters or communities of bloggers, even if those bloggers don’t realise that they’re part of a cluster or a community. I will be able to ask it to find bloggers in my area who share my interests and lexicon, and it will identify them for me. It will look at the things about which I’ve written, my geographical location, the people I link to and the people they link to, and generate a list of possible correlations… neatly sidestepping dead blogs, spam blogs, subliterate LiveJournal entries, or any blog that contains the phrase “the wisdom of Kerry Nettle”. It becomes a kind of social networking tool, except that instead of laboriously filling out page upon page of my details, everything I’ve ever written becomes my details. And it tracks all bloggers, not just those who have signed up to it, so it’s working from the largest conceivable dataset.
Secondly, the software will be able to identify any individual via the quirks of their writing style and the subjects they write about, even if they use different pseudonyms and computers with different IP addresses. All of a sudden anonymous trolling and sock puppetry become a lot more difficult.
The more perceptive reader will already be able to tell that I’m not exactly thrilled at some of the implications of this analysis. Put crudely, while the final version of this software will be used primarily to map individuals within limited communities, its technological descendants could potentially fingerprint every single person on the internet, so intimately and thoroughly that tracking an individual’s movements through cyberspace becomes a piece of cake. Next to this rather insidious technology, the data-mining of Facebook looks positively one-dimensional. We will all be tagged and tracked like migratory birds.
As a result, swapping between identities, or changing your identity, will eventually be just as hard on the internet as it is in real life. You might think that this serves people right for being clandestine or sneaky or hypocritical, but imagine if there was software that could upon request tie together your formal politically-themed blog, that one drunken rant about fat chicks you wrote to a chat room in 1998, your online CV, the fan fiction you wrote as a teenager, the anonymous venting about your spouse you wrote to truewifeconfessions.blogspot.com, your old Lavalife personal ad, and so on and so forth. At the touch of a button anyone in the world can know about every single ill-considered remark, angry flare-up, ignorant position and superseded opinion you’ve ever expressed… and then, of course, use it against you. And if you’re like me - a big-mouthed idiot who can’t go a day without saying something offensive - it’s a cause for concern.
To invert the famous New Yorker cartoon, on the internet soon everyone will know you’re a dog.