John Romant's Technology Blog

If it's technology, I want to know about it.

Category Archives: Data Mine

Infographic: How often do criminals use social media technology?

Are criminals using technology like Facebook, Twitter, Google Street View and/or Foursquare to help them commit their crimes?


What is the Average StumbleUpon User Like based on Analytics?

So far this site has received 89,309 visitors from StumbleUpon. That is a big enough sample to find some commonalities among users and broadly profile the average StumbleUpon user. I’ve used Google Analytics as my tool of choice for getting the data.

Where are StumbleUpon users from?

StumbleUpon has a more international audience than sites like Digg (where 90%+ of visitors are from Northern America). It’s different for StumbleUpon:

[Chart: regions StumbleUpon users come from]

For the Americas, most of them are from the US. But what about Europe?


I didn’t expect Northern Europe to be that high on the list.

What browser do StumbleUpon users use?

The data gives a clear picture:


Internet Explorer = a miserable failure here. There’s a StumbleUpon toolbar for IE, yet almost nobody uses it. Chrome is gaining in popularity. Firefox is the king on the throne, at 93%!

Do Stumblers use the latest Flash version?


Yes. What about Java, do they have Java support?

[Chart: Java support among StumbleUpon users]

Oh well, not really. Another explanation is that many of them might have NoScript installed, which blocks Java as well as JavaScript.

What operating system do StumbleUpon users have?


The surprising part to me is mobile. There was 1 visit each from the iPhone and the iPad. That is 2 visitors in total from a sample of 89,000 users! It seems StumbleUpon needs to make it more convenient for its users to use the toolbar on their mobile phones.

StumbleUpon + Safari on the Mac?


Uh, Safari = miserable failure (along with IE). There are some tricks to get StumbleUpon running on Safari, though. Still, not many people are willing to use Safari for stumbling on the Mac.

Are most Stumblers using widescreen monitors?

To answer this question, we need to take a look at the screen resolution (certain screen resolutions like 1280×800 are not used on full screen monitors):


From this data, we can estimate that around 72% of Stumblers use widescreen monitors, compared with 18% who use full screen monitors. I didn’t count the last one (1024×600), which is mostly used on laptops and netbooks.
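Here is a rough sketch of how that widescreen vs. full screen split can be worked out from a screen resolution report. The visit counts are made up for illustration (they are not the numbers from my Analytics account), and the 3:2 cut-off for "widescreen" is my own assumption.

```python
from fractions import Fraction


def classify_resolution(width: int, height: int) -> str:
    """Label a screen resolution by its aspect ratio."""
    ratio = Fraction(width, height)
    # 4:3 and 5:4 are the classic "full screen" shapes.
    if ratio <= Fraction(4, 3):
        return "full screen"
    # Assumed cut-off: 3:2 or wider counts as widescreen.
    if ratio >= Fraction(3, 2):
        return "widescreen"
    return "other"


def share_by_shape(visits: dict) -> dict:
    """Turn {(width, height): visit_count} into {label: share of visits}."""
    totals = {}
    for (width, height), count in visits.items():
        label = classify_resolution(width, height)
        totals[label] = totals.get(label, 0) + count
    grand_total = sum(totals.values())
    return {label: count / grand_total for label, count in totals.items()}


# Hypothetical visit counts per resolution, NOT the data from this post.
sample_visits = {
    (1280, 800): 500,
    (1440, 900): 300,
    (1024, 768): 150,
    (1280, 1024): 50,
}

if __name__ == "__main__":
    for label, share in share_by_shape(sample_visits).items():
        print(f"{label}: {share:.0%}")
```

Using exact fractions instead of floating point division keeps borderline shapes like 4:3 from being misclassified by rounding.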

How old are StumbleUpon users?

I’ve used Quantcast (which is pretty accurate) to get this data:


Be aware that this data is only for the United States. It shouldn’t be very different for other countries, though, considering that most sites attract similar audiences everywhere (especially in terms of sex and age group).

So, based on all of the data above, we can now make a profile of the average StumbleUpon user:


Is World War III Going To Be Started via Cyber Warfare? The Pentagon Is Preparing.

By Anna Mulrine,
Staff writer, CSMonitor

(AXcess News) Washington – The Pentagon is rapidly preparing for cyberwar in the face of alarming and growing threats, say senior defense officials, who add that sophisticated attacks have prompted them to take the striking step of investigating the feasibility of expanding NATO’s collective defense tenet to include cyberspace.

But as such planning intensifies, the military is struggling with some basics of warfare – including how to define exactly what, for starters, constitutes an attack, and what level of cyberattack warrants a cyber-reprisal.

“I mean, clearly if you take down significant portions of our economy we would probably consider that an attack,” William Lynn, the deputy secretary of defense, said recently. “But an intrusion stealing data, on the other hand, probably isn’t an attack. And there are [an] enormous number of steps in between those two.”

Today, one of the challenges facing Pentagon strategists is “deciding at what threshold do you consider something an attack,” Mr. Lynn said. “I think the policy community both inside and outside the government is wrestling with that, and I don’t think we’ve wrestled it to the ground yet.”

Equally tricky, defense officials say, is how to pinpoint who is doing the attacking. And this raises further complications that go to the heart of the Pentagon’s mission. “If you don’t know who to attribute an attack to, you can’t retaliate against that attack,” noted Lynn in a recent discussion at the Council on Foreign Relations.

As a result, “You can’t deter through punishment, you can’t deter by retaliating against the attack.” He lamented the complexities that make cyberwar so different from, say, “nuclear missiles, which of course come with a return address.”

How to pinpoint the source of a cyberattack is a subject being discussed by Pentagon officials with their counterparts in Britain, Canada, and Australia, among others, in advance of the upcoming NATO summit in Lisbon in November, at which cyberwarfare is an item on the agenda. Officials from NATO member states are also discussing such fundamental issues as how to share information and exchange related technologies, illustrating that the concept of a collective cyberwarfare defense is still in its infancy.

Lynn is among those working to develop the Pentagon’s new cyberstrategy, which focuses both on how to defend the military’s classified networks and on how to protect the Internet itself.

This upending of some key tenets of military doctrine is prompting the Pentagon to look to some surprising new places for strategic models of cyberdefense, including public health. “A public health model has some interesting applications,” Lynn said. “Can we use the kinds of techniques we use to prevent diseases? Could those be applied to the Internet?”

To that end, the Pentagon is now researching means of introducing internal defenses to the Internet so that it acts more like a human organism. When it’s hit with a virus, for example, it might mutate to fend it off. Such strategies are meant to “shift the advantage much more to the defender and away from the attacker,” Lynn said.

The problem is that the Internet currently has very few natural defenses. And sophisticated, specially crafted viruses like Stuxnet are even tougher to fend off. Indeed, the Web “was not developed with security in mind,” he added. “It was developed with transparency in mind; it was developed with ease of technological innovation.” Those same attributes do not lend themselves to good security. Among the potential targets for cyberattack frequently mentioned by cybersecurity experts are the nation’s power grid and financial system.

It was in 2008 that a cyberattack on Pentagon networks – an attack attributed to an unnamed “foreign intelligence service” – served as a wake-up call for US defense leadership. “To that point, we did not think our classified networks could be penetrated, so it was – it was a fairly shocking development,” said Lynn, adding that it was a “seminal moment” in a new military frontier.

Lynn put forward an analogy to early American warfare that the Pentagon often likes to call upon to illustrate its point. “If you figure the Internet is 20, 20-plus years old, and you kind of analogize to aviation … the first military aircraft was bought, I think, in 1908, somewhere around there. So we’re in about 1928,” he said.

“We’ve kind of seen some … biplanes shoot at each other over France,” he added. “But we haven’t really seen kind of what a true cyberconflict is going to look like.”

He warned, however, that there were a few things that appear clear. It is a kind of war that “is going to be … more sophisticated, it’s going to be more damaging, it’s going to be more threatening” than it appears at the present, Lynn said. “And it’s one of the reasons we’re trying to get our arms around the strategy in front of this rather than respond to the event.”

Have you checked your Facebook PhoneBook yet?

Facebook has shamefully added all of our friends’ phone numbers for everyone to see. Was this a mistake or the evolution of Facebook? See it for yourself: go to the top right of your screen, click “Account”, then “Edit Friends”. On the upper left side of your screen is the “Phone Book”. Everyone’s phone numbers are now being published. You need to manually change your privacy settings to fix this problem.
Unless Facebook changes its ways, you had better get very familiar with the user interface on the Facebook privacy page.

Truthy to search, identify smear tactics, Twitter-bombs through November election runup

Sept. 28, 2010

BLOOMINGTON, Ind. — Astroturfers, Twitter-bombers and smear campaigners should beware this election season, as a group of leading Indiana University information and computer scientists today unleashed Truthy, a sophisticated new Twitter-based research tool that combines data mining, social network analysis and crowdsourcing to uncover deceptive tactics and misinformation leading up to the Nov. 2 elections.

Pictured here is a diffusion network created by Truthy for the Twitter burst generated by Lady Gaga supporters toward John McCain following Gaga’s comments about McCain’s opposition to repealing Don’t Ask, Don’t Tell. The meme was tweeted 1,276 times by 1,100 users, with 168 users retweeting 696 times and another 59 users mentioning the meme 325 times. Links in orange are mentions; blue links show retweets.
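To make the idea of a diffusion network a bit more concrete, here is a minimal sketch of how retweet and mention edges for one meme could be counted. This is not the IU team's code. The `Tweet` record, the sample users, and the choice of edge direction (from the original poster to the retweeter) are all assumptions made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class Tweet(NamedTuple):
    """Hypothetical, simplified record of one tweet carrying the meme."""
    author: str  # user who posted this tweet
    kind: str    # "retweet" or "mention"
    target: str  # user who was retweeted or mentioned


def build_diffusion_edges(tweets) -> Counter:
    """Count directed, typed edges (source, destination, kind)."""
    edges = Counter()
    for tweet in tweets:
        if tweet.kind == "retweet":
            # Assumed direction: the meme flows from the original
            # poster to the user who retweets it.
            edges[(tweet.target, tweet.author, "retweet")] += 1
        elif tweet.kind == "mention":
            # Assumed direction: from the author to the mentioned user.
            edges[(tweet.author, tweet.target, "mention")] += 1
    return edges


# Made-up sample data, not the Lady Gaga / McCain burst.
sample = [
    Tweet("bob", "retweet", "alice"),
    Tweet("carol", "retweet", "alice"),
    Tweet("bob", "retweet", "alice"),
    Tweet("dave", "mention", "senator"),
]

edges = build_diffusion_edges(sample)
users = {user for (src, dst, _kind) in edges for user in (src, dst)}

if __name__ == "__main__":
    for (src, dst, kind), weight in sorted(edges.items()):
        print(f"{src} -> {dst} [{kind}] x{weight}")
    print(f"{len(users)} users in the network")
```

The edge type is kept on each edge so that a drawing tool could color mentions and retweets differently, as in the picture.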

Combing through thousands of tweets per hour in search of political keywords, the team based out of IU’s School of Informatics and Computing will isolate patterns of interest and then insert those memes (ideas or patterns passed by imitation) into Twitter’s application programming interface (API) to obtain more information about the meme’s history.

“When we identify a trend we go back and examine how it was started, where the main injection points were, and any associated memes,” said Filippo Menczer, an associate professor of computer science and informatics. “When we drill down we’ll be able to see statistics and visualizations relating to tweets that mention the meme and basically reconstruct its history.”

The team will then generate diffusion network images that visitors to Truthy can view as groups of nodes and edges that identify retweets, mentions, and the extent of the epidemic. Visitors to the site will also see the output of a sentiment analysis algorithm that examines and extracts mood-identifying words and then assesses them on a known psychometric scale. That algorithm identifies the meme on scales ranging from anxious to calm, hostile to kind, unsure to sure, and confused to aware.
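The mood scoring described above can be pictured as a simple word lookup. The sketch below is only an illustration of that general idea: the tiny lexicon, its scores, and the averaging rule are all invented here, and the actual psychometric scale used by the researchers is not specified in the release.

```python
import re

# Hypothetical mini-lexicon: word -> (scale, score).
# Scores run from -1.0 (left end of the scale) to 1.0 (right end).
LEXICON = {
    "worried": ("anxious-calm", -1.0),
    "relaxed": ("anxious-calm", 1.0),
    "hate": ("hostile-kind", -1.0),
    "thanks": ("hostile-kind", 1.0),
    "maybe": ("unsure-sure", -1.0),
    "definitely": ("unsure-sure", 1.0),
}


def mood_profile(tweets) -> dict:
    """Average the scores of mood words found in the tweets, per scale."""
    sums = {}
    counts = {}
    for text in tweets:
        # Split the lowercased text into simple word tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in LEXICON:
                scale, score = LEXICON[word]
                sums[scale] = sums.get(scale, 0.0) + score
                counts[scale] = counts.get(scale, 0) + 1
    # Scales with no matching words are simply left out of the result.
    return {scale: sums[scale] / counts[scale] for scale in sums}


if __name__ == "__main__":
    print(mood_profile(["I hate this", "Thanks, definitely relaxed"]))
```

A real lexicon would be far larger, and a meme would be scored over all the tweets that carry it rather than two sample sentences.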

Menczer got the idea for the Truthy website after hearing researchers from Wellesley College speak earlier this year on their research analyzing a well-known Twitter bomb campaign conducted by the conservative group American Future Fund (AFF) against Martha Coakley, a Democrat who lost the race for the Massachusetts Senate seat formerly held by the late Edward Kennedy. Republican challenger Scott Brown won the seat after AFF set up nine Twitter accounts in the early morning hours prior to the election and then sent out 929 tweets in two hours before Twitter realized the information was spam. By then the messages had reached 60,000 people.

Truthy architecture: Streaming Twitter data acquired in real-time is matched against keywords to exclude tweets unlikely to contain political discussion and extract memes (mentions, hash tags, and urls). Memes of interest are isolated by considering only those that have just undergone significant changes in volume, or those that account for a significant portion of the total volume. Memes are then inserted in a database and Twitter API is used to obtain more information on each.
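The steps in that caption (keyword filter, meme extraction, then picking memes by volume) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Truthy code: the keyword list, the 3x growth threshold, and the 25% share threshold are invented, and the database and Twitter API steps are left out.

```python
import re
from collections import Counter

# Hypothetical keyword list used to keep only political tweets.
POLITICAL_KEYWORDS = {"election", "senate", "vote", "congress"}

# Memes as described in the caption: mentions, hashtags, and URLs.
MEME_PATTERN = re.compile(r"(@\w+|#\w+|https?://\S+)")


def is_political(text: str) -> bool:
    """True if the tweet contains at least one political keyword."""
    words = set(re.findall(r"\w+", text.lower()))
    return bool(words & POLITICAL_KEYWORDS)


def extract_memes(text: str) -> list:
    """Pull mentions, hashtags, and URLs out of a tweet."""
    found = MEME_PATTERN.findall(text)
    # Lowercase tags and mentions, but leave URLs untouched.
    return [m if m.startswith("http") else m.lower() for m in found]


def count_memes(tweets) -> Counter:
    """Count memes over the political tweets in one time window."""
    counts = Counter()
    for text in tweets:
        if is_political(text):
            counts.update(extract_memes(text))
    return counts


def memes_of_interest(previous, current, growth=3.0, share=0.25) -> set:
    """Select memes that surged or that dominate the current window.

    Both thresholds are assumptions chosen for this example.
    """
    total = sum(current.values())
    selected = set()
    for meme, count in current.items():
        before = previous.get(meme, 0)
        surged = count >= growth * max(before, 1)
        dominant = total > 0 and count / total >= share
        if surged or dominant:
            selected.add(meme)
    return selected


if __name__ == "__main__":
    earlier = count_memes(["Vote today #election", "senate race #election"])
    now = count_memes([
        "vote now #ampat @bob",
        "election fraud #ampat",
        "congress #ampat",
        "vote #election",
        "cats are great #cats",
    ])
    print(memes_of_interest(earlier, now))
```

Comparing each window against the previous one is only one simple way to define a "significant change in volume"; the release does not say how the team measures it.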

Menczer explained that because search engines now include Twitter trends in search results, an astroturfing campaign — where the concerted efforts of special interests are disguised as a spontaneous grassroots movement — that includes Twitter bombs can jack up how high a result shows up on Google even if the information is false.

This is one reason Truthy also relies on input from users to denote a meme as “truthy,” or misinformation represented as fact. Having a crowdsourcing component will help the data mining effort and hopefully keep the loop between social media and search engines honest, researchers said.

“One of the concerns about social media is that people are being manipulated without realizing it because a meme can be given instant global popularity by a high search engine ranking, in turn perpetuating the falsehood,” Menczer said.

As information scientists, the group is interested in understanding meme diffusion from various perspectives: Menczer, associate director of IU’s Center for Complex Networks and Systems Research, focuses on data mining and meme burst modeling; Rudy Professor of Informatics Alessandro Vespignani’s work relates to epidemic and contagion modeling; Associate Professor of Informatics Alessandro Flammini, also director of IU’s Complex Systems Program, conducts complex network analysis, especially related to online text and social media; and Johan Bollen, associate professor of informatics and computing, has a background in cognitive science and specializes in sentiment and mood analysis from online text.

The website’s name, Truthy, references a “stunt word” first employed by television comedian and political pundit Stephen Colbert in 2005 to satirize the use of emotional appeal as fact.

To speak with Menczer or other members of the Truthy development team, please contact Steve Chaplin, University Communications, at 812-856-1896.
