John Romant's Technology Blog

If it's technology, I want to know about it.

Category Archives: Data Mining

Duqu Virus attacks Iran. All facilities and equipment are said to be “cleaned”.

On Sunday, Iran indicated that the Duqu virus had been detected, but the depth of the contamination is currently unknown. The director of Iran’s Passive Defense Organization, Gholam Reza Jalali, says that the Islamic Republic has produced antivirus software that protects the software and hardware systems of governmental centers against the Duqu super virus. All facilities and equipment affected by the virus have been cleaned, and the virus is under control, Gholam Reza Jalali told IRNA on Sunday.

Side note: I wonder how much the Iranian Duqu antivirus will go for on the open market? I also wonder if Iran is still using Siemens control systems. Sounds like a film plot in the making.

follow me on twitter: @johnromant


Infographic: How often do criminals use social media technology?

Are criminals using technology like Facebook, Twitter, Google Street View and/or Foursquare to help them commit their crimes?

Lady Gaga Breaks the 1 Billion YouTube Video View Barrier.

John Romant

I cannot believe I am even typing the name Lady Gaga, but this is Social Media History.

Lady Gaga just broke 1 billion YouTube video views. I wish I could see the raw data, like unique views, returning visitors, etc. Either way, 1 billion YouTube video views is incredible. Maybe the time has come when alternative media is officially becoming mainstream and mainstream media is officially being shoved aside.

I wouldn’t be surprised if the news of Lady Gaga breaking the 1 billion YouTube video views mark creates a spike in her views. I know I added a few extra views just by writing this article. Crazy! Give it another year and we will see more people like Justin Bieber and a few others break the 1 billion YouTube video views mark.

By the way, I wonder what the total number of views across ALL video hosting sites is. How many video hosting sites are there besides YouTube? I wish I had time to compile the data myself. If anyone has an idea, let me know.

What is the Average StumbleUpon User Like based on Analytics?

So far this site has received 89,309 visitors from StumbleUpon. This is a pretty big sample that lets me find some commonalities among users and broadly profile the average StumbleUpon user. I’ve used Google Analytics as my tool of choice for getting the data.

Where are StumbleUpon users from?

StumbleUpon has a more international audience than sites like Digg, where 90%+ of visitors are from North America. It’s different for StumbleUpon:

[Chart: StumbleUpon visitors by region]

In the Americas, most of them are from the US. But what about Europe?


I didn’t expect Northern Europe to be that high on the list.

What browser do StumbleUpon users use?

The data gives a clear picture:


Internet Explorer = a miserable failure here. There’s a StumbleUpon toolbar for IE, yet almost nobody uses it. Chrome is gaining in popularity. Firefox is the king on the throne: 93%!!!

Do Stumblers use the latest Flash version?


Yes. What about Java, do they have Java support?

[Chart: Java support among StumbleUpon visitors]

Oh well, not really. Then again, many of them might have NoScript installed, which blocks JavaScript.

What operating system do StumbleUpon users have?


The surprising part to me is mobile. There was one visit each from iPhone and iPad: 2 visitors in total from a sample of 89,000 users! It seems StumbleUpon needs to make it more convenient for its users to use the toolbar on their mobile phones.
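To put the mobile numbers in perspective, the share is simply the mobile visits divided by the whole sample. A minimal Python sketch, assuming the 89,309 StumbleUpon visits reported at the top of this post are the denominator and that each of the two mobile visits counts once:

```python
def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


TOTAL_VISITS = 89_309   # StumbleUpon visits reported in this post
MOBILE_VISITS = 1 + 1   # one iPhone visit plus one iPad visit

mobile_share = share_percent(MOBILE_VISITS, TOTAL_VISITS)
print(f"Mobile share: {mobile_share:.4f}%")  # prints "Mobile share: 0.0022%"
```

In other words, roughly two visits in every 100,000.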

StumbleUpon + Safari on the Mac?


Uh, Safari = miserable failure (along with IE). There are some tricks to get StumbleUpon running on Safari, though. Still, not many people are willing to use Safari for stumbling on the Mac.

Are most Stumblers using widescreen monitors?

To answer this question, we need to take a look at screen resolutions (certain resolutions, like 1280×800, are not used on full screen, i.e. 4:3 or 5:4, monitors):


From this data, we can estimate that around 72% of Stumblers use widescreen monitors, compared to 18% who use full screen monitors. I didn’t count the last one (1024×600), which is mostly used on laptops and netbooks.
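One way to reproduce this kind of estimate is to classify each resolution by its aspect ratio. A minimal Python sketch, assuming anything wider than 4:3 (such as 16:10 or 16:9) counts as widescreen, that 4:3 and 5:4 resolutions are the “full screen” ones, and that 1024×600 is left out of both groups as above. The counts in the example are made up for illustration; they are not the Google Analytics numbers behind the 72% and 18% figures:

```python
from fractions import Fraction
from typing import Dict, Tuple

Resolution = Tuple[int, int]

# Left out of both groups, as in the post (mostly laptops and netbooks).
EXCLUDED = {(1024, 600)}


def is_widescreen(width: int, height: int) -> bool:
    """Wider than 4:3 (e.g. 16:10, 16:9) is widescreen; 4:3 and 5:4 are not."""
    return Fraction(width, height) > Fraction(4, 3)


def widescreen_split(visits: Dict[Resolution, int]) -> Tuple[float, float]:
    """Return (widescreen %, full screen %) of ALL visits.

    Excluded resolutions stay in the denominator, so the two shares
    need not add up to 100.
    """
    total = sum(visits.values())
    if total == 0:
        raise ValueError("no visits to classify")
    wide = standard = 0
    for (w, h), n in visits.items():
        if (w, h) in EXCLUDED:
            continue
        if is_widescreen(w, h):
            wide += n
        else:
            standard += n
    return 100.0 * wide / total, 100.0 * standard / total


# Hypothetical counts, for illustration only.
sample = {(1280, 800): 50, (1024, 768): 30, (1024, 600): 20}
print(widescreen_split(sample))  # (50.0, 30.0)
```

Keeping the excluded visits in the denominator is consistent with the post’s two percentages adding up to 90% rather than 100%.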

How old are StumbleUpon users?

I’ve used Quantcast (which is pretty accurate) to get this data:


Be aware that this data is only for the United States. It shouldn’t be very different for other countries, though, considering that most sites attract similar audiences (especially in terms of sex and age group).

So, based on all of the data above, we can now build a profile of the average StumbleUpon user:


Have you checked your Facebook PhoneBook yet?

Facebook has shamefully added all of our friends’ phone numbers for everyone to see. Was this a mistake or the evolution of Facebook? See it for yourself: go to the top right of your screen, click “Account”, then “Edit Friends”. On the upper left side of your screen is the “Phone Book”. Everyone’s phone numbers are now being published. You need to manually change your privacy settings to fix this problem.
Unless Facebook changes its ways, you had better get very familiar with the user interface on the Facebook privacy page.

Truthy to search, identify smear tactics, Twitter-bombs through November election run-up

Sept. 28, 2010

BLOOMINGTON, Ind. - Astroturfers, Twitter-bombers and smear campaigners need to beware this election season, as a group of leading Indiana University information and computer scientists today unleashed Truthy, a sophisticated new Twitter-based research tool that combines data mining, social network analysis and crowdsourcing to uncover deceptive tactics and misinformation leading up to the Nov. 2 elections.

Pictured here is a diffusion network created by Truthy for the Twitter burst generated by Lady Gaga supporters toward John McCain following Gaga’s comments about McCain’s opposition to repealing Don’t Ask, Don’t Tell. The meme was tweeted 1,276 times by 1,100 users, with 168 users retweeting 696 times and another 59 users mentioning the meme 325 times. Links in orange are mentions; blue links show retweets.
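Counts like the ones in this caption come from tallying a typed edge list. As a rough illustration (not Truthy’s actual code), the Python sketch below assumes each retweet or mention is stored as a hypothetical (acting user, target user, kind) tuple, then counts events and distinct acting users per kind, the same two numbers the caption reports for retweets and mentions. The user names are placeholders:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# (acting user, user who was retweeted or mentioned, "retweet" or "mention")
Edge = Tuple[str, str, str]


def summarize(edges: List[Edge]) -> Dict[str, Tuple[int, int]]:
    """Map each edge kind to (number of events, number of distinct acting users)."""
    events: Counter = Counter()
    actors: Dict[str, Set[str]] = defaultdict(set)
    for source, _target, kind in edges:
        events[kind] += 1
        actors[kind].add(source)
    return {kind: (events[kind], len(actors[kind])) for kind in events}


# Hypothetical edges for illustration.
edges = [
    ("user_a", "user_x", "retweet"),
    ("user_b", "user_x", "retweet"),
    ("user_a", "user_x", "retweet"),
    ("user_c", "user_y", "mention"),
]
print(summarize(edges))  # {'retweet': (3, 2), 'mention': (1, 1)}
```

Read this way, the caption’s “168 users retweeting 696 times” corresponds to a retweet entry of (696, 168).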

Combing through thousands of tweets per hour in search of political keywords, the team based out of IU’s School of Informatics and Computing will isolate patterns of interest and then insert those memes (ideas or patterns passed by imitation) into Twitter’s application programming interface (API) to obtain more information about the meme’s history.
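The keyword filtering and meme extraction described here can be sketched in a few lines of Python. This is an illustration under assumptions, not the IU team’s implementation: the keyword set is hypothetical, and memes are taken to be hashtags, @-mentions and URLs, the three kinds the Truthy architecture caption lists:

```python
import re
from typing import List, Optional

# Hypothetical keyword list; the team's real list of political keywords is not given here.
POLITICAL_KEYWORDS = {"election", "senate", "vote", "gop", "democrat", "republican"}

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
WORD = re.compile(r"[a-z]+")


def extract_memes(tweet: str) -> Optional[List[str]]:
    """Return the tweet's hashtags, mentions and URLs, or None if no political keyword matches."""
    words = set(WORD.findall(tweet.lower()))
    if not words & POLITICAL_KEYWORDS:
        return None  # unlikely to be political discussion, so it is skipped
    return HASHTAG.findall(tweet) + MENTION.findall(tweet) + URL.findall(tweet)


print(extract_memes("Get out and vote! #election2010 @candidate http://example.com/x"))
# ['#election2010', '@candidate', 'http://example.com/x']
print(extract_memes("Great lunch today #food"))  # None
```

Matching whole words keeps a keyword such as “vote” from firing inside unrelated words, at the cost of missing variants like “voters” unless they are added to the list.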

“When we identify a trend we go back and examine how it was started, where the main injection points were, and any associated memes,” said Filippo Menczer, an associate professor of computer science and informatics. “When we drill down we’ll be able to see statistics and visualizations relating to tweets that mention the meme and basically reconstruct its history.”

The team will then generate diffusion network images that visitors to the Truthy website can view as groups of nodes and edges that identify retweets, mentions, and the extent of the epidemic. Visitors to the site will also see the output of a sentiment analysis algorithm that extracts mood-identifying words and then assesses them on a known psychometric scale. That algorithm rates the meme on scales ranging from anxious to calm, hostile to kind, unsure to sure, and confused to aware.
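The release doesn’t say which lexicon or scale values the sentiment algorithm uses, so the Python sketch below is only a toy version of the general lexicon-based approach. A hypothetical word list maps mood words to weights on one bipolar scale (hostile = -1, kind = +1), and a meme’s score is the average weight of all matched words across its tweets:

```python
import re
from typing import Dict, Iterable, Optional

# Toy lexicon for one bipolar dimension: -1.0 = hostile, +1.0 = kind.
# These words and weights are hypothetical, not the psychometric scale Truthy uses.
HOSTILE_KIND = {
    "hate": -1.0, "liar": -1.0, "angry": -0.8,
    "thanks": 0.8, "love": 1.0, "kind": 1.0,
}


def mood_score(tweets: Iterable[str], lexicon: Dict[str, float]) -> Optional[float]:
    """Average the lexicon weights of every matched word across a meme's tweets.

    Returns None when no tweet contains a lexicon word.
    """
    weights = [
        lexicon[word]
        for tweet in tweets
        for word in re.findall(r"[a-z']+", tweet.lower())
        if word in lexicon
    ]
    if not weights:
        return None
    return sum(weights) / len(weights)


tweets = ["I love this, thanks!", "What a liar"]
print(mood_score(tweets, HOSTILE_KIND))  # roughly 0.27, leaning slightly kind
```

Scoring each of the four scales would just mean running the same function with four different lexicons.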

Menczer got the idea for the Truthy website after hearing researchers from Wellesley College speak earlier this year about their analysis of a well-known Twitter-bomb campaign conducted by the conservative group American Future Fund (AFF) against Martha Coakley, the Democrat who lost the race for the Massachusetts Senate seat formerly held by the late Edward Kennedy. Republican challenger Scott Brown won the seat after AFF set up nine Twitter accounts in the early morning hours before the election and then sent out 929 tweets in two hours before Twitter realized the messages were spam. By then the messages had reached 60,000 people.

Truthy architecture: Streaming Twitter data acquired in real time is matched against keywords to exclude tweets unlikely to contain political discussion and to extract memes (mentions, hashtags, and URLs). Memes of interest are isolated by considering only those that have just undergone significant changes in volume, or those that account for a significant portion of the total volume. Memes are then inserted into a database, and the Twitter API is used to obtain more information on each.
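The volume-based isolation step amounts to two simple tests per meme. In this Python sketch, the growth ratio and minimum share are hypothetical thresholds (Truthy’s actual values aren’t given), and the counts are assumed to come from two consecutive time windows:

```python
from typing import Dict, Set


def memes_of_interest(
    previous: Dict[str, int],
    current: Dict[str, int],
    growth_ratio: float = 3.0,  # hypothetical threshold for a "significant change in volume"
    min_share: float = 0.05,    # hypothetical threshold for a "significant portion of the total"
) -> Set[str]:
    """Select memes that just spiked versus the previous window or dominate the current one."""
    total = sum(current.values())
    selected = set()
    for meme, count in current.items():
        before = previous.get(meme, 0)
        spiked = count >= growth_ratio * max(before, 1)  # max() handles brand-new memes
        dominant = total > 0 and count / total >= min_share
        if spiked or dominant:
            selected.add(meme)
    return selected


# Hypothetical counts for two consecutive windows.
previous = {"#steady": 19, "#spike": 2, "#big": 480, "#noise": 3}
current = {"#steady": 20, "#spike": 12, "#big": 500, "#noise": 4}
print(sorted(memes_of_interest(previous, current)))  # ['#big', '#spike']
```

The two tests catch different things: a small meme that is suddenly taking off, and a large one that is already a big share of the conversation.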

Menczer explained that because search engines now include Twitter trends in search results, an astroturfing campaign — where the concerted efforts of special interests are disguised as a spontaneous grassroots movement — that includes Twitter bombs can jack up how high a result shows up on Google even if the information is false.

This is one reason Truthy also relies on input from users to flag a meme as “truthy,” or misinformation presented as fact. Having a crowdsourcing component will help the data mining effort and hopefully keep the loop between social media and search engines honest, researchers said.
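The release doesn’t describe how user input is combined, so this Python sketch only shows one plausible way to aggregate it. It assumes a hypothetical two-option vote (truthy / not truthy), a hypothetical minimum of 10 votes before any decision, and a hypothetical 60% agreement threshold:

```python
from dataclasses import dataclass


@dataclass
class MemeVotes:
    truthy: int = 0      # users who flagged the meme as misinformation presented as fact
    not_truthy: int = 0  # users who judged it legitimate


def is_flagged_truthy(votes: MemeVotes, min_votes: int = 10, threshold: float = 0.6) -> bool:
    """Flag a meme only once enough users have voted and most of them call it truthy."""
    total = votes.truthy + votes.not_truthy
    if total < min_votes:
        return False  # too little input to decide either way
    return votes.truthy / total >= threshold


print(is_flagged_truthy(MemeVotes(truthy=7, not_truthy=3)))  # True
print(is_flagged_truthy(MemeVotes(truthy=3, not_truthy=2)))  # False (only 5 votes)
```

The minimum vote count keeps a handful of early votes, possibly from the campaign itself, from deciding a meme’s label.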

“One of the concerns about social media is that people are being manipulated without realizing it because a meme can be given instant global popularity by a high search engine ranking, in turn perpetuating the falsehood,” Menczer said.

As information scientists, the group is interested in understanding meme diffusion from various perspectives: Menczer, associate director of IU’s Center for Complex Networks and Systems Research, focuses on data mining and meme burst modeling; Rudy Professor of Informatics Alessandro Vespignani’s work relates to epidemic and contagion modeling; Associate Professor of Informatics Alessandro Flammini, also director of IU’s Complex Systems Program, conducts complex network analysis, especially related to online text and social media; and Johan Bollen, associate professor of informatics and computing, has a background in cognitive science and specializes in sentiment and mood analysis from online text.

The website’s name, Truthy, references a “stunt word” first employed by television comedian and political pundit Stephen Colbert in 2005 to satirize the use of emotional appeal as fact.

To speak with Menczer or other members of the Truthy development team, please contact Steve Chaplin, University Communications, at 812-856-1896 or
