Friday, March 07, 2014

Smart Data


Everyone wants a piece of ‘big data’, but most people are not really sure how to go about it. It is generally thought of as a large investment involving big companies who can help set up the infrastructure to collect all the data an organization will ever touch, generate, or consume. The thought is that this investment will let the organization visualize the data in various fancy ways or give it the ability to slice and dice the data any way it can imagine. These are all great capabilities and will be useful for an organization that is mature in data-based decision making, but most companies are not even looking at the data they already have, which could give them useful insights before they venture into the World Wild Web of outside data.

For SMBs, and even for some large companies, just collecting and using in-house data can help them understand their business better and make better decisions. For example, a retailer can make actionable decisions in various parts of the business lifecycle: buyers can make the right buying choices based on data about pricing, demographics, seasonality, and competition; merchandisers can reach the customer at the right price point with the right promotions, bring in new customers, and up-sell and cross-sell; fulfillment can be an exact science as opposed to failed promises and disappointed customers; and technology infrastructure can be planned for promotions and seasons.
One of the challenges for SMBs is that they generally do not operate their own technology, which is maintained and hosted by other companies. In many cases, they do not even have access to their own data. For them, the first step is to bring their own data under their control so they can start to assess it and derive value from it.

Some forward-thinking companies are changing this culture and moving from gut-feel to fact-based decision making. A clear example is product pricing on Amazon.com, where prices can change within minutes with no person manually involved. It probably took many years and a few mistakes before the merchandisers were comfortable with a system making such decisions, but this is the only model that could have scaled for Amazon without leaving a penny on the table. So the companies that collect this data and apply predictive and optimization models to it are going to become a lot better and more efficient than the ones that don’t.


The term ‘Big Data’ can be intimidating, as it seems to suggest that everything about it is big: the commitment, the investment, and the value. That’s why I prefer to think of it as ‘Smart Data’. In the end, it’s just data, and all of a sudden it’s sexy again.

Sunday, September 16, 2012

Mupd8 – The @WalmartLabs Real-time Platform


The @WalmartLabs Mupd8 platform has already supported more than a dozen sophisticated stream applications processing over 300 million status updates per day, gathering real-time information on our product taxonomy. Learn more about our experience (see our VLDB 2012 paper) and check out the newest version, available under the Apache License 2.0, for yourself at http://github.com/walmartlabs/mupd8. Get started on your application with "Starting a New Application in MUPD8," available in the Mupd8 source tree and http://walmartlabs.github.com/mupd8/quickstart.html.

http://walmartlabs.blogspot.com/2012/09/mupd8-walmartlabs-real-time-platform.html

How Disney built a big data platform on a startup budget

In order to “remove the excuses” for business users not loading their data into the system, they just need to point the custom-built user interface at their files. (Disney’s platform is growing at 5TB a day, and there are still many other types of data it needs to house, Jacob said.) Because they’ve built wrappers around the technology, Jacob’s team doesn’t talk about Hadoop and MongoDB to internal users, only about analytics and queries. 

http://gigaom.com/data/how-disney-built-a-big-data-platform-on-a-startup-budget/

Monday, June 25, 2012

How Equifax is using Big Data





It maintains information about people who share the same phone number or address, "non-obvious" relationships between individuals, loans for dental work, magazine subscriptions, rental history, real estate assets, investment wealth, retail purchasing, the type of federal tax return someone files, marital status, employment, utility payments, cable TV accounts, criminal records, debt-to-income ratios, changes of address, motor vehicle files, post office boxes, inferences about someone's capacity to pay bills, predictions about someone's propensity to pay, links to past and potential fraud crimes--and more.

[Source: Equifax Eyes Are Watching You--Big Data Means Big Brother]

Sunday, August 22, 2010

Samsung Galaxy S (US TMobile version as Vibrant)

I have been using smartphones for a long time, and my last device, the HTC HD2, was a great 'first impression' phone, but the fascination died in a couple of months. More on that in my HD2 review. Since that experience I was hesitant to buy a similar device with Android OS, but so far this device has everything I want and hasn't disappointed me once.

Display: The screen is amazingly clear and crisp. The included movie Avatar does a good job showing off the phone and the screen. No hesitation, flickering, or pixelation whatsoever.

Quick and very responsive interface. The device boots up in 30+ seconds but once it's completely up and running (in 60+ seconds) after starting services like media scanner, it is quick in everything you do. Never experienced any lag.

The default buttons at the bottom of the device (Menu, Home, Back/Return, and Search) are very helpful, especially when you click on a link in an email and it opens in the browser: you can always click the Back button to return to the email. This is very useful for someone who is used to Windows devices, where it's difficult to go back to the Email app once you open another app. The Back button works across apps on the device. Very cool.

When you are using the phone at night it might be difficult to find those four buttons at the bottom when the back light turns off, but you get used to it and slowly figure out where the buttons will be. Nothing major.

Pictures are very clear and crisp. Even when compared to HD2 in low light, these come out so much better.

Battery is slightly better than HD2 but pretty much on par when using 3G data or videos. It would need daily charging if you use a good amount of 3G.

Android marketplace is a plus when compared with Windows Mobile 6.5 devices. But Windows Phone 7 is supposed to change that.

It comes with two back cases, not sure why. But would have preferred a leather or rubber case like HD2.

The T-Mobile version also comes with 16GB installed and 2GB card. The MicroUSB <-> USB cable comes with USB adapter charger so you carry only one cable.

Some issues or nice to have features:

Front camera for video calling would be nice.

No Flash light for the camera. So no flash light app as in HD2.

It comes with many apps which are either just a link or require you to register or will get you to sign-up for a subscription. So be careful about what you sign up for since some of these will give you a month free and then unless you call them, they'll keep charging.

-- After a couple of weeks of usage --

It's still a great device so far. A few things worth mentioning:

* Really could use a front facing camera for video calling. The non-US version has one but it was removed for ATT and TMobile in US.

* The camera quality is amazing. The various built-in effects, like action shots and panoramic shots, are a great addition. You can shoot a 270-degree picture: it lets you take 8 pictures while guiding you how to move, then stitches them all together into a wonderful panoramic picture. Amazing.

* Battery life seems much better than that of HTC HD2

* The GPS is flaky. Applications like Foursquare don't connect to it well unless you open Google Navigation first to connect with GPS and then open the other apps that use it. Samsung is supposed to come out with a software update in Sept to fix it. But not a deal breaker.

* Still a great snappy interface.

* Android marketplace has some great free apps, but it's still not as good as Apple's. I'm sure that will change over time, as Android marketplace is 'free' compared to Apple, where they take a cut.

Friday, August 06, 2010

Social Media, Optimization (SMO), and Marketing (SMM)

What is social media? What is SMO?

Social Media refers to media and content generated for or from social interactions. The term is generally used in the context of web-based technologies for creating user content. Think of it as the dialog, or the text of communication, between two people on the internet. This also extends to communication within a group of people. Social Media is considered a foundation of Web2.0, when technologies started leveraging this UGC (User Generated Content). The simplest form of social media is a blog, while other forms include social networking sites like Facebook and MySpace, the collaboration and knowledge-sharing aspects of wikis, social bookmarking (Delicious), user reviews like Epinions/Yelp, and discussion groups.
In short, Social Media is any form of UGC available on the net which is used for social interaction and sharing.

SMO (Social Media Optimization), on the other hand, is the process of using Social Media to attract attention to products and services on the Web. It is a collection of techniques that use social media to bring more customers to a website, to increase brand awareness, or to manage brand/product reputation on the internet.

A few aspects of SMO are viral marketing, online reputation management, brand building, customer satisfaction, knowledge management, business development, attracting visitors, and product development.
Some of the ways to perform SMO (or attract eyes to your website content) are RSS feeds, ‘sharing buttons’, blogs, diverting people from other blogs/discussion groups to your official blog, and posting updates.


How does it differ from SEO?

Search Engine Optimization (SEO) is an art solely dedicated to improving the website ranking (and visibility) for a search engine. The whole purpose of SEO is to get a website or a web page ranked higher in the search results so they show up ‘above the fold’ and hence potentially getting more traffic.

SEO is very specific, and there is a laundry list of tasks and strategies you follow to achieve it. You can use the same strategy for different products/sites/pages to improve their visibility. SMO, on the other hand, may require a lot more knowledge and understanding of the client/product, and it also needs active involvement and ongoing reputation maintenance. SMO is more closely tied to brand reputation than SEO is.

Suppose I have a brand ‘Buzzolium’ which is a handbag line incorporating the use of hand-woven silk. The SEO task to promote the brand and the products will be to get my website to rank higher when someone searches on ‘silk handbags’, ‘hand woven silk bags’, ‘silk bags’ etc.

The typical SMO activities, by contrast, will be to expose my brand in all the forums, blogs, Facebook fan pages, and discussion groups about ‘silk bags’. The goal is that all the places on the internet which talk about silk bags would get a reference to Buzzolium silk bags, so I get exposure in the right circles and then bring the users to my own website.

So you can see that they are very different activities and require different expertise and skill levels.


Do I need to use SMO?

SMO is for anyone looking to use the internet to gain popularity and raise awareness about themselves or their products. If you think you can benefit from more people knowing about your brand, then you need it.


What do I need to do if I decide to engage in SMO?

Once you know that you want to extend your reach through SMO, it’s a simple three-step process to get SMO working for you: a) Discover, b) Strategize, and c) Optimize.

The Discover phase refers to understanding your environment, your assets, opportunities, market, potential clients/users, niche areas etc.

The Strategize phase defines the communication plan, defining KPIs (key performance indicators) and how the actions will generate results.

The last phase of Optimize puts the plan in action. It creates the engagements, deploys the plan, measures the actions against the goals, and finds new opportunities.

Typically an organization will put its Marketing Director or Manager in this role, although many of the tasks can be done by a more junior role.

You can perform these activities yourself or outsource them completely, as long as you make sure that the outside team understands your brand positioning and feels passionate about promoting it. You can also use a mixed model where you use external tools to monitor your brand, competition, and events, and then make decisions internally to bring traffic to your official content (blog, Facebook fan page, Twitter stream, etc.).

Can an SEO expert do SMO also?

SEO and SMO are different ways of achieving the same result, and both are important. SEO has been around for a few years now and SMO is just getting started. We strongly believe that SMO will be huge in the coming months/years and that any brand will need a platform to manage it. The two are often mentioned together, and everyone offering an SEO service has also started offering SMO. But SMO requires different skills and more engagement for every brand. It needs a custom strategy for reaching the right demographics and customer base, so it's more involved than SEO.

You also need a lot more tools to support SMO.


What differentiates a good SMO from a bad one?

Making social media work for you needs a custom strategy which suits your specific business. Although the channels might be the same, each business needs its own editorial approach and strategy to be successful. Some of the channels are:

Facebook, Twitter, LinkedIn, blogs, discussion forums, social bookmarking, Wiki pages, photo/video sharing, classifieds, directory submission, article writing, press releases, and activities like Foursquare.

A good SMO strategy will not only improve your own content in the above networks, it’ll also bring more traffic and people to your content. A not-so-good SMO, by contrast, might just charge you for writing X Facebook status updates or Y tweets per week. That only puts more content on the pages and does not proactively bring traffic from other sites and forums.

What should I not do with SMO?

Don’t engage in SMO without understanding what you want from it. Don’t do it for the sake of doing it and then just pay someone hundreds of dollars a month to create a Facebook account, a fan page, 8 posts, 1 blog and 4 posts, 4 articles, 10 images and 4 video uploads.


What SMO offerings are out there?

Just search for social media optimization/marketing and you will find many companies doing it. Pretty much every SEO provider is also doing SMO these days. Some of the offerings are Radian6, Trackur, and Scoutlabs.

How to evaluate if an SMO offering is good for me?

This is where it gets a little tricky. There’s no magic formula for finding a good SMO product, as it depends on your requirements and how much you want to be involved. I’ll list a set of questions which, when answered, will tell you how to evaluate such a product.

a) How much do you already spend on SMO activities? Many companies have Marketing directors and managers spending an average of 6-8 hours per week on it, which can get pricey.

b) Do you know what you want from SMO? Generate exposure, increase traffic, lead generation, increase search ranking, new business partnerships, sell products and services, reduce marketing expense or something else?

c) Where’s your market and where do they hangout on the internet?

d) What part of SMO do you want to do in-house? Building the tools, configuring tools to monitor relevant information, figuring out trends, generating reports, responding to feedback, editorial content, driving traffic from other websites and destinations, promotional events, identifying influential consumers, groups, and networks, understanding consumer behavior and motivations, etc.

e) How do I measure the effectiveness of a campaign, or its ROI?
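
For question (e), one common starting point is the standard return-on-investment ratio, (gain - cost) / cost, together with a simple cost per referred visit. The sketch below is only illustrative: the function names and the figures in the usage lines are hypothetical, and how much revenue you attribute to a campaign is an assumption you would have to work out for your own business.

```python
def campaign_roi(revenue_attributed: float, campaign_cost: float) -> float:
    """Return ROI as a fraction: (revenue - cost) / cost.

    A result of 0.5 means the campaign returned 50% on top of what it cost.
    """
    if campaign_cost <= 0:
        raise ValueError("campaign_cost must be positive")
    return (revenue_attributed - campaign_cost) / campaign_cost


def cost_per_visit(campaign_cost: float, referred_visits: int) -> float:
    """Return what each visit referred by the campaign cost on average."""
    if referred_visits <= 0:
        raise ValueError("referred_visits must be positive")
    return campaign_cost / referred_visits


if __name__ == "__main__":
    # Hypothetical numbers: 1000 spent, 1500 in attributed sales, 400 visits.
    print(campaign_roi(1500.0, 1000.0))
    print(cost_per_visit(1000.0, 400))
```

Tracking the same two numbers for every campaign makes it easier to compare an outsourced SMO offering against what the in-house hours already cost.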

Monday, June 21, 2010

Text mining resources.

Resources for Semantic searches, entity extraction, classification, and other NLP oriented approaches.
This is just a reference for various approaches I came across while working on my semantic mining projects. Some of these are full fledged software platforms while others are APIs or small specific algorithms.
1. Yahoo Term Extractor APIs
Yahoo has an API which you can use for term extractions. It does an ok job but seems like there’s no new development on it. http://developer.yahoo.com/search/content/V1/termExtraction.html
There are projects on GitHub which wrap this API and can be used from within Rails or other languages, although using the API directly is very straightforward.
2. GitHub projects
GitHub has many projects for term extraction, classification, etc.

a. Term-Extractor http://github.com/DRMacIver/term-extractor

b. Bayes_motel for multi-variate classification http://github.com/mperham/bayes_motel



3. Rubyforge projects
These projects do classification, stemming etc.

a. Classifier
b. Stemmer
http://rubyforge.org/projects/classifier/

4. WEKA (collection of machine learning algos) http://www.cs.waikato.ac.nz/ml/weka/
There’s also a JRuby wrapper in github http://github.com/bmaland/Eureka

5. WordNet (http://wordnet.princeton.edu/ )

6. GATE (NLP tools) http://gate.ac.uk/

7. LingPipe: a not-so-open-source set of linguistic analysis libraries

8. Topia_termextractor http://pypi.python.org/pypi/topia.termextract/

9. Open Calais. This does a lot more than entity extraction. It also does classification.

10. KEA (Keyphrase Extraction Algorithm)

11. Maui Indexer (Google code project)
12. Other references
a. http://alias-i.com/lingpipe/web/competition.html
b. http://www.searchenginecaffe.com/2007/03/java-open-source-text-mining-and.html
c. Ruby related NLP: http://web.media.mit.edu/~dustin/rubyai.html
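
To give a feel for what the term extractors above do at their simplest, here is a minimal sketch of frequency-based term extraction using only the Python standard library. It is not the algorithm used by any of the listed tools or APIs: the function name, the tiny stopword list, and the choice to count one- and two-word phrases are all assumptions made for illustration. Real tools add stemming, part-of-speech filtering, and statistical weighting.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; real tools ship larger ones.
STOPWORDS = frozenset(
    {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
     "it", "of", "on", "or", "the", "to", "with"}
)


def extract_terms(text, top_n=5, max_words=2):
    """Return the top_n most frequent candidate terms as (term, count) pairs.

    Candidates are runs of 1..max_words consecutive tokens that contain no
    stopword. Sentence boundaries are ignored to keep the sketch short.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(word in STOPWORDS for word in gram):
                continue
            counts[" ".join(gram)] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "Silk handbags made from hand-woven silk. Silk bags and silk handbags."
    for term, count in extract_terms(sample):
        print(count, term)
```

Counting raw frequency favors common single words, which is one reason the libraries listed above combine counts with linguistic filtering or a trained model.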