Well, the title makes it pretty evident what this post is about! Let me clear the air a bit: as time flies, everything in the tech industry is becoming more dependent on data. And as the data grows, there is a need to manage it. Well, we have Oracle and SQL Server, so why the transition to Big Data/Data Mining? Because only managing data is not enough; we should be able to get meaningful information out of it, and that’s where this starts.
I am writing this post after a long time, as I am doing my Master’s in Computer Science and hardly get any time. Well, that’s life!! 😛
Now, during my Master’s I have faced this question a lot while doing Data Mining: which tools are good to use and which are not? There is no single correct answer to that, but I will try to make it simple.
The steps of Knowledge Discovery involve Data Preprocessing (basically Data Cleaning and stuff), Data Transformation, Data Mining, and Interpretation. Well, that’s a large topic to discuss and I will cover it in another post. Getting back to the point:
TOOLS for Data Cleaning:
While there are a lot of tools out there, paid and free, I am focusing on tools which can be useful from an academic perspective:
- OpenRefine: This is one of the lightest and best tools out there. It started out as the Google Refine project, and it is quite easy for newbies to operate and understand.
- R/Python: Well, duh! R is a language for Data Science people! Love it! Hate it! But it’s gonna stay in your life. While R has powerful libraries and functions for Data Mining, Python gives you the customizability you want with the data.
- Excel: The Granny is still here! Microsoft Excel is a simple and effective tool for data processing, but it has its own limitations, from the types of files it can accept to the size of data it can handle.
- Data Wrangler: I have not used it much, but from my experience with it I can say it is one heck of a tool for efficient cleaning and transformation of data. With its ability to export data in multiple formats, it sure is one to try out.
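To make "Data Cleaning" a bit less abstract, here is a minimal sketch in plain Python (standard library only) of the kind of chores these tools handle for you: trimming whitespace, fixing inconsistent case, dealing with missing values, and dropping duplicates. The sample data, the column names, and the `clean_rows` function are all made up for illustration; none of this is how OpenRefine or Data Wrangler work internally.

```python
import csv
import io

# Hypothetical raw data: stray spaces, mixed case, a duplicate, a missing age.
RAW_CSV = """name,city,age
 Alice ,new york,30
Bob,Boston,
alice,New York,30
Carol, boston ,41
"""

def clean_rows(raw_text):
    """Trim whitespace, normalize case, convert ages, and drop duplicate rows."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        age = int(age_text) if age_text else None  # keep missing ages as None
        key = (name, city, age)
        if key in seen:
            continue  # same record already kept once
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned

cleaned = clean_rows(RAW_CSV)
for record in cleaned:
    print(record)
```

On real data you would probably decide case by case whether a missing value should be kept, filled in, or dropped; keeping it as `None` here is just one reasonable choice.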
TOOLS/Languages for Data Mining:
With a lot of data, you cannot expect to comfortably do the whole job by hand in C/C++. So, here are some languages and tools which you can use as a learner and practitioner:
- Python: Well, Python is my favorite. Why? It can do pretty much anything, so why not Data Mining? Hell Yeah! With Python and its libraries such as Matplotlib, SciPy, and scikit-learn, you can implement Data Science algorithms. Moreover, it’s Python, so you can build a product on top of the code. Cool! Isn’t it?
- R: R is love! R is life! If you are starting in Data Science, you should get used to R. It has a set of powerful libraries which will make your life easier.
- Matlab: You want simple? Well, Matlab is there for you! It works quite well with data, and it is more interactive than R and Python. You don’t have to write a lot of code, and what you do write is easier in Matlab. Problems? It is paid, but there is an academic version. Students, it is a GO GO!
- Q: I learned about it from the CTO of Rx DataScience during a consortium of Data Scientists. From what I got to know, it seems to be a better choice than R. I haven’t tried it, but an industry expert touting it speaks louder than anything. I would love to try it and share more insights.
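Since the point is that these languages let you implement Data Mining algorithms yourself, here is a rough sketch of one classic, k-means clustering, in plain Python. The toy points and the simple "first k points are the starting centroids" rule are my own assumptions to keep it short; in practice you would more likely reach for a library such as scikit-learn instead of rolling your own.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (
                    sum(x for x, _ in cluster) / len(cluster),
                    sum(y for _, y in cluster) / len(cluster),
                )
    return centroids, clusters

# Hypothetical toy data: two visibly separate groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The fixed seeding keeps the result repeatable for this example; real implementations usually pick the starting centroids more carefully, because a bad start can give bad clusters.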
TOOLS for Data Visualization:
You are a Data Scientist; you understand the data, but the normal layman doesn’t! For those Muggles, you need to hide the magic behind the scenes and show the simplest form! These are some of the tools for you:
- Tableau: Well, if you are into Data Science and don’t know about Tableau, go start learning! For me, it’s better than pretty much any tool out there. Cons: It is paid. But as a student, you can use the free academic version.
- D3.js: It is free! It is The Web! It is an amazing experience! Try it and see if it fits your work style. You can even make your contributions to this open source project.
- Google Charts: It covers a number of basic things, but it’s a cool tool. Well, you can expect Google to add more features down the line.
My take on the topic: Don’t confine yourself to a particular language; try exploring different tools and languages. In the end, it’s more about knowing how the algorithms and concepts of Data Mining work.
Ape Out! Enjoyed the post? Comment below with your feedback!