Data Science Digest 9

Posted by Chisel Analytics on Feb 20, 2020 6:45:00 AM

Keeping up is hard for data scientists to do. Chisel Analytics is happy to help!

Title: Pandas Version 1.0 is Out! Top 4 Features Every Data Scientist Should Know

Source: https://www.analyticsvidhya.com/blog/2020/01/pandas-version-1-top-4-features/
How: Make sure you have the current version of Pandas. If yours is an older version (any 0.x release), please update with
$ pip install --upgrade pandas==1.0.0rc0
Also, "first upgrade to Pandas 0.25 and to ensure your code is working without warnings, before upgrading to pandas 1.0."
When to use this: When you want to filter and "analyze categorical and text-based features"; run calculations on missing values that yield "null" rather than false; present a summary of your dataframe, or the dataframe itself as a Markdown table, in a clear fashion; plus further enhancements.
Why it's helpful: Now this widely used library offers: Dedicated DataTypes for strings, New Scalar for Missing Values, Improved Data Information Table, Markdown format for Dataframes.
Suggested application: When sharing information with people who don't usually work with the datasets, keeping logs for quick future reference, or running calculations that can incorporate more records by leveraging a "null" value versus "false".
Business impact or insights to be gained: As data professionals face more real-world challenges, this open source data analysis and manipulation tool continues to evolve to provide fast, flexible and expressive data structures for working with relational or labeled data.
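
To make the "null versus false" point concrete, here is a minimal sketch of the dedicated string type and the new missing-value scalar. It assumes pandas 1.0 or later is installed; the sample names and flags are made up for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical sample data: one missing name and one missing flag.
names = pd.Series(["alice", None, "bob"], dtype="string")
flags = pd.Series([True, None, False], dtype="boolean")

# String methods keep the missing entry as <NA> instead of failing on it.
upper = names.str.upper()

# pd.NA follows three-valued logic: "unknown OR True" is True,
# while "unknown AND True" stays unknown rather than becoming False.
either = flags | True
both = flags & True

print(upper)
print(either)
print(both)
```

The missing record stays in the calculation as an unknown instead of being silently counted as False, which is what lets more records take part in downstream filters.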

Topics: Professional Development, Data Science Developments

Data Science Digest 8

Posted by Chisel Analytics on Feb 6, 2020 6:45:00 AM

Keeping up is hard for data scientists to do. Chisel Analytics is happy to help!

Title: Karate Club consists of state-of-the-art methods to do unsupervised learning on graph structured data

Source: https://github.com/benedekrozemberczki/karateclub and https://karateclub.readthedocs.io/en/latest/notes/introduction.html
How: GitHub installation and documentation for data handling, full list of implemented methods, and datasets.
When to use this: When you need to perform "small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods."
Why it's helpful: Incorporates Overlapping Community Detection, Non-Overlapping Community Detection, Neighborhood-Based Node Level Embedding, Structural Node Level Embedding, Attributed Node Level Embedding, and Graph Level Embedding.
Suggested application: Use the clusterings and embeddings for downstream learning. Use case examples include: how well Facebook page clusters and group memberships are aligned, abuse of the platform Twitch, classification of threads on Reddit.
Business impact or insights to be gained: "Only quick and minimal changes to the code are needed when a model performs poorly."

Topics: Professional Development, Data Science Developments

Data Science Digest 7

Posted by Chisel Analytics on Jan 23, 2020 6:45:00 AM

A quick look at three of the major players and some of what they have to offer around machine learning.

Title: Amazon Forecast - Accurate time-series forecasting service, based on the same technology used at Amazon.com, no machine learning experience required

Source: https://aws.amazon.com/forecast/
How: Upload your historical and related data; Amazon's machine learning and AI then generate forecasts from it.
When to use this: When you don't have the resources, tools or in-house talent to build out a forecasting model and system which can accommodate multiple data series which change over time.
Why it's helpful: Fully managed service, "so no servers to provision or machine learning models to build, train or deploy." Pricing is pay-as-you-go, so it is workable for most budgets.
Suggested application: Product demand planning, financial planning, resource planning.
Business impact or insights to be gained: Leveraging machine learning developed by Amazon, forecasts are more accurate and prepared in much shorter time (e.g., from months to hours).

Topics: Professional Development, Data Science Developments

Data Science Digest 6

Posted by Chisel Analytics on Jan 9, 2020 6:45:00 AM

Title: The 5 Most Useful Techniques to Handle Imbalanced Datasets

Author: Rahul Agarwal, Senior Statistical Analyst at Walmart Labs
Source: https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html
How: resampling; imbalanced-learn (imblearn); Tomek Links; SMOTE (Synthetic Minority Oversampling Technique); sklearn.
When to use this: When you face an imbalanced dataset, that is, when "you have such a small sample for the positive class in your dataset that the model is unable to learn".
Why it's helpful: Covers ways to address an imbalanced dataset: random undersampling and oversampling, undersampling and oversampling using imbalanced-learn, class weights in the models, and changing your evaluation metric.
Suggested application: Finance, marketing/ad serving, transportation/airline, medical, content moderation, etc.
Business impact or insights to be gained: Models trained on imbalanced datasets can "fail to capture the minority class, which is most often the point of creating the model in the first place." As a result, an analysis might miss fraudulent bank transactions, a patient's rare disease, faulty structural integrity in aircraft, etc.
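
Two of the techniques listed above, random oversampling and class weights, can be sketched in plain Python. This is an illustrative sketch only, not the article's code: the helper names `random_oversample` and `balanced_class_weights` and the 95/5 toy data are assumptions. The weight formula n_samples / (n_classes * class_count) is the "balanced" heuristic that scikit-learn uses for class_weight="balanced".

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Toy dataset (made up): 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

new_rows, new_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)

print(Counter(new_labels))
print(weights)
```

Oversampling changes the training data itself, while class weights leave the data alone and change how much each mistake costs; in practice, libraries such as imbalanced-learn and sklearn provide tested versions of both.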

Topics: Professional Development, Data Science Developments

Data Science Digest 5

Posted by Chisel Analytics on Dec 26, 2019 6:45:00 AM

Title: The Easy Way to Do Advanced Data Visualisation for Data Scientists

Author: George Seif, AI/Machine Learning Engineer, Kdnuggets
Source: kdnuggets.com/2019/08/advanced-data-visualisation-data-scientists.html
How: Python library Plotly, D3.js
When to use this: If data visualization isn't your primary area...and yet you are tasked to provide data visualizations.
Why it's helpful: Plotly provides interactivity out of the box, versus Matplotlib.
Suggested application: Fancy plots, scatter plots, box plots, heat maps.
Business impact or insights to be gained: Plots are simpler to build than with Matplotlib, and the interactivity will be well received by stakeholders who are not data specialists.

Topics: Professional Development, Data Science Developments

Data Scientist Digest 4

Posted by Chisel Analytics on Dec 12, 2019 6:45:00 AM

Title: Great R packages for data import, wrangling and visualization

Author: Sharon Machlis, Executive Editor, Data & Analytics, Computerworld
Source: https://www.computerworld.com/article/2921176/great-r-packages-for-data-import-wrangling-visualization.html
How: Install various packages to accomplish dedicated tasks
When to use this: When you need specialized tools for data import, wrangling, visualization and analysis when using R.
Why it's helpful: Links with explanations of package, category, description, sample use and author for a variety of tools.
Suggested application: Before doing your own search, check this cheat sheet first for a likely shortcut.
Business impact or insights to be gained: Likely there's a tool for what you need to do and it may be listed here, saving time and exposing you to additional resources.

Topics: Professional Development, Data Science Developments

Data Science Digest 3

Posted by Chisel Analytics on Nov 28, 2019 6:45:00 AM

Keeping up to date on developments in the data sciences is hard. Here are a few items you may have missed:

Topics: Professional Development, Data Science Developments

Data Science Digest 2

Posted by Chisel Analytics on Nov 14, 2019 6:45:00 AM

In case you missed them the first time, be sure to bookmark these articles!

Topics: Professional Development, Data Science Developments

Data Scientist Digest 1

Posted by Chisel Analytics on Oct 31, 2019 6:45:00 AM

7 Steps to Mastering Data Preparation for Machine Learning with Python

Author: Matthew Mayo, KDNuggets

Source: https://www.kdnuggets.com/2019/06/7-steps-mastering-data-preparation-python.html

How: Pandas library, Python, EDA (Exploratory Data Analysis)

When to use this: When preparing data for machine learning.

Why it's helpful: Step-by-step reference with supporting links, as well as an introduction for those in IT or data sciences but not as involved in the data preparation process

Suggested application: Refresher for those involved in data wrangling. This article was updated from the 2017 version to incorporate updated library references, related articles and insights from real-world practice.

Business impact or insights to be gained: Developments in Machine Learning and resources available to support data wrangling work hand in hand to improve the outcomes by better preparing the inputs


Topics: Professional Development, Data Science Developments

Analyzing What's Out There. Predicting What You Need to Know.

Posted by Chisel Analytics on Oct 17, 2019 6:45:00 AM

For a data specialist, expectations are high. Data scientists are perceived as essentially magicians who can wrangle data, whip up an algorithm and pull a result out of their hat. On demand.

Topics: Professional Development, Data Science Developments

Chisel Analytics

The Benefits of Analytics

Grow in your understanding of the opportunities that Data Analytics offers to improve business and solve problems. We've created a platform to break down the barriers to data analytics and data science. Our blog, tools and resources help companies, recruiters and data specialists stay informed, stay organized and stay engaged.
