Pranab Ghosh Archives

Hive Plays Well with JSON

By Dj Das | November 28th, 2018 (updated 2018-11-28) | Pranab Ghosh

Hive is an abstraction over Hadoop MapReduce. It provides a SQL-like interface for querying HDFS data, which accounts for most of its popularity. In Hive, table structured data [...]

Removing Duplicates from Order Data Using Spark

By Dj Das | November 28th, 2018 (updated 2019-01-02) | Pranab Ghosh

If you work with data, there is a high probability that you have run into duplicate data in your data set. Removing duplicates in Big Data is [...]

Storing Nested Objects in Cassandra with Composite Columns

By Dj Das | November 27th, 2018 (updated 2019-01-02) | Pranab Ghosh

One of the popular features of MongoDB is the ability to store arbitrarily nested objects and to index any nested field. In this post I will [...]

Data Normalization with Spark

By Dj Das | November 27th, 2018 (updated 2019-01-02) | Pranab Ghosh

Data normalization is a required data preparation step for many Machine Learning algorithms. These algorithms are sensitive to the relative values of the feature attributes. Data normalization is the process of bringing all the [...]

Anomaly Detection with Robust Zscore

By Dj Das | November 27th, 2018 (updated 2019-07-17) | Pranab Ghosh

Anomaly detection techniques based on statistical modeling are simple and effective. The Zscore-based technique is one of them. Zscore is defined as the absolute difference between [...]

Bulk Insert, Update and Delete in Hadoop Data Lake

By Dj Das | November 27th, 2018 (updated 2019-01-02) | Pranab Ghosh

A Hadoop Data Lake, unlike a traditional data warehouse, does not enforce schema on write and serves as a repository of data in different formats from various sources. If [...]

Handling Categorical Feature Variables in Machine Learning using Spark

By Dj Das | November 27th, 2018 (updated 2019-08-29) | Pranab Ghosh

Categorical feature variables, i.e. feature variables with a fixed set of unique values, appear in the training data sets of many real-world problems. However, categorical variables [...]

Combating High Cardinality Features in Supervised Machine Learning

By Dj Das | November 27th, 2018 (updated 2019-08-29) | Pranab Ghosh

A typical training data set for a real-world machine learning problem has a mixture of data types, including numerical and categorical. Many machine learning algorithms cannot [...]

Ruling with Drools Rule Engine

By Dj Das | November 22nd, 2018 (updated 2018-11-27) | Pranab Ghosh

In a project several years ago, I built a rule engine from scratch. In a recent project that needed a rule engine, I decided to take a different route and give the Drools rule engine [...]

Auto Training and Parameter Tuning for a ScikitLearn based Model for Leads Conversion Prediction

By Dj Das | May 29th, 2018 (updated 2019-08-22) | Analytics, Blogs, Data Sciences, Pranab Ghosh, Predictive Analytics, Predictive Modeling, Python, ScikitLearn

This is a sequel to my last blog post on CRM leads conversion prediction using Gradient Boosted Trees as implemented in ScikitLearn. The focus of [...]