Embrace The Noise: A Case Study Of Text Annotation For Medical Imaging
In this post we'll discuss the recent paper TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-rays, focusing on the best [...]
Deep Learning: Which Loss and Activation Functions should I use?
The purpose of this post is to provide guidance on which combination of final-layer activation function and loss function should be used in a [...]
Bulk Mutation in an Integration Data Lake with Spark
A data lake acts as a repository of data from various sources, possibly in different formats. It can be used to build a data warehouse or to perform [...]
Learning Alarm Threshold from User Feedback using Decision Tree on Spark
Alarm fatigue is a phenomenon where someone exposed to a large number of alarms becomes desensitized to them and starts ignoring them. It’s been [...]
Pluggable Rule Driven Data Validation with Spark
Data validation is an essential component of any ETL data pipeline. As we all know, most data engineers and scientists spend most of their time cleaning and preparing [...]
Improving Elastic Search Query Result with Query Expansion using Topic Modeling
Query expansion is the process of reformulating a query to improve its results and, more specifically, to improve the recall for a [...]
Cassandra Range Query Made Simple
In Cassandra, rows are hash partitioned by default. If you want data sorted by some attribute, the column name sorting feature of Cassandra is usually exploited. If you look at the Cassandra slice range [...]
Hive Plays Well with JSON
Hive is an abstraction over Hadoop MapReduce. It provides a SQL-like interface for querying HDFS data, which accounts for most of its popularity. In Hive, table structured data [...]
Removing Duplicates from Order Data Using Spark
If you work with data, there is a high probability that you have run into duplicate data in your data set. Removing duplicates in Big Data is [...]