AI – Past, Present and Future
AI - Past, Present, and Future AI has gone through many cycles of ups and downs. There has been stunning progress as well as a disappointing slump. Fueled by Machine Learning and specifically, [...]
Tabular Data Column Semantic Type Identification with Contrastive Deep Learning When data is aggregated from various sources in a dynamic environment where the data format might change without any notice, identifying the semantic [...]
Semantic Search with Pre Trained Neural Transformer Model using Document, Sentence and Token Level Embedding Some time ago I worked on an enterprise search project, where we were tasked with improving the performance [...]
With Covid-19 ravaging the world, a lot of people are exploring ways AI and ML can help in combating the virus spread and infection. Viruses like Covid-19 are a complex socio-economic and public health [...]
Six Unsupervised Extractive Text Summarization Techniques Side by Side In text summarization, we create a summary of the original content that is coherent and captures the salient points in the original content. There [...]
Bulk Mutation in an Integration Data Lake with Spark Data lakes act as repositories of data from various sources, possibly in different formats. They can be used to build a data warehouse or to perform [...]
Learning Alarm Threshold from User Feedback using Decision Tree on Spark Alarm fatigue is a phenomenon where someone exposed to a large number of alarms becomes desensitized to them and starts ignoring them. It’s been [...]
Pluggable Rule Driven Data Validation with Spark Data validation is an essential component in any ETL data pipeline. As we all know, most data engineers and scientists spend most of their time cleaning and preparing [...]
Improving Elastic Search Query Result with Query Expansion using Topic Modeling Query expansion is the process of reformulating a query to improve query results and, more specifically, to improve the recall for a [...]
In Cassandra, rows are hash partitioned by default. If you want data sorted by some attribute, the column name sorting feature of Cassandra is usually exploited. If you look at the Cassandra slice range [...]