Blogs or Expert Columns

Top Python Libraries in 2018 in Data Science, Deep Learning, Machine Learning

By Dan Clark, KDnuggets. Here are the top 15 Python libraries across Data Science, Data Visualization, Deep Learning, and Machine Learning. We recently published a series of articles looking at the top Python libraries across Data Science, Deep Learning, and Machine Learning. As the year draws to a close, we thought we’d give you a special Christmas gift and collate these into an official KDnuggets list of the top Python libraries in 2018. As always, we want your opinions! So, if you think we’ve unfairly left any out, or if you disagree with any of [...]

The Yield: How to feed the world without ‘wrecking the planet’

Every growing season, farmers stake their livelihoods around a major element they can’t control: weather. But The Yield, an agricultural technology company based in Australia, uses sensors, data and artificial intelligence (AI) to help farmers make informed decisions related to weather, soil and plant conditions. “How do we feed the world without wrecking the planet?” says Ros Harvey, who founded The Yield in 2014. “We do that by taking the guesswork out of growing. This means growers can make better, faster decisions about how they produce the food we [...]

Six Reasons to Choose Cosmos DB

A fintech startup pivots to Azure Cosmos DB. By Kate Baroni, Software Architect, Microsoft Azure. The right technology choices can accelerate success for a cloud-born business. This is true for the fintech startup clearTREND Research. Their solution architecture team knew one of the most important decisions would be the database decision between SQL and NoSQL. After research, experimentation, and many design iterations, the team was thrilled with their decision to deploy on Microsoft Azure Cosmos DB. This blog is about how their decision was made. Data and AI are driving a surge of cloud business opportunities, and one technology decision that deserves [...]

Top 10 AI Solution Provider of 2018 – Silicon India

ThirdEye Rated as Top 10 Artificial Intelligence Solution Provider 2018. ThirdEye Data: The Answer to All Data Challenges. Big data is no fad. The world is witnessing a meteoric rise of data today, which is doubling in volume every year. As data evolves, every business organization seeks to explore the deluge of information and glean meaningful insights to drive better decision-making and enhance productivity. Despite such potential, enterprises, especially SMBs, fall behind in implementing data-driven processes. The culprit, however, is not their lack of innovation, but the complexity of unraveling the intricate correlations between seemingly unrelated [...]

Facebook open-sources PyText NLP framework

Facebook AI Research is open-sourcing some of the conversational AI tech it is using to power its Portal video chat display and M suggestions on Facebook Messenger. The company announced today that its PyTorch-based PyText NLP framework is now available to developers. Natural language processing deals with how systems parse human language and are able to make decisions and derive insights. The PyText framework, which the company sees as a conduit for AI researchers to move more quickly between experimentation and deployment, will be particularly useful for tasks like document classification, sequence tagging, semantic parsing [...]

Deep Learning Just Dipped Into Exascale Territory

We all expected that the Summit supercomputer at Oak Ridge National Lab would be a major part of pushing deep learning forward in HPC given its balanced GPU and IBM Power9 profile (not to mention the on-site expertise to get those graphics engines doing cutting-edge work outside of traditional simulations). Today, researchers from Berkeley Lab and Oak Ridge, along with development partners at Nvidia, demonstrated some rather remarkable results using deep learning to extract weather patterns based on existing high-res climate simulation data. This places the collaboration in the running for this year’s Gordon Bell Prize, an annual award based on high performance, efficient use of [...]

A Small Team Of Student AI Coders Beats Google’s Machine-Learning Code

The success shows that advances in artificial intelligence aren’t the sole domain of elite programmers. by Will Knight August 10, 2018 Students from Fast.ai, a small organization that runs free machine-learning courses online, just created an AI algorithm that outperforms code from Google’s researchers, according to an important benchmark. Fast.ai’s success is important because it sometimes seems as if only those with huge resources can do advanced AI research. Fast.ai consists of part-time students keen to try their hand at machine learning—and perhaps transition into a career in data science. It rents access to computers in Amazon’s cloud. But Fast.ai’s team [...]

ThirdEye rated as Top Big Data Developers 2018

TopDevelopers announces the Top 15 Big Data Analytics Companies of 2018. TopDevelopers.co provides a researched list of 15 companies that can offer industry-specific Big Data Analytics solutions efficiently for smart business planning. SAN FRANCISCO, CALIFORNIA, USA, December 12, 2018 /EINPresswire.com/ -- TopDevelopers constantly analyses businesses and firms to expose the actuality of current business needs and the trends that are prolific in offering the best business prospects. We have found that Big Data Analytics is now revolutionizing and reforming organizations. Every business is in need of data organizing, segmentation and segregation [...]

How to deliver on Machine Learning projects

A guide to the ML Engineering Loop. Follow the loop all the way up! This post was co-authored by Emmanuel Ameisen, Head of AI at Insight Data Science, and Adam Coates, Operating Partner at Khosla Ventures. Want to learn applied Artificial Intelligence from top professionals in Silicon Valley or New York? Learn more about the Artificial Intelligence program. Are you a company working in AI and would like to get involved in the Insight AI Fellows Program? Feel free to get in touch. As Machine Learning (ML) is becoming an important part of every industry, the demand for Machine Learning Engineers (MLE) has [...]

How Machine Learning Is Used to Manage Data Center Power Today

Here’s how solutions already on the market today are using ML to improve data center uptime and efficiency. It’s no secret that data centers are getting increasingly complicated. There are more types of hardware and management software, more frequently changing workloads, and public cloud. And with edge computing just around the corner, things are about to get even more complicated. Many in the industry expect machine learning to make data center managers’ lives easier in the face of all this complexity. Several companies already sell data center management [...]

Data Warehousing with a Modern Twist

By Alex Woodie. Bill Inmon is generally credited with inventing the phrase “data warehouse” in the early 1990s to describe the stockpiling of data using relational databases. It may be an older term, but the activity itself remains quite relevant today, especially considering the huge amounts of data we generate every day. However, some of the elements of data warehousing implementations have changed considerably. For starters, the advent of the cloud-based data warehouse is upending the traditional market for analytical databases, just as a new generation of front-end BI tools streamlines the delivery of [...]

Data Warehouse Modernization and the Journey to the Cloud

To say that organizations today are facing a complex data landscape is really an understatement. Data exists in on-premises systems and in the cloud; data is used across applications and accessed across departments. Information is being exchanged in ever-growing volumes with customers and business partners. Websites and social media platforms are constantly adding data to the mix. And now there’s even more data coming from new sources such as the Internet of Things (IoT) via sensors and smart, connected devices. This proliferation of data sources is leading to a [...]

Large Collection of Neural Nets, Numpy, Pandas, Matplotlib, Scikit and ML Cheat Sheets

This collection covers much more than the topics listed in the title. It also features Azure, Python, Tensorflow, data visualization, and many other cheat sheets. Additional cheat sheets can be found here and here. Below is a screenshot (an extract from the data visualization cheat sheet). The one below is rather interesting too, but the source is unknown, and anywhere it was posted, it is unreadable. This is the best rendering after 30 minutes of work. The full list can be found here or here. It covers the following topics: Big-O Algorithm Cheat [...]

Microsoft Develops Flexible AI System That Can Summarize The News

Condensing paragraphs into sentences isn’t easy for artificial intelligence (AI). That’s because it requires a semantic understanding of the text that’s beyond the capabilities of most off-the-shelf natural language processing models. But it’s not impossible, as researchers at Microsoft recently demonstrated. In a paper published on the preprint server Arxiv.org (“Structured Neural Summarization”), scientists at Microsoft Research in Cambridge, England, describe an AI framework that can reason about relationships in “weakly structured” text, enabling it to outperform conventional NLP models on a range of text summarization tasks. [...]

ThirdEye Data launches 3 new Open Source solutions for Anomaly Detection and Predictive Analytics

Over the past 20 years, the Open Source Software (OSS) movement has given developers and programmers the freedom to experiment, innovate, and become more efficient. Part of the digital transformation that’s been facilitated by OSS has also allowed programmers to leverage Machine Learning to develop vital solutions for Anomaly Detection and Predictive Analytics. And so, ThirdEye Data has decided it’s time to make its own contribution to the OSS community by giving away 3 Artificial Intelligence-powered Open Source Software solutions that help businesses gain Anomaly [...]

What’s Driving the Cloud Data Warehouse Explosion?

The advent of powerful data warehouses in the cloud is changing the face of big data analytics, as companies move their workloads into the cloud. According to analysts and cloud executives, the phenomenon is accelerating, thanks largely to the potential to save large sums of money, analyze even bigger data sets, and eliminate the hassle of managing on-premises clusters. Amazon Web Services is largely credited with kicking off the cloud data warehousing (CDW) wave with Redshift. Since launching it in 2012, AWS has attracted 6,500 customers to Redshift and remains the company [...]

Embrace The Noise: A Case Study Of Text Annotation For Medical Imaging

In this post we'll discuss the recent paper "TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-rays", focusing on the best practices the paper exemplifies with regard to labeling text data for NLP. What Is TextRay? TextRay was written by a team from Zebra Medical Vision, a for-profit company that applies deep learning to medical imaging. One of the core challenges of the medical imaging space is acquiring the labeled images (such as X-rays) to train models with. The TextRay paper expands a core insight from a [...]

Deep Learning: Which Loss and Activation Functions should I use?

The purpose of this post is to provide guidance on which combination of final-layer activation function and loss function should be used in a neural network depending on the business goal. This post assumes that the reader has knowledge of activation functions. An overview of these can be seen in the prior post, "Deep Learning: Overview of Neurons and Activation Functions". What are you trying to solve? As with all machine learning problems, the business goal determines how you should evaluate its success. Are you trying to predict a numerical value? Examples: [...]

Bulk Mutation in an Integration Data Lake with Spark

Data lakes act as a repository of data from various sources, possibly in different formats. A data lake can be used to build a data warehouse or to perform other data analysis activities. Data lakes are generally built on top of the Hadoop Distributed File System (HDFS), which is append-only. HDFS is essentially a WORM file system, i.e., Write Once, Read Many times. In an integration scenario, however, your source data streams may have updates and deletes. This post is about performing updates and deletes in an HDFS-backed data lake. The Spark-based solution is available in my open source project chombo. Virtual [...]
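
To make the update-and-delete problem concrete, here is a minimal Python sketch. It is not the chombo implementation, which the excerpt does not describe. It assumes each record carries a key, a sequence number, and a delete flag; the `Record` fields and the `apply_mutations` helper are hypothetical names chosen for illustration. Because the underlying store is write-once, the merge produces a new snapshot instead of editing the old one in place.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    key: str               # hypothetical unique record identifier
    seq: int               # hypothetical sequence number; higher means newer
    value: str
    deleted: bool = False  # hypothetical tombstone marker for deletes


def apply_mutations(base: Iterable[Record], delta: Iterable[Record]) -> List[Record]:
    """Merge a base snapshot with a batch of inserts, updates, and deletes.

    The newest record per key wins. Keys whose newest record is a
    tombstone are dropped. The result is meant to be written out as a
    new snapshot, leaving the old files untouched.
    """
    latest: Dict[str, Record] = {}
    for rec in list(base) + list(delta):
        current = latest.get(rec.key)
        if current is None or rec.seq >= current.seq:
            latest[rec.key] = rec
    live = (rec for rec in latest.values() if not rec.deleted)
    return sorted(live, key=lambda rec: rec.key)


base = [Record("a", 1, "alpha"), Record("b", 1, "beta"), Record("c", 1, "gamma")]
delta = [
    Record("b", 2, "beta-v2"),          # update
    Record("c", 2, "", deleted=True),   # delete
    Record("d", 2, "delta"),            # insert
]
snapshot = apply_mutations(base, delta)
```

In a distributed engine the same idea would typically be expressed as grouping records by key and keeping the newest record in each group.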

Learning Alarm Threshold from User Feedback using Decision Tree on Spark

Alarm fatigue is a phenomenon where someone is exposed to a large number of alarms, becomes desensitized to them, and starts ignoring them. It’s been reported that security professionals ignore 32% of alarms because they are thought to be false. This kind of sensory overload can happen with monitoring systems in various domains, e.g., computer systems and networks, industrial monitoring systems, and medical patient monitoring systems. Typically, alarm flooding happens when alarm threshold levels are not set properly. How do we know what the proper alarm threshold level should be? That is the problem we will [...]
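
As a rough illustration of learning a threshold from feedback, the sketch below fits a depth-one decision tree (a single split) in plain Python. It is not the Spark solution from the post, whose details are not shown in the excerpt. The feedback data, the CPU-usage metric, and the `learn_threshold` helper are all hypothetical.

```python
from typing import List, Tuple


def gini(labels: List[bool]) -> float:
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def learn_threshold(feedback: List[Tuple[float, bool]]) -> float:
    """Return the split value that best separates real alarms from false ones.

    feedback holds (metric value, user confirmed the alarm) pairs.
    Candidate splits are midpoints between consecutive distinct values,
    scored by weighted Gini impurity as in a depth-one decision tree.
    """
    values = sorted({value for value, _ in feedback})
    if len(values) < 2:
        raise ValueError("need at least two distinct metric values")
    best_split, best_score = values[0], float("inf")
    for low, high in zip(values, values[1:]):
        split = (low + high) / 2.0
        left = [real for value, real in feedback if value <= split]
        right = [real for value, real in feedback if value > split]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(feedback)
        if score < best_score:
            best_split, best_score = split, score
    return best_split


# Hypothetical feedback: CPU usage (%) when the alarm fired, and whether
# the user marked the alarm as real.
feedback = [
    (72.0, False), (75.0, False), (81.0, False),
    (88.0, True), (93.0, True), (97.0, True),
]
threshold = learn_threshold(feedback)
```

With this toy data the classes separate cleanly, so the learned split falls midway between the highest false alarm and the lowest real one. Real feedback is noisier, and a deeper tree over several metrics may be needed.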

Pluggable Rule Driven Data Validation with Spark

Data validation is an essential component in any ETL data pipeline. As we all know, most Data Engineers and Scientists spend most of their time cleaning and preparing their data before they can even get to the core processing of the data. In this post we will go over a pluggable rule-driven data validation solution implemented on Spark. Earlier I had posted about the same solution implemented on Hadoop. This post can be considered a sequel to the earlier post. The solution is available in my open source project chombo on GitHub. Anatomy of a Validator. A validator [...]

Improving Elastic Search Query Result with Query Expansion using Topic Modeling

Query expansion is the process of reformulating a query to improve its results, and more specifically to improve its recall. Topic modeling is a Natural Language Processing (NLP) technique to discover hidden topics or concepts in documents. We will be going through a Query Expansion technique based on Topic Modeling. The solution is based on the Latent Dirichlet Allocation (LDA) algorithm as implemented in the Python gensim library. LDA is a popular Topic Modeling algorithm. The implementation is available from my open source project avenir on GitHub. It provides a user-friendly wrapper class around the gensim LDA implementation. Query Expansion. Query expansion is the technique of expanding the [...]

Cassandra Range Query Made Simple

In Cassandra, rows are hash partitioned by default. If you want data sorted by some attribute, the column name sorting feature of Cassandra is usually exploited. If you look at the Cassandra slice range API, you will find that you can specify only the range start, range end and an upper limit on the number of columns fetched. However, in many applications the need is to paginate through the data, i.e., each call should fetch a predetermined number of items. There is no easy way to map the desired number of items to be returned to the column name [...]
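
One common workaround, which may or may not be the approach the post goes on to describe, is to ask for one column more than the page size and use that extra column name as the start of the next call. The sketch below simulates this in plain Python. `get_slice` is a hypothetical in-memory stand-in for a slice-range call, not a real Cassandra client function, and the sketch assumes one item per column.

```python
from bisect import bisect_left
from typing import Iterator, List, Optional


def get_slice(columns: List[str], start: str, end: Optional[str], limit: int) -> List[str]:
    """Hypothetical stand-in for a slice-range call on one row.

    columns must be sorted. Returns up to `limit` column names in the
    range [start, end], both ends inclusive; end=None means unbounded.
    """
    result: List[str] = []
    for name in columns[bisect_left(columns, start):]:
        if len(result) == limit or (end is not None and name > end):
            break
        result.append(name)
    return result


def paginate(columns: List[str], page_size: int,
             start: str = "", end: Optional[str] = None) -> Iterator[List[str]]:
    """Yield pages of at most page_size column names."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    while True:
        # Fetch one extra column; it marks where the next page begins.
        batch = get_slice(columns, start, end, page_size + 1)
        if not batch:
            return
        yield batch[:page_size]
        if len(batch) <= page_size:
            return
        start = batch[page_size]


# Hypothetical sorted column names, e.g. zero-padded timestamps.
columns = ["t01", "t02", "t03", "t04", "t05", "t06", "t07"]
pages = list(paginate(columns, page_size=3))
```

Each call costs one extra column read, but the caller never has to guess which column name range corresponds to a given number of items.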