How to control the data funnel: Follow these 3 best practices


Data scientists have to make decisions about what data to include in the data warehouse. To make this decision-making process easier, learn tips to stay in control of your data funnel.

As of 2022, roughly 2.5 quintillion new bytes of data are being created worldwide every day. While some of this data will be useful for analysis, it can be time-consuming and difficult to organize. By creating an efficient data funnel, you’ll be able to more easily filter for the data you need.

What is a data funnel?

Data funneling refers to narrowing down the amount of data you allow into your primary data warehouse.

A good way to think about the data funnel is to compare it to the recruiting funnel a human resources department builds when it uses software to screen candidates’ resumes. HR enters the requirements for an open position into analytics software, which screens incoming resumes to produce a smaller pool of candidates for that position. This allows HR and interviewing managers to focus on more important tasks instead of manually reviewing resumes.

Funneling also works on data. In one case, a life sciences company working on a specific disease-fighting molecule removed all research data sources that didn’t mention the molecule’s name. The goal was to save storage and processing and get to insights sooner. While filtering out all irrelevant data worked for this company, controlling the data funnel is a balancing act between the amount of data you need and the amount of data you can afford to store and process.
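
The keyword filter in the life sciences example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the company’s actual pipeline: the molecule name, the document structure and the function name funnel_by_keyword are all hypothetical.

```python
def funnel_by_keyword(documents, keyword):
    """Keep only documents whose text mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [doc for doc in documents if needle in doc["text"].lower()]


# Hypothetical research documents; "molecule-x" is a made-up name.
documents = [
    {"source": "journal_a", "text": "Trial results for Molecule-X in mice."},
    {"source": "journal_b", "text": "A survey of unrelated compounds."},
    {"source": "lab_notes", "text": "Binding affinity of molecule-x improved."},
]

kept = funnel_by_keyword(documents, "molecule-x")
print(f"kept {len(kept)} of {len(documents)} documents")
```

A strict filter like this saves storage and processing, but it also drops any source that refers to the molecule by another name, which is why the width of the funnel has to be chosen deliberately.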

How do you decide which data is important?

The sheer cost of storage and processing, whether it’s in-house or in the cloud, is forcing companies to evaluate how much data they need for business analytics.

In some cases, deciding which data to discard is easy. You probably don’t want noise from network and machine handshakes in your data. It is more difficult to decide which subject-related data to exclude, because there is a risk that the analytics team will miss important insights in the data that was left out.

For example, using only the data it typically collects, a UK retailer might never have discovered that stay-at-home moms made the majority of their online purchases while their husbands were out at football games.

This kind of unexpected but impactful insight is why end business users and IT teams must be careful when deciding how far to narrow the funnel for incoming data.

3 best practices for data funnel control

Map out the use cases your analytics are supporting and the data you think they need

This should be a collaborative exercise between IT/data science and end users. Do you want to include product complaints on social media when you are analyzing your sales and revenue data? And if you were studying disease rates in your health service area in New York, would you be interested in what’s happening in California?
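
One way to record the outcome of this exercise is a simple mapping from use cases to the data sources each one needs, with the funnel admitting only sources that appear in the mapping. The sketch below assumes that approach; the use case names, source names and helper functions are hypothetical.

```python
# Hypothetical use cases and the data sources each one is assumed to need.
USE_CASE_SOURCES = {
    "sales_revenue_analysis": {"pos_transactions", "online_orders"},
    "regional_disease_rates": {"ny_health_records"},
}


def admitted_sources(use_case_sources):
    """Return the union of sources that at least one use case needs."""
    admitted = set()
    for sources in use_case_sources.values():
        admitted |= sources
    return admitted


def should_admit(source, use_case_sources):
    """A source enters the warehouse only if some use case lists it."""
    return source in admitted_sources(use_case_sources)


print(should_admit("online_orders", USE_CASE_SOURCES))            # True
print(should_admit("social_media_complaints", USE_CASE_SOURCES))  # False
```

If end users decide that social media complaints do belong in the sales analysis, adding that source to the mapping widens the funnel in one visible place.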

Determine how accurate your analytics need to be

The gold standard for analytical accuracy is that analytics should be at least 95% accurate when compared to what human subject matter experts would conclude. But do you always need 95%?

You may need 95% accuracy if you are assessing a medical diagnosis based on the health status of certain patients, but you may only need 70% accuracy if you are predicting what climate conditions might look like over the next 20 years.

Accuracy requirements have an impact on the data funnel: you can exclude more data and narrow your funnel if you’re just looking for general, long-term trends.

Check the accuracy of your analytics on a regular basis

If your analytics show 95% accuracy when first deployed but drop to 80% over time, it makes sense to double-check the data you’re using and recalibrate the data funnel.

Perhaps new data sources that were initially unavailable are now available and should be used. Adding these data sources widens the data funnel, but if it increases the level of accuracy, then widening the funnel will be worth it.
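
A periodic check of this kind can be sketched as follows, assuming accuracy is measured as the share of analytics outputs that agree with what subject matter experts concluded on the same cases. The function names and the sample data are hypothetical.

```python
def accuracy(predictions, expert_labels):
    """Share of analytics outputs that match the experts' conclusions."""
    if len(predictions) != len(expert_labels) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == e for p, e in zip(predictions, expert_labels))
    return matches / len(predictions)


def needs_recalibration(predictions, expert_labels, required_accuracy):
    """True when measured accuracy has fallen below the required level."""
    return accuracy(predictions, expert_labels) < required_accuracy


# Hypothetical review sample: 8 of 10 outputs agree with the experts.
predictions = ["a", "b", "a", "a", "b", "b", "a", "b", "a", "a"]
expert_labels = ["a", "b", "a", "b", "b", "b", "a", "a", "a", "a"]

print(accuracy(predictions, expert_labels))                   # 0.8
print(needs_recalibration(predictions, expert_labels, 0.95))  # True
print(needs_recalibration(predictions, expert_labels, 0.70))  # False
```

The same 80% result triggers a review for a use case that requires 95% accuracy and passes for one that only requires 70%, so the required level belongs to the use case rather than to the check itself.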


