Data Science Academy Newsletter #12
Hi! Welcome to the new edition of our Data Science Academy Newsletter :)
On a Topic from The Data World – Using AI for Data Management in Science
Data is constantly generated everywhere, with a substantial portion originating from scientific research. To progress in science, it is important to reproduce existing data or combine data from multiple sources to learn something new. Effective research data management focuses on data discovery and reuse, helping scientists save time and resources. Moreover, there is a growing demand for high standards of accountability and transparency in scientific data, particularly when seeking funding.
To enhance data accessibility and reusability, integrating AI is essential. What must data look like so that AI can take over data management? It must be machine-actionable, meaning usable by machines without human intervention. The key to achieving this is metadata: structured information that describes data. Complete and standardized metadata benefits both humans and machines. The FAIR data principles, which call for data that is findable, accessible, interoperable, and reusable, provide guidance for enabling data reuse by both machines and humans, primarily by ensuring robust and complete metadata.
https://theconversation.com/ai-and-new-standards-promise-to-make-scientific-data-more-useful-by-making-it-reusable-and-accessible-211080
Tip of the week - style your raw data while exploring them!
When analysing datasets in Jupyter Notebook or JupyterLab, we can style our output in different ways because it is rendered using HTML and CSS.
We can use the pandas library to present the data as a table (a DataFrame), similar to what we see in Excel, and we can customise these tables to make them prettier and easier to follow.
To do so, use the pandas Styler API, which offers a range of built-in functions (accessible through “df.style”, where df is your DataFrame). Let's see how to use some of them!
df.style.bar()
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.bar.html
Tip: You can choose the axis along which the bars are scaled. With axis=0 (the default), values are compared within each column; with axis=1, they are compared within each row. Example use: df.style.bar(axis=1, cmap="YlOrRd")
Matplotlib (a library for creating plots) provides colormaps that can also be passed to the pandas “cmap” parameter: https://matplotlib.org/stable/users/explain/colors/colormaps.html
df.style.background_gradient()
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.background_gradient.html
df.style.highlight_max()
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.highlight_max.html
df.style.highlight_min()
https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.highlight_min.html
That’s all for this week,
Team Data Science Academy