摘要： While Pandas is the library for data processing in Python, it isn't really built for speed. Learn more about the new library, Modin, developed to distribute Pandas' computation to speedup your data prep.
摘要： Design science research (hereafter DSR) is a relatively new approach to research (Reubens, 2016) with a goal to construct a new reality (i.e. solve problems) instead of explaining an existing reality, or helping to make sense of it (Iivari and Venable, 2009).
摘要： Data integration uses both technical and business processes to merge data from different sources, with the goal of accessing useful and valuable information, efficiently. A well-thought-out data integration solution can deliver trusted data from a variety sources.
摘要： In addition to the myriad cybersecurity risks organizations already face, another vulnerability has emerged that affects most enterprises: the presence of sensitive data stored in unsecured files, i.e. unstructured data.
摘要： “The Snowflake Private Data Exchange represents the future of managing and sharing data broadly and securely inside enterprise and institutional boundaries,” Snowflake CEO, Frank Slootman said. “The data exchange model will become the deployment standard for exploring, discovering and sharing data enterprise-wide.”
摘要： Veeva OpenData Explorer is a new web-based portal to access approximately 16 million healthcare professionals (HCPs), healthcare organizations (HCOs), and their affiliations spanning 34 countries. The open API simplifies integration of Veeva OpenData with third-party applications and services so companies can leverage their customer data where they need it. With these latest innovations, Veeva is giving customers greater choice in how they use Veeva OpenData and making it even easier to access accurate customer data.”
摘要： Netflix has open sourced Metaflow, an internally developed tool for building and managing Python-based data science projects. Metaflow addresses the entire data science workflow, from prototype to model deployment, and provides built-in integrations to AWS cloud services.
摘要： In the banking or pharmacy industry where regulations compel companies to have good governance in place, in industries such as publishing and telecom Data Governance often seems complicated and theoretical. That’s according to Sara Willovit, Product Data Governance at Becton Dickenson.
摘要： Generative AI language models like OpenAI’s GPT-2 produce impressively coherent and grammatical text, but controlling the attributes of this text — such as the topic or sentiment — requires architecture modification or tailoring to specific data. That’s why a team of scientists at Uber, Caltech, and the Hong Kong University of Science and Technology devised what they call the Plug and Play Language Model (PPLM), which combines a pretrained language model with one or more attribute classifiers that guide novel text generation.