O'Reilly Data Show Podcast


The O'Reilly Data Show Podcast explores the opportunities and techniques driving big data, data science, and AI.
15 episodes

In this episode of the Data Show, I speak with Peter Bailis, founder and CEO of Sisu, a startup that is using machine learning to improve operational analytics. Bailis is also an assistant professor of computer science at Stanford University, where he conducts research into data-intensive systems and where he is co-founder of the DAWN Lab. We had a great conversation spanning many topics, including:
  • His personal blog, which contains some of the best explainers on emerging topics in data management and distributed systems.
  • The role of machine learning in operational analytics and business intelligence.
  • Machine learning benchmarks, specifically two recent ML initiatives that he’s been involved with: DAWNBench and MLPerf.
  • Trends in data management and in tools for machine learning development, governance, and operations.
Related resources:
  • “Setting benchmarks in machine learning”: Dave Patterson, Peter Bailis, and other industry leaders discuss how MLPerf will define an entire suite...

In this episode of the Data Show, I speak with Arun Kejariwal of Facebook and Ira Cohen of Anodot (full disclosure: I’m an advisor to Anodot). This conversation stemmed from a recent online panel discussion on time-series data and, specifically, anomaly detection and forecasting. Both Kejariwal (at Machine Zone, Twitter, and Facebook) and Cohen (at HP and Anodot) have extensive experience building analytic and machine learning solutions at large scale, and both have worked extensively with time-series data. The growing interest in AI and machine learning has not been confined to computer vision, speech technologies, or text. In the enterprise, there is strong interest in using similar automation tools for temporal data and time series. We had a great conversation spanning many topics, including:
  • Why businesses should care about anomaly detection and forecasting; specifically, we delve into examples outside of IT Operations & Monitoring.
  • (Specialized) techni...

In this episode of the Data Show, I speak with Michael Mahoney, a member of RISELab, the International Computer Science Institute, and the Department of Statistics at UC Berkeley. A physicist by training, Mahoney has been at the forefront of many important problems in large-scale data analysis. On the theoretical side, his work spans algorithmic and statistical methods for matrices, graphs, regression, optimization, and related problems. On the applications side, he has contributed to systems used for internet and social media analysis, social network analysis, and a host of applications in the physical and life sciences. Most recently, he has been working on deep neural networks, specifically developing theoretical methods and practical diagnostic tools that should be helpful to practitioners who use deep learning.
Image caption: Analyzing deep neural networks with WeightWatcher. Image by Michael Mahoney and Charles Martin, used with permission.
We had a great conversation spanning ma...

In this episode of the Data Show, I speak with Kesha Williams, technical instructor at A Cloud Guru, a training company focused on cloud computing. As a full stack web developer, Williams became intrigued by machine learning and started teaching herself the ML tools on Amazon Web Services. Fast forward to today: Williams has built some well-regarded Alexa skills, mastered ML services on AWS, and firmly added machine learning to her developer toolkit.
Image caption: Anatomy of an Alexa skill. Image by Kesha Williams, used with permission.
We had a great conversation spanning many topics, including:
  • How she got started and made the transition into a full-fledged machine learning practitioner. We discussed the evolution of ML tools and learning resources, and how accessible they’ve become for developers.
  • How to build and monetize Alexa skills. Along the way, we took a deep dive and discussed some of the more interesting Alexa skills she has built, as well as one that she really admires.
Rela...

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently secured a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services. Ratner was a guest on the podcast a little over two years ago, when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor, Professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s ...

In this episode of the Data Show, I speak with Cassie Kozyrkov, technical director and chief decision scientist at Google Cloud. She describes “decision intelligence” as an interdisciplinary field concerned with all aspects of decision-making, and which combines data science with the behavioral sciences. Most recently she has been focused on developing best practices that can help practitioners make safe, effective use of AI and data. Kozyrkov uses her platform to help data scientists develop skills that will enable them to connect data and AI with their organizations’ core businesses. We had a great conversation spanning many topics, including:
  • How data science can be more useful
  • The importance of the human side of data
  • The leadership talent shortage in data science
  • Is data science a bubble?
Related resources:
  • “Managing machine learning in the enterprise: Lessons from banking and health care”
  • “Managing risk in machine learning”
  • “What are model governance and model operation...

In this episode of the Data Show, I spoke with Roger Chen, co-founder and CEO of Computable Labs, a startup focused on building tools for the creation of data networks and data exchanges. Chen has also served as co-chair of O’Reilly’s Artificial Intelligence Conference since its inception in 2016. This conversation took place the day after Chen and his collaborators released an interesting new white paper, Fair value and decentralized governance of data. Current-generation AI and machine learning technologies rely on large amounts of data, and to the extent they can use their large user bases to create “data silos,” large companies in large countries (like the U.S. and China) enjoy a competitive advantage. With that said, we are awash in articles about the dangers posed by these data silos. Privacy and security, disinformation, bias, and a lack of transparency and control are just some of the issues that have plagued the perceived owners of “data monopolies.” In recent years, ...

In this week’s episode of the Data Show, we’re featuring an interview Data Show host Ben Lorica participated in for the Software Engineering Daily Podcast, where he was interviewed by Jeff Meyerson. Their conversation mainly centered around data engineering, data architecture and infrastructure, and machine learning (ML). Here are a few highlights:
Tools for productive collaboration
A data catalog, at a high level, basically answers questions around the data that’s available and who is using it so an enterprise can understand access patterns. … The term “data catalog” is generally used when you’ve gotten to the point where you have a team of data scientists and you need a place where they can use libraries in a setting where they can collaborate, and where they can share not only models but maybe even data pipelines and features. The more advanced data science platforms will have automation tools built in. … The ideal scenario is the data science platform is not just for pro...

In this episode of the Data Show, I spoke with Nick Pentreath, principal engineer at IBM. Pentreath was an early and avid user of Apache Spark, and he subsequently became a Spark committer and PMC member. Most recently his focus has been on machine learning, particularly deep learning, and he is part of a group within IBM focused on building open source tools that enable end-to-end machine learning pipelines. We had a great conversation spanning many topics, including:
  • AI Fairness 360 (AIF360), a set of fairness metrics for data sets and machine learning models.
  • Adversarial Robustness Toolbox (ART), a Python library for adversarial attacks and defenses.
  • Model Asset eXchange (MAX), a curated and standardized collection of free and open source deep learning models.
  • Tools for model development, governance, and operations, including MLflow, Seldon Core, and Fabric for deep learning.
  • Reinforcement learning in the enterprise, and the emergence of relevant open source tools like Ray.
Related...

In this episode of the Data Show, I spoke with Dhruba Borthakur (co-founder and CTO) and Shruti Bhat (SVP of Product) of Rockset, a startup focused on building solutions for interactive data science and live applications. Borthakur was the founding engineer of HDFS and creator of RocksDB, while Bhat is an experienced product and marketing executive focused on enterprise software and data products. Their new startup is focused on a few trends I’ve recently been thinking about, including the re-emergence of real-time analytics, and the hunger for simpler data architectures and tools. Borthakur exemplifies the need for companies to continually evaluate new technologies: while he was the founding engineer for HDFS, these days he mostly works with object stores like S3. We had a great conversation spanning many topics, including:
  • RocksDB, an open source, embeddable key-value store that originated at Facebook and is used in several other open source projects.
  • Time-series databases.
  • The...

