The NCSC Data Dives series is a monthly forum that helps courts dive deeper into data analytics and gain insights they can use to solve their data problems. Through a mix of interactive group sessions and focused individual sessions, we discuss past projects, innovations, and topics of interest. These discussions also support the strategic, future-ready court planning outlined in NCSC’s Just Horizons initiative.
Anticipated topics include:
- Beyond ChatGPT: How can AI tools help you?
- Why Should You (Kind Of) Ditch Microsoft Excel
- Data Storytelling: Another Tool in the Research Toolbox
- Georeferencing Data
- Best Practices When Working with PII Data
- Working with Audio and Video Data
Web scraping is the automated extraction of data from websites: a tool or program accesses web pages, reads each page’s source code or other structured data on the page, and pulls out the desired information.
In this overview, you will learn:
- The difference between web scraping and web crawling
- What skills and tools are used for web scraping
- Common use cases
- Risks to organizations whose sites are scraped
- Ways to mitigate risks
- When web scraping is harmless
A decision tree infographic is also available.
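To make the access, read, and extract steps concrete, here is a minimal sketch in Python using only the standard library. It assumes the target page is plain HTML whose data sits in a `<table>`; the names `fetch_page`, `scrape_table`, and `TableCellParser` are hypothetical, and pages that build their content with JavaScript would need a different tool.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableCellParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper: assumes the data of interest lives in HTML tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def _close_cell(self):
        # Store the open cell's text; also covers HTML that omits </td>.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        # Store the open row; also covers HTML that omits </tr>.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Only text inside a table cell is kept.
        if self._cell is not None:
            self._cell.append(data)


def fetch_page(url):
    """Step 1 (access): download the page and return its HTML source as text."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def scrape_table(html):
    """Steps 2-3 (read and extract): parse the source and return table rows."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `scrape_table(fetch_page("https://example.com/caseload"))` would return the page's table as a list of rows (the URL is a placeholder). Well-behaved scrapers usually also honor a site's `robots.txt` and pause between requests; Python's `urllib.robotparser` module can handle the former.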
This live workshop at the Data Specialists and Information Technologists Summit focused on NCSC's experience using large language models, such as ChatGPT, to extract data from court documents. Our team gave an overview of how ChatGPT works, its limitations, and ways to work around them. The workshop also demonstrated a data pipeline that ingested PDF documents, extracted their text with optical character recognition (OCR), and then used ChatGPT to restructure that text into a CSV file.
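The PDF-to-OCR-to-ChatGPT-to-CSV pipeline described above can be outlined as a minimal Python sketch. The source does not say which OCR engine, prompt, or ChatGPT interface the team used, so both of those steps are passed in as functions (`ocr`, `ask_llm`); the prompt wording, the field names, and the assumption that the model replies with a JSON list are all hypothetical.

```python
import csv
import io
import json
from typing import Callable, Iterable

# Hypothetical prompt: the actual wording used in the workshop is not given.
PROMPT_TEMPLATE = (
    "Extract the following fields from this court document and answer only "
    "with a JSON list of objects using exactly these keys: {fields}.\n\n"
    "Document text:\n{text}"
)


def build_prompt(text: str, fields: list[str]) -> str:
    """Insert the requested field names and the OCR'd text into the prompt."""
    return PROMPT_TEMPLATE.format(fields=", ".join(fields), text=text)


def parse_llm_records(reply: str, fields: list[str]) -> list[dict]:
    """Parse the model's JSON reply, keeping only the expected fields.

    Assumes the reply is a JSON object or a JSON list of objects; missing
    fields become empty strings and unexpected keys are dropped.
    """
    records = json.loads(reply)
    if isinstance(records, dict):
        records = [records]
    return [{field: record.get(field, "") for field in fields} for record in records]


def run_pipeline(
    pdf_paths: Iterable[str],
    ocr: Callable[[str], str],      # PDF path -> extracted text (any OCR engine)
    ask_llm: Callable[[str], str],  # prompt -> model reply (e.g. a ChatGPT call)
    fields: list[str],
) -> str:
    """Run OCR, LLM structuring, and CSV writing; return the CSV as text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for path in pdf_paths:
        text = ocr(path)                                     # step 1: OCR
        reply = ask_llm(build_prompt(text, fields))          # step 2: LLM
        writer.writerows(parse_llm_records(reply, fields))   # step 3: CSV rows
    return out.getvalue()
```

Keeping OCR and the model call as parameters lets the CSV logic be tested with stand-in functions before wiring in real services, and checking every reply against a fixed field list is one way to contain a model's tendency to add, rename, or drop fields.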
Learn more by viewing the workshop materials:
For more information about Data Dives or NCSC’s Data Initiatives, email Data Scientist Andre Assumpcao.