What are important data systems problems, ignored by research? (2024)

Relevance: 6.9
Score Breakdown
technical depth: 9
novelty: 8
actionability: 3
community: 4
strategic: 7
personal: 9

Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.

Discussion of unsolved data systems problems; directly relevant to data engineering research interests, high technical depth.

2026-05-29 General databasearchitects.blogspot.com
Summary

A panel at the Dutch-Belgian DataBase Day, featuring Allison Lee (Snowflake), Andy Pavlo (CMU), and Hannes Mühleisen (DuckDB/CWI), flagged variable-length strings as severely understudied: they make up ~50% of columns in Redshift, yet simple benchmarks such as TPC-H treat strings as fixed-size objects. The discussion also surfaced other overlooked areas, including network connection handling, query scheduling, and the trade-off between single-node and distributed processing, and noted that few SIGMOD/VLDB papers address these practical problems.

Key Takeaways
  • Audit your data stack for string compression and variable-length string support, and treat benchmark results with skepticism, since real-world workloads differ significantly from TPC-H.
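
As a starting point for such an audit, the sketch below profiles a sample of rows to estimate what share of columns hold strings and how widely their lengths vary. It is a minimal illustration, not a method from the panel or the original post: the function names, the `sample_rows` data, and the choice of metrics (byte lengths and distinct ratio) are all assumptions made for this example.

```python
from statistics import mean


def string_column_profile(rows):
    """Summarise string columns in a list of dict rows (hypothetical helper)."""
    # Pivot the row-oriented sample into per-column value lists.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    profile = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        # A column counts as a string column if every non-null value is a str.
        is_string = bool(non_null) and all(isinstance(v, str) for v in non_null)
        entry = {"is_string": is_string}
        if is_string:
            # Measure encoded size, since storage cost depends on bytes, not characters.
            lengths = [len(v.encode("utf-8")) for v in non_null]
            entry.update(
                min_bytes=min(lengths),
                max_bytes=max(lengths),
                mean_bytes=mean(lengths),
                # A low distinct ratio hints that dictionary encoding may pay off.
                distinct_ratio=len(set(non_null)) / len(non_null),
            )
        profile[name] = entry
    return profile


def string_column_share(profile):
    """Fraction of profiled columns that are string columns."""
    if not profile:
        return 0.0
    return sum(1 for e in profile.values() if e["is_string"]) / len(profile)


# Made-up sample data, for illustration only.
sample_rows = [
    {"id": 1, "country": "NL", "url": "https://example.com/a", "amount": 9.5},
    {"id": 2, "country": "BE", "url": "https://example.com/products/42", "amount": 12.0},
    {"id": 3, "country": "NL", "url": None, "amount": 3.25},
]

profile = string_column_profile(sample_rows)
print(f"string column share: {string_column_share(profile):.0%}")
for name, entry in profile.items():
    print(name, entry)
```

In practice the sample would come from the warehouse itself rather than an in-memory list. A large gap between `min_bytes` and `max_bytes` is the kind of variable-length behaviour that fixed-size benchmark strings do not exercise.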
Why it matters

For a solutions architect focused on data infrastructure and platform engineering, these gaps directly affect real-world query performance and storage costs, especially given the dominance of string data in analytical workloads.

Author

Viktor Leis
