In Power BI import mode (Premium capacity), are the dataset and storage size limits based on compressed or uncompressed data?
Similarly, when the model is loaded into memory, is the memory used based on compressed or uncompressed data?
For datasets it is compressed. For example, if the source data is 10 x 100 MB CSV files (1 GB total), then loading it into a dataset (assuming the engine can compress at a 10:1 ratio) will result in roughly a 100 MB dataset in memory.
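As a rough sketch of that arithmetic (the 10:1 ratio is only an assumption for illustration; real VertiPaq compression depends heavily on column cardinality and data types):

```python
# Back-of-the-envelope estimate only; the 10:1 ratio is an assumed figure,
# not a guarantee - actual VertiPaq compression varies with the data.
SOURCE_FILES = 10
FILE_SIZE_MB = 100
ASSUMED_COMPRESSION_RATIO = 10  # hypothetical VertiPaq ratio

source_total_mb = SOURCE_FILES * FILE_SIZE_MB                     # 1000 MB of CSV
estimated_model_mb = source_total_mb / ASSUMED_COMPRESSION_RATIO  # ~100 MB dataset

print(f"Source CSV total:       {source_total_mb} MB")
print(f"Estimated dataset size: {estimated_model_mb:.0f} MB")
```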
The Power BI / SQL Server Analysis Services (Tabular) engine is called VertiPaq. The best post about how it compresses data is here.
Data in dataflows will also be compressed, but it is more of a basic ZIP-style compression and not as efficient, so the 10 example files could take up around 300 MB in that format.
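If you want a feel for what basic ZIP/DEFLATE-style compression achieves on CSV text, a quick sketch like the one below works; the ratio depends entirely on how repetitive the data is, so the 300 MB figure above is only an illustrative guess, not a measured dataflow size.

```python
import csv
import gzip
import io
import random

# Build a CSV in memory with somewhat repetitive data, then compare
# its raw size against a DEFLATE (ZIP-style) compressed copy.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "region", "amount"])
for i in range(100_000):
    writer.writerow([i,
                     random.choice(["North", "South", "East", "West"]),
                     round(random.uniform(0, 1000), 2)])

raw = buf.getvalue().encode("utf-8")
compressed = gzip.compress(raw)

print(f"Raw CSV:        {len(raw) / 1e6:.2f} MB")
print(f"Gzip (DEFLATE): {len(compressed) / 1e6:.2f} MB")
print(f"Ratio:          {len(raw) / len(compressed):.1f}:1")
```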
Indeed, 1 GB of data can be compressed to about 100 MB, and hence the size of the .pbix file is reduced to 100 MB. However, that is the storage size (after compression), which is not the same as the size of the model when it is loaded into memory. When the model is loaded into memory, does the entire 1 GB get loaded, or only 100 MB? I am confused on this point. Any reference for this concept would be helpful.
Yes, it will be further compressed by the VertiPaq engine, so it will be smaller than the file storage. I recommend using DAX Studio and its metrics to analyse the in-memory dataset sizes.
Can you please give me a reference that says the dataflow CSV data is zipped/compressed?
If you change the .pbix extension to .zip you can see it; the best reference is gqbi.wordpress.com/2017/05/02/…
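To illustrate that point: since a .pbix file is a ZIP archive, you can list its contents and their stored vs. compressed sizes directly (a minimal sketch; "report.pbix" is just a placeholder for your own file):

```python
import zipfile

# A .pbix file is a ZIP archive; list its entries with their
# uncompressed and compressed sizes. "report.pbix" is a placeholder path.
with zipfile.ZipFile("report.pbix") as pbix:
    for info in pbix.infolist():
        print(f"{info.filename:<40} "
              f"stored={info.file_size:>12,} B  "
              f"compressed={info.compress_size:>12,} B")
```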