Sunday, February 14, 2021

Python: improving time required to load 1.5G CSV file

In the previous post we discussed how to search for a specific item in a 1.5G CSV file with 132M records. We improved the search from 73625.309 ms (more than a minute) to just ~0.005 ms, almost 15 million times faster, which is a pretty impressive improvement.
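The exact steps are covered in that post; as a quick refresher, here is a minimal sketch of the core idea. I assume the final variant keeps the keys in an in-memory set (which is consistent with the ~0.005 ms figure above); the file name "items.csv" and the key "some-item-id" are hypothetical placeholders:

import csv
import time

def load_keys(file_name):
    # Single full pass over the CSV; this initial scan is the slow part
    # that the current article is about.
    with open(file_name, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}

keys = load_keys("items.csv")  # hypothetical file name

start = time.perf_counter()
found = "some-item-id" in keys  # O(1) hash lookup, takes microseconds
print(f"found={found}, lookup took {(time.perf_counter() - start) * 1000:.3f} ms")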

But there is still one bottleneck that can be improved: the time required for the initial full scan of the file. Let's try to improve this in this article.
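To give a taste of the direction we will explore, here is a sketch of one common technique (not necessarily the exact code from the repository): skip the csv module entirely and split large raw binary chunks ourselves, which avoids per-row parsing overhead when the format is simple enough. Note that the keys end up as bytes here:

def load_keys_fast(file_name, chunk_size=1024 * 1024):
    # Read the file in large binary chunks and split lines manually.
    keys = set()
    with open(file_name, "rb") as f:
        tail = b""
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (tail + chunk).split(b"\n")
            tail = lines.pop()  # last piece may be an incomplete line
            for line in lines:
                if line:
                    keys.add(line.split(b",", 1)[0])
        if tail:  # file may not end with a newline
            keys.add(tail.split(b",", 1)[0])
    return keys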

All source files can be found in the JFF-Bohdan/item_lookup repository.

Python: playing with big lists (132M records), checking if item in list

Let's imagine a situation where we need to check whether an item is in a list, and our list is pretty big. For example, we may have a file with hundreds of millions of records, and we need to develop a solution that can quickly tell whether a specific item is in that list or not.

Let's start with a naive implementation and then try to improve it iteratively.
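The naive version looks something like this (a sketch; the field layout of the real file may differ, and "items.csv" / "some-item-id" are hypothetical placeholders). We load everything into a plain list and rely on the in operator, which scans the list linearly:

import csv

def load_items(file_name):
    # Naive approach: keep every record in a plain Python list.
    with open(file_name, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]

items = load_items("items.csv")  # hypothetical file name
print("some-item-id" in items)   # O(n): worst case scans all 132M entries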

This post was inspired by this post: https://habr.com/ru/post/538358/