How to speed up extracting data from a large binary file in Python

0 votes
asked Nov 30, 2016 by r-davies

I'm pretty new to Python and have been tasked with extracting observation data from large binary files (this one is 470 MB, but they can be bigger).

My problem is that the data structure in the file means I need to read an element before I can establish how many times to read the next set of elements. I am currently using loops and calling numpy.fromfile() many times; this is the slowest part of my code and accounts for most of the ~5 s runtime. I've been asked to get it below 1 s!
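As an aside, where the time goes can be confirmed with the standard-library profiler; a minimal sketch, where read_file and 'obs_file.bin' are hypothetical names for the entry point and input file:

import cProfile
import pstats

# Profile one full extraction and print the ten most expensive calls
# by cumulative time; if fromfile dominates, it will appear near the top.
cProfile.run('read_file("obs_file.bin")', 'extract.prof')
pstats.Stats('extract.prof').sort_stats('cumulative').print_stats(10)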

The file structure is as follows, with varying data types throughout (a rough sketch of how this might map to numpy dtypes follows the list):

  1. Count of snapshots (N)
     • Snapshot information, repeated N times.
  2. Number of gridpoints (G)
     • Gridpoint information and a data counter (C), repeated G times.
       • Data records, repeated C times.
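For reference, here is one way the three record types could be described as numpy structured dtypes. The field names, types, and byte order below are invented stand-ins; the real definitions would come from the format specification:

import numpy

# Invented stand-ins for the real record layouts (names, types, and
# byte order are assumptions, not the actual format specification).
dtype2 = numpy.dtype('<i4')                        # grid-point counter (G)
dtype3 = numpy.dtype([('lat', '<f4'),
                      ('lon', '<f4'),
                      ('bt_data_count', '<i4')])   # per-gridpoint header, ends with C
dtype4 = numpy.dtype([('bt_value', '<f4')])        # one data record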

I am currently doing this for part 2 above, where dtype2, dtype3, and dtype4 are stand-ins for the specific dtypes used:

import numpy

data = {}

# `file` is an already-open file object positioned at the start of part 2.
data['Grid_Point_counter'] = numpy.fromfile(file, dtype2, 1)
number_gridpoints = data['Grid_Point_counter'][0]

gridpoint_data = []

if number_gridpoints > 0:
    for gridpoint in range(number_gridpoints):
        # Per-gridpoint header; the last field is the data counter (C).
        grdpt_data = numpy.fromfile(file, dtype3, 1)
        number_bt_data = grdpt_data[0][-1]

        # Default to an empty block so the append below is always defined.
        bt_data_block = numpy.empty(0, dtype=dtype4)
        if number_bt_data > 0:
            bt_data_block = numpy.fromfile(file, dtype4, number_bt_data)

        gridpoint_data.append([grdpt_data, bt_data_block])

data['Grid_Point_Data'] = gridpoint_data
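For comparison, the pattern I would expect to be faster is to read the whole file into memory once and slice records out with numpy.frombuffer, so each record costs an array view instead of a file read. A minimal sketch using the stand-in dtypes above ('obs_file.bin' and read_records are invented names, and error handling is omitted):

import numpy

def read_records(buf, offset, dtype, count):
    # Slice `count` records of `dtype` out of the in-memory bytes `buf`
    # at `offset`; returns the (read-only) records and the advanced offset.
    records = numpy.frombuffer(buf, dtype=dtype, count=count, offset=offset)
    return records, offset + count * dtype.itemsize

with open('obs_file.bin', 'rb') as f:
    buf = f.read()                     # one big read instead of many small ones

offset = 0
counter, offset = read_records(buf, offset, dtype2, 1)
number_gridpoints = int(counter[0])

gridpoint_data = []
for gridpoint in range(number_gridpoints):
    header, offset = read_records(buf, offset, dtype3, 1)
    number_bt_data = int(header['bt_data_count'][0])
    block, offset = read_records(buf, offset, dtype4, number_bt_data)
    gridpoint_data.append([header, block])

numpy.frombuffer does not copy, so the whole file sits in memory once; if that is a concern, numpy.memmap over the file supports the same slicing pattern without loading everything up front.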

If any of you very knowledgeable people can help that would be great!
