Exporting big data to CSV files

+1 vote
asked Apr 14, 2015 by kevin

My current task requires me to export an estimated 100,000 rows of data from a very large database.

I'm fairly new to handling big data, and I'd love to hear best practices and guidelines from people who have dealt with these issues before, so this post stays grounded in what has actually worked rather than opinion.

A few more details:

  • The database is not at all normalized (very ugly)

  • I have to deal with at least 100,000 rows in total

  • The task is run at midnight, when there are fewer users

  • Currently using ColdFusion 9, PostgreSQL 8.4

Thank you!

This is what my code looks like after applying Craig's solution:

<cfset base_path = GetDirectoryFromPath(ExpandPath("*.*")) & "some_parent\some_child\">
<cfif not DirectoryExists(base_path)>
    <cfdirectory directory="#base_path#" action="create" mode="777">
</cfif>

<!--- Timestamped subdirectory for this batch run --->
<cfset this_batch_path = DateFormat(Now(), 'mmddyyyy') & TimeFormat(Now(), 'hhmmss') & "\">
<cfdirectory directory="#base_path##this_batch_path#" action="create" mode="777">

<!--- Pre-create the (empty) output file --->
<cfset this_filename = "someprefix_" & DateFormat(Now(), 'yyyymmdd') & ".csv">
<cffile action="write" file="#base_path##this_batch_path##this_filename#" output="">

<!--- Double the backslashes so the Windows path survives as a SQL string literal --->
<cfset escaped_copy_path = ListChangeDelims(base_path & this_batch_path & this_filename, "\\", "\")>

<!--- Server-side COPY: the file is written by the PostgreSQL server process --->
<cfquery name="qMyQuery" datasource="some_db" username="some_uname" password="some_pword" result="something">
    COPY some_table TO '#escaped_copy_path#' WITH CSV HEADER;
</cfquery>

Now I need to get the count of copied rows. From the PostgreSQL 8.4 docs:

Outputs

On successful completion, a COPY command returns a command tag of the form

COPY count

The count is the number of rows copied.

But I can't seem to get at that count, either through the cfquery result struct or from the query itself.
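In case it helps anyone, here is a sketch of reading that count from a client driver that exposes the COPY command tag. This is psycopg2 (Python), not ColdFusion, and the DSN, table, and file names are hypothetical stand-ins; it also uses COPY ... TO STDOUT, so the CSV is written on the client side and no server path or backslash escaping is involved. As I understand it, cursor.rowcount reflects the "COPY n" tag after copy_expert completes.

```python
def copy_sql(table):
    """Build a COPY statement that streams the table to the client as CSV."""
    return "COPY {} TO STDOUT WITH CSV HEADER".format(table)

def export_csv(dsn, table, path):
    """Stream `table` into a local CSV file and return the copied-row count.

    Assumes psycopg2 is installed and `dsn` points at a reachable database;
    all names here are hypothetical examples.
    """
    import psycopg2  # imported inside so copy_sql() is usable without the driver

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur, open(path, "w") as f:
            # copy_expert streams the CSV to the local file; afterwards
            # cur.rowcount holds the count from the COPY command tag.
            cur.copy_expert(copy_sql(table), f)
            return cur.rowcount
    finally:
        conn.close()
```

A database-only fallback that stays inside ColdFusion would be a second cfquery running SELECT count(*) against the same table (or the same WHERE clause) right after the COPY, at the cost of one extra scan.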

1 Answer

+5 votes
answered Nov 29, 2018 by craig-ringer

100,000 rows isn't big, unless the rows are very wide, with lots of large values.

Just use psql and \copy (SELECT ...) TO '/some/local/file' WITH (FORMAT CSV, HEADER)

If you want, you can estimate the data size first:

select pg_size_pretty(sum( octet_length(t::text) )) FROM mytable t WHERE ...;

For genuinely big data extract runs, you may sometimes want to use an ETL tool like Talend Studio, Pentaho Kettle, or CloverETL.

BTW, it's time to start thinking about upgrading from 8.4 as it's now end-of-life.


...