It is true that if you read a csv2 (Pandas-like) file, you must either have a header in the file or use a schema. But the number of records in the header or schema is there only so that memory for each column can be allocated efficiently, and only once. In other words, the number of records doesn't need to be accurate: it can be zero, an approximation, or the exact count. Of course, if it is not accurate, you may allocate more memory than you need or allocate multiple times, but in either case it will work.
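To illustrate why an inaccurate count is harmless, here is a minimal, library-independent C++ sketch. The count plays the role of a `reserve()` hint on a `std::vector`, and `push_back()` grows the column on its own if the hint turns out to be too small. `load_with_hint` is a hypothetical name, not part of the DataFrame API, and this is not a claim about how the library allocates internally.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of the DataFrame library): converts parsed
// text fields into a column, using `count_hint` only to pre-reserve memory.
// The hint may be 0, an approximation, or exact; the resulting column is the same.
std::vector<double> load_with_hint(const std::vector<std::string>& fields,
                                   std::size_t count_hint) {
    std::vector<double> column;
    column.reserve(count_hint);  // a single up-front allocation when the hint is good
    for (const std::string& f : fields) {
        // If the hint was too small, push_back reallocates automatically.
        column.push_back(std::stod(f));
    }
    return column;
}
```

A hint of 0 costs a few extra reallocations, and an oversized hint wastes some memory; neither changes the data that ends up in the column.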
Warm hello. Based on what I see in the documentation on reading 'pandas style' CSV data, it seems that unless one specifies a header line whose entries are of the form `name:count:<type>`, the data cannot be parsed. One can use a 'schema', but in that case one must specify the length of each column, information that is otherwise not convenient to obtain (and, of course, hard-code the column names and their order).
I'm looking for a way to read in a CSV with a less rigid format requirement (but there doesn't seem to be one), then apply the get_columns_info() method to obtain the necessary information, in conjunction with other utilities on offer such as load_column(). I'll then have all I need to initialize a DataFrame object.
I'd very much appreciate a simple confirmation that my understanding is correct: unless one supplies a header line in the form csv2 expects, there's no way to (temporarily) circumvent the formatting requirement in conjunction with get_columns_info(). Thanks in advance.
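Since the maintainer's reply says the count is only a capacity hint, one possible workaround is to preprocess a plain CSV into csv2 form before loading it. The C++ sketch below is hypothetical: `to_csv2_text` and `split_commas` are illustrative names, the `name:count:<type>` header spelling is assumed from this thread rather than taken from the library's documentation, the column types still have to be supplied by hand, and quoted fields are not supported. Check the csv2 documentation for the exact header syntax (for example, whether an index column is required) before relying on it.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits one CSV line on commas (this sketch does not handle quoting).
std::vector<std::string> split_commas(const std::string& line) {
    std::vector<std::string> out;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) out.push_back(field);
    return out;
}

// Hypothetical converter: rewrites a plain CSV (first line = bare column
// names) so its header uses the ASSUMED csv2 form "name:count:<type>".
// The caller supplies one type per column; the count written is the number
// of non-empty data rows, although per the reply it is only a capacity hint.
std::string to_csv2_text(const std::string& plain_csv,
                         const std::vector<std::string>& types) {
    std::istringstream in(plain_csv);
    std::string header_line;
    if (!std::getline(in, header_line))
        throw std::invalid_argument("empty input");
    const std::vector<std::string> names = split_commas(header_line);
    if (names.size() != types.size())
        throw std::invalid_argument("one type per column is required");

    // Buffer the data rows so the count is known before writing the header.
    std::vector<std::string> rows;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty()) rows.push_back(line);

    std::ostringstream out;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) out << ',';
        out << names[i] << ':' << rows.size() << ":<" << types[i] << '>';
    }
    out << '\n';
    for (const std::string& r : rows) out << r << '\n';
    return out.str();
}
```

The converted text would then be written to a file and read as csv2. Given the reply, writing 0 instead of `rows.size()` should also work (avoiding the buffering), at the cost of extra reallocations during loading.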