S3 Migration Guide
Upgrading to 4.0.4
Note: This change is only breaking if you created S3 sources using the API and did not provide streams.*.format.
Following the 4.0.0 config change, we are removing the streams.*.file_type field, which was redundant with streams.*.format. This is a breaking change because format is now required. Since the UI always populates format, only users who created actors using the API without providing format are affected. To fix this, set streams.*.format to {"filetype": <file_type>}.
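For configs created through the API, the fix above could be applied with a small script before the updated config is sent back. The following is a minimal sketch, assuming the config is a dict with a `streams` list whose entries may still carry the legacy `file_type` key. The helper name `ensure_stream_format` is hypothetical and not part of any Airbyte library.

```python
import copy


def ensure_stream_format(config: dict) -> dict:
    """Return a copy of config in which every stream has a format object.

    Assumption: config["streams"] is a list of dicts, as implied by the
    streams.*.format and streams.*.file_type paths in this guide.
    """
    fixed = copy.deepcopy(config)
    for stream in fixed.get("streams", []):
        # Drop the legacy field, which 4.0.4 removes as redundant.
        file_type = stream.pop("file_type", None)
        if "format" not in stream:
            if file_type is None:
                raise ValueError(
                    f"stream {stream.get('name')!r} has neither format nor file_type"
                )
            # The fix described above: format becomes {"filetype": <file_type>}.
            stream["format"] = {"filetype": file_type}
    return fixed


legacy = {"streams": [{"name": "orders", "file_type": "csv"}]}
print(ensure_stream_format(legacy)["streams"][0])
# {'name': 'orders', 'format': {'filetype': 'csv'}}
```

Streams that already define format are left unchanged, so the helper is safe to run on configs created through the UI as well.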
Upgrading to 4.0.0
We have revamped the implementation to use the File-Based CDK. The goal is to increase resiliency and reduce development time. Here are the breaking changes:
- [CSV] Mapping of type array and object: before, they were mapped as large_string and hence cast as strings. With the new changes, if array or object is specified, the value will be cast as array and object respectively.
- [CSV] decimal_point option is deprecated: It is no longer possible to use a character other than . to separate the integer part from the non-integer part. If a float is formatted with another character, it will be considered a string.
- [Parquet] columns option is deprecated: You can use Airbyte column selection to get the same behavior. We don't expect it, but this could impact performance, as the payload could be bigger.
If you are not affected by the above, your migration should proceed automatically once you run a sync with the new connector. To leverage this:
- Upgrade source-s3 to use v4.0.0
- Run at least one sync for all your source-s3 connectors
- Migration will be performed and an AirbyteControlMessage will be emitted to the platform so that the migrated config is persisted
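
For reference, the control message mentioned in the last step might look roughly like the output of the sketch below. It assumes the shape of the Airbyte protocol's CONNECTOR_CONFIG control message (`type`, `emitted_at`, `connectorConfig.config`). The helper name `build_config_control_message` and the sample config are hypothetical, and the actual migrated config produced by the connector will contain more fields.

```python
import json
import time


def build_config_control_message(migrated_config: dict) -> dict:
    """Wrap a migrated config in a CONNECTOR_CONFIG control message (assumed shape)."""
    return {
        "type": "CONTROL",
        "control": {
            "type": "CONNECTOR_CONFIG",
            # Emission time in milliseconds since the epoch.
            "emitted_at": int(time.time() * 1000),
            "connectorConfig": {"config": migrated_config},
        },
    }


message = build_config_control_message(
    {"streams": [{"name": "orders", "format": {"filetype": "csv"}}]}
)
# Airbyte messages are written as one JSON object per line.
print(json.dumps(message))
```

The platform persists the config carried in `connectorConfig.config`, which is why running a single sync is enough to store the migrated settings.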
If a user tries to modify the config after source-s3 is upgraded to v4.0.0 and before there was a sync or a periodic discover check, they will have to update the already provided fields manually. To avoid this, a sync can be executed on any of the connections for this source.
Other than breaking changes, we have changed the UI from which the user configures the source:
- You can now configure multiple streams by clicking on Add under Streams.
- Output Stream Name has been renamed to Name when configuring a specific stream.
- Pattern of files to replicate field has been renamed Globs under the stream configuration.