Issues information
- OS: Ubuntu 20.04
- databases: postgres
- Programming language and version: Ruby 3.1.2
- Link to your project on Github/Gitlab:
Your issue
Hello, all! I’ve been having some trouble getting replibyte to work. I’ve read the provided documentation and searched for discussions online, but I still don’t know what I’m doing wrong.
I have two issues:
- First, I created a simple, preliminary config file for testing purposes and succeeded in uploading my transformed database dump to a bucket in S3. Upon inspecting this dump, however, I noticed that no data had been transformed! I’ve copied my conf.yaml file below so you can help point out where I went wrong.
- Second, I’ve only ever been able to create a dump from my test database. This is because transforming my production dump has, so far, always required more memory than I have available (upwards of 20GB of RAM!). The database in question is by no means tiny (close to 6GB after pg_restore), but I’ve heard of colleagues using replibyte on much larger data sets, so something funky must be going on.
Lastly, here are the commands I’ve been using to accomplish what little I’ve managed so far:
transform and upload dump:
cat test_dump.sql | replibyte -c conf.yaml dump create -n test_transform -i -s postgresql
download transformed dump (I’ve also not been able to update my local db with this data, so I’ve saved it to a local file):
replibyte -c conf.yaml dump restore local -i postgres -v test_transform -o > test_transform.sql
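To make the "no data was transformed" observation easier to reproduce, here is a minimal Ruby sketch that compares one table's rows between the original and the downloaded dump. It assumes both files are plain-text pg_dump output with `COPY ... FROM stdin;` blocks terminated by `\.`; the helper names `copy_rows` and `unchanged_count` and the script name used below are hypothetical, not part of replibyte.

```ruby
require "set"

# Collect the data rows of one table's COPY block from a plain-text dump.
# The input is read line by line; only the requested table's rows are kept.
def copy_rows(io, table)
  rows = []
  in_block = false
  io.each_line do |line|
    line = line.chomp
    if in_block
      break if line == "\\." # a line containing only "\." ends the COPY data
      rows << line
    elsif line.start_with?("COPY #{table} ")
      in_block = true
    end
  end
  rows
end

# Count original rows that appear byte-for-byte in the transformed dump.
def unchanged_count(original_rows, transformed_rows)
  transformed = transformed_rows.to_set
  original_rows.count { |row| transformed.include?(row) }
end

# Command-line use: ruby compare_dumps.rb ORIGINAL TRANSFORMED TABLE
if ARGV.length == 3
  original_path, transformed_path, table = ARGV
  original = File.open(original_path) { |f| copy_rows(f, table) }
  transformed = File.open(transformed_path) { |f| copy_rows(f, table) }
  puts "#{unchanged_count(original, transformed)} of #{original.size} rows unchanged"
end
```

With the files from the commands above, this could be run as `ruby compare_dumps.rb test_dump.sql test_transform.sql public.users`. If every row is reported as unchanged, the transformers were most likely never applied to that table.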
Thanks in advance for any help given, and I hope we can sort this out so I can make use of this wonderful tool!
Dockerfile content (if any)
# Dockerfile development version
FROM ruby:3.1.2-bullseye
# Install Postgresql 14
RUN apt-get update -y
RUN apt install curl ca-certificates gnupg libzmq5-dev -y
RUN curl https://www.postgresql.org/media/keys/ACCC4CF8.asc \
    | gpg --dearmor \
    | tee /etc/apt/trusted.gpg.d/apt.postgresql.org.gpg >/dev/null
RUN sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ bullseye-pgdg main" > /etc/apt/sources.list.d/postgresql.list'
RUN apt update
RUN apt-get -y install postgresql-14
# Install node
RUN curl -sL https://deb.nodesource.com/setup_16.x -o /tmp/nodesource_setup.sh
RUN bash /tmp/nodesource_setup.sh
RUN apt install nodejs
# Skip installing gem documentation
RUN set -eux; \
    mkdir -p /usr/local/etc; \
    { \
        echo 'install: --no-document'; \
        echo 'update: --no-document'; \
    } >> /usr/local/etc/gemrc
# Install gems
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN gem install bundler
RUN mkdir -p vendor/cache
ARG BUNDLE_WITHOUT=development:test
RUN bundle config set without "$BUNDLE_WITHOUT"
RUN bundle check || bundle install --jobs $(nproc)
COPY . ./
# Start server
EXPOSE 3000
ENTRYPOINT ["/app/bin/docker-entrypoint.sh"]
CMD ["bin/rails", "server", "-b", "0.0.0.0"]
config.yaml content
source:
  connection_uri: $DATABASE_URL
  # database_subset: # downscale database while keeping it consistent
  transformers:
    - database: public
      table: users
      columns:
        - name: name
          transformer_name: first-name
        - name: email
          transformer_name: email
    - database: public
      table: people
      columns:
        - name: email
          transformer_name: email
        - name: name
          transformer_name: first-name
datastore:
  aws:
    bucket: yuri-db-seed
    region: us-east-1
    credentials:
      access_key_id: $AWS_SEED_BUCKET_ACCESS_KEY_ID
      secret_access_key: $AWS_SEED_BUCKET_SECRET_ACCESS_KEY
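Since indentation decides which keys the transformers belong to, a quick way to rule out a nesting mistake is to parse the file and list what is actually found. The Ruby sketch below does that; it assumes the transformers are expected under the `source` key (my understanding of the replibyte config layout, not something verified here), and `configured_transformers` is a hypothetical helper name.

```ruby
require "yaml"

# Return [database, table, column, transformer_name] for each configured column.
# Assumption: transformers live under the top-level `source` key.
def configured_transformers(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  transformers = config.dig("source", "transformers") || []
  transformers.flat_map do |t|
    (t["columns"] || []).map do |c|
      [t["database"], t["table"], c["name"], c["transformer_name"]]
    end
  end
end

# Command-line use: ruby list_transformers.rb conf.yaml
if ARGV.length == 1
  rows = configured_transformers(File.read(ARGV[0]))
  if rows.empty?
    puts "No transformers found under source.transformers"
  else
    rows.each { |row| puts row.join(" / ") }
  end
end
```

An empty result for a file that visibly contains a `transformers:` block would suggest the block is parsed at a different nesting level than intended.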