Bug #17826

Errors reported during replication

Added by Jan Brońka about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Critical
Assignee: Vaibhav Chauhan
Category: Middleware
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

I have 18 LUNs that are snapshotted and replicated to another machine. Previously I used 9.10.1 and everything worked fine. Yesterday I updated to U1. Replication still works, but it reports critical errors until all LUNs have been replicated. For example, when replication starts at 14:00 I receive 18 error emails, and then fewer and fewer as each subsequent LUN finishes replicating.

snip_20160929090050.png (45 KB) Jan Brońka, 09/29/2016 12:02 AM

Related issues

Is duplicate of FreeNAS - Bug #17836: Replication Tasks Broken after update from 9.10.1 to 9.10.1-U1 (Closed: Duplicate, 2016-09-28)
Is duplicate of FreeNAS - Bug #17905: Replication Task failed. Clone of remote backup dataset snapshot was created: backup-pool/.../myPC-zvol/%recv where myPC-zvol is being replicated and contains the snapshot in question (Closed: Duplicate, 2016-09-30)
Is duplicate of FreeNAS - Bug #17938: Slow Replication in 9.10.1-U1 (Closed: Duplicate, 2016-10-01)
Has duplicate FreeNAS - Bug #17953: error mail (Closed: Duplicate, 2016-10-03)
Is duplicate of FreeNAS - Bug #17981: Replication spontaneously failed (Closed: Duplicate, 2016-10-03)
Is duplicate of FreeNAS - Bug #17995: Replication failing after upgrade (Closed: Duplicate, 2016-10-03)
Is duplicate of FreeNAS - Bug #18282: [Regression] Error replicating incorrect dataset name (Closed: Duplicate, 2016-10-16)

Associated revisions

Revision 065981ae (diff)
Added by William Grzybowski about 4 years ago

fix(repl): re-set FD before every select call Select writes to fd set every time. Thanks to torek! Ticket: #17826

Revision 7ec6a334 (diff)
Added by William Grzybowski about 4 years ago

fix(repl): re-set FD before every select call Select writes to fd set every time. Thanks to torek! Ticket: #17826 (cherry picked from commit 065981ae6511fd67562d127e97711b0fd7896627)

Revision 7d40581b (diff)
Added by William Grzybowski about 4 years ago

fix(repl): re-set FD before every select call Select writes to fd set every time. Thanks to torek! Ticket: #17826 (cherry picked from commit 065981ae6511fd67562d127e97711b0fd7896627)
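The revisions above fix the bug by re-setting the descriptor set before every select() call, because select() overwrites the set it is given so that only the ready descriptors remain. The actual diff is not reproduced in this ticket, so what follows is only a minimal C sketch of that pattern under stated assumptions: `drain_fds` and `MAX_WATCHED` are hypothetical names, and the function simply reads a few pipe descriptors until each one reaches EOF.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical limit on how many descriptors this sketch watches. */
#define MAX_WATCHED 16

/*
 * Hypothetical helper: read every descriptor in fds[] until each reports EOF.
 * Returns the total number of bytes read, or -1 on error.
 */
long drain_fds(const int *fds, int count)
{
    int still_open[MAX_WATCHED];
    int remaining = count;
    long total = 0;
    char buf[4096];

    if (count < 0 || count > MAX_WATCHED)
        return -1;
    for (int i = 0; i < count; i++)
        still_open[i] = 1;

    while (remaining > 0) {
        fd_set readable;
        int maxfd = -1;

        /* Rebuild the set before EVERY select(): the previous call left
         * only the descriptors that happened to be ready in it. */
        FD_ZERO(&readable);
        for (int i = 0; i < count; i++) {
            if (still_open[i]) {
                FD_SET(fds[i], &readable);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
        }

        if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: rebuild the set and retry */
            return -1;
        }

        /* Only descriptors still set after select() are ready right now. */
        for (int i = 0; i < count; i++) {
            if (!still_open[i] || !FD_ISSET(fds[i], &readable))
                continue;
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0) {           /* EOF: stop watching this descriptor */
                still_open[i] = 0;
                remaining--;
            } else {
                total += n;
            }
        }
    }
    return total;
}
```

If the set were built once outside the loop and reused, any descriptor that was not ready during the first select() would be cleared from the set and never watched again, so the loop could block indefinitely or never read the remaining data.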

History

#1 Updated by Jan Brońka about 4 years ago

  • File debug-PROD-1S-20160928144816.txz added

#2 Updated by Bonnie Follweiler about 4 years ago

  • Assignee set to William Grzybowski

#3 Updated by William Grzybowski about 4 years ago

  • Status changed from Unscreened to Screened

Can you give an example of the emails you have been receiving?

#4 Updated by William Grzybowski about 4 years ago

  • Status changed from Screened to 15

#5 Updated by Jan Brońka about 4 years ago

Sure,

Replication storage1/lun4 -> 1.100.0.2:backup3 failed: Failed: storage1/lun4 (auto-20160928.1400-4d->auto-20160928.1800-4d)
Replication storage1/lun10 -> 1.100.0.2:backup3 failed: Failed: storage1/lun10 (auto-20160928.1000-4d->auto-20160928.1400-4d)
Replication storage1/lun18 -> 1.100.0.2:backup1 failed: Failed: storage1/lun18 (auto-20160928.1400-4d->auto-20160928.1800-4d)
Replication storage1/lun3 -> 1.100.0.2:backup3 failed: Failed: storage1/lun3 (auto-20160928.1000-4d->auto-20160928.1400-4d)
Replication storage1/lun17 -> 1.100.0.2:backup1 failed: Failed: storage1/lun17 (auto-20160928.1400-4d->auto-20160928.1800-4d)
Replication storage1/lun2 -> 1.100.0.2:backup3 failed: Failed: storage1/lun2 (auto-20160928.1000-4d->auto-20160928.1400-4d)
Replication storage1/lun14 -> 1.100.0.2:backup3 failed: Failed: storage1/lun14 (auto-20160928.1000-4d->auto-20160928.1400-4d)

and

Hello,
The replication failed for the local ZFS storage1/lun5 while attempting to
apply incremental send of snapshot auto-20160928.1400-4d -> auto-20160928.1800-4d to 1.100.0.2

#6 Updated by William Grzybowski about 4 years ago

  • Status changed from 15 to Investigation

#7 Updated by William Grzybowski about 4 years ago

  • Priority changed from No priority to Important
  • Target version set to 9.10.1-U2
  • Seen in changed from Unspecified to 9.10.1-U1

#8 Updated by William Grzybowski about 4 years ago

So this happens every day?

#9 Updated by Jan Brońka about 4 years ago

  • Priority changed from Important to No priority
  • Target version deleted (9.10.1-U2)
  • Seen in changed from 9.10.1-U1 to Unspecified

This happens every replication period. In my case replication runs at 6:00 and then every 4 hours, so the same thing repeats each time: 6:00, 10:00, 14:00, 18:00.
It started behaving like this after I updated from 9.10.1 to U1.

#10 Updated by Vaibhav Chauhan about 4 years ago

  • Priority changed from No priority to Important
  • Target version set to 9.10.1-U2
  • Seen in changed from Unspecified to 9.10.1-U1

#11 Updated by Josh Paetzel about 4 years ago

  • Priority changed from Important to Critical

#12 Updated by William Grzybowski about 4 years ago

  • File deleted (debug-PROD-1S-20160928144816.txz)

#13 Updated by William Grzybowski about 4 years ago

  • Is duplicate of Bug #17836: Replication Tasks Broken after update from 9.10.1 to 9.10.1-U1 added

#14 Updated by William Grzybowski about 4 years ago

  • Status changed from Investigation to Needs Developer Review
  • Priority changed from Critical to Blocks Until Resolved
  • Private changed from Yes to No

#15 Updated by Jan Brońka about 4 years ago


More info...
It seems the replication process is significantly degraded. Look at the attached picture; it shows the state I have had since yesterday. Replication starts, progresses to about 20%, and then slows down drastically. In practice, replication does not work at all for me on 9.10.1-U1.

I also tried manually sending a snapshot to the backup system (it went through fine, with good performance), but the UI did not even update the state (this worked fine on 9.10.1).

#16 Updated by William Grzybowski about 4 years ago

Jan Brońka wrote:

More info...
It seems the replication process is significantly degraded. Look at the attached picture; it shows the state I have had since yesterday. Replication starts, progresses to about 20%, and then slows down drastically. In practice, replication does not work at all for me on 9.10.1-U1.

I also tried manually sending a snapshot to the backup system (it went through fine, with good performance), but the UI did not even update the state (this worked fine on 9.10.1).

If you check this ticket's status you will see it has already been solved and is just waiting for the next release.
Please use the 9.10.1 boot environment (BE) or wait for the next update (it should happen next Monday).

Thanks

#17 Updated by Jan Brońka about 4 years ago

Hi,

In the status field I still see "Need Review" and "% Done" = 0,
so I concluded it was not solved yet.

But it's nice to hear this.

#18 Updated by Vaibhav Chauhan about 4 years ago

  • Assignee changed from William Grzybowski to Chris Torek
  • Priority changed from Blocks Until Resolved to Critical

Chris, can you please review the changes?

#19 Updated by Chris Torek about 4 years ago

  • Status changed from Needs Developer Review to Reviewed
  • Assignee changed from Chris Torek to Vaibhav Chauhan

I looked at the changes when they were initially committed and checked them again for the review; they still look good :-)

Chris

#20 Updated by Vaibhav Chauhan about 4 years ago

  • Status changed from Reviewed to Ready For Release

#21 Updated by William Grzybowski about 4 years ago

  • Is duplicate of Bug #17905: Replication Task failed. Clone of remote backup dataset snapshot was created: backup-pool/.../myPC-zvol/%recv where myPC-zvol is being replicated and contains the snapshot in question added

#22 Updated by William Grzybowski about 4 years ago

  • Is duplicate of Bug #17938: Slow Replication in 9.10.1-U1 added

#23 Updated by Josh Paetzel about 4 years ago

#24 Updated by William Grzybowski about 4 years ago

  • Is duplicate of Bug #17981: Replication spontaneously failed added

#25 Updated by William Grzybowski about 4 years ago

  • Is duplicate of Bug #17995: Replication failing after upgrade added

#26 Updated by Vaibhav Chauhan about 4 years ago

  • Status changed from Ready For Release to Resolved

#27 Updated by Broc Seib about 4 years ago

Does anyone have a link to the changes that resolved this issue? I had somewhat similar replication problems after updating to U1, but I didn't want to open a new issue until I had ruled out this fix.

In my case, I ended up removing pipewatcher from autorepl.py to get all my replications working again. Is this at all related, or should I post a new issue?

#28 Updated by William Grzybowski about 4 years ago

  • Is duplicate of Bug #18282: [Regression] Error replicating incorrect dataset name added
