OSError: [Errno 24] Too many open files

Hi,

I ran into a problem of too many open files during the execution of big workflows (several thousand Fireworks). The system is currently under development, so it is not a problem yet, but it will be, because the final system is intended to run many workflows of several thousand Fireworks each on a dedicated cluster.

Below you can find the state of a simple ScriptTask that FIZZLED and the call stack which led to the problem.

My system limit was 1024 open files (ulimit -n; a small sketch for checking this from Python follows the list below). I have increased this limit to 4096 and everything runs fine now, but I wonder how I can reduce the number of files opened by FireWorks:

  • Should I reduce the number of Fireworks and increase the number of tasks inside them, or something like that?
  • Run the ScriptTask with use_shell=False?
  • … any advice will be appreciated :slight_smile:
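
For reference, here is a minimal diagnostic sketch (not part of FireWorks; Unix only) that reads the same limit that ulimit -n reports from inside Python:

import resource

# soft limit is what the process is currently allowed; hard limit is the ceiling
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%d  hard=%d" % (soft, hard))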

And last but not least: thanks a lot for the good work and for this very nice tool that makes my life easier :slight_smile:

Cheers,
David

"stored_data": {
    "_exception": {
        "_details": null,
        "_failed_task_n": 0,
        "_stacktrace": "Traceback (most recent call last):\n  File \"/usr/lib/python3.4/site-packages/fireworks/core/rocket.py\", line 211, in run\n    m_action = t.run_task(my_spec)\n  File \"/usr/lib/python3.4/site-packages/fireworks/user_objects/firetasks/script_task.py\", line 37, in run_task\n    return self._run_task_internal(fw_spec, stdin)\n  File \"/usr/lib/python3.4/site-packages/fireworks/user_objects/firetasks/script_task.py\", line 48, in _run_task_internal\n    shell=self.use_shell)\n  File \"/usr/lib64/python3.4/subprocess.py\", line 859, in __init__\n    restore_signals, start_new_session)\n  File \"/usr/lib64/python3.4/subprocess.py\", line 1359, in _execute_child\n    errpipe_read, errpipe_write = os.pipe()\nOSError: [Errno 24] Too many open files\n"
    },
    "_message": "runtime error during task",
    "_task": {
        "_fw_name": "ScriptTask",
        "script": [
            "echo \"ending correl_S2 workflow\""
        ],
        "use_shell": true
    }
},
Hi David

Sorry for the late reply; somehow I did not get the latest FWS tickets in my email.

Can you tell me some details of your script? I want to be sure that the issue with too many open files pertains to FireWorks and not to what is going on inside the script (e.g., if the script is opening a file and not closing it). For example, see these:

http://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files

http://stackoverflow.com/questions/24682167/too-many-open-files-error-with-popen-of-subprocess

Best,

Anubhav


Hi Anubhav,

I don't think that's my case: I do not open files, I just do a lot of file movement with some FileTransferTasks. I use some subprocesses, but not with Popen, as I don't need the output, just the return code.

I use the older high-level API call() function (https://docs.python.org/3/library/subprocess.html#older-high-level-api). I guess it closes the file descriptors after execution…
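
For illustration, the pattern is roughly the following minimal sketch (the command and file names are made up, not my real code); call() blocks until the child exits and only the return code is kept:

import subprocess

ret = subprocess.call(["mv", "input.dat", "archive/input.dat"])
if ret != 0:
    print("move failed with return code %d" % ret)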

The script which caused the error (too many open files) is just a dummy ScriptTask echoing that the workflow execution has completed:

def get_dummy_end_fw():
    # get caller module name
    caller = inspect.currentframe().f_back
    caller_name = caller.f_globals['__name__']

    # create dummy end fw
    ft = ScriptTask.from_str("echo \"ending %s workflow\"" % caller_name)
    task_name = "dummy end: %s" % caller_name
    fw_end = Firework([ft], name=task_name)
    return fw_end

The only part in the code where I do an explicit open is inside a with construct:

with open(param_path, "w") as paramfile:
    paramfile.write("%s" % param_content)

I’m an experienced programmer (C, C++, Fortran & Perl mainly), but pretty new to Python…

Maybe I have missed something… For example, I use the psycopg2 lib to connect to a PostgreSQL DB.

So when I create a cursor like this in a function:

cursor = db_connector.cursor()

I suppose it is destroyed at the end of the function, since no more references point to this object…
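
Just in case, a minimal sketch of closing the cursor explicitly instead of relying on garbage collection (the table name is hypothetical):

def fetch_param_rows(db_connector):
    cursor = db_connector.cursor()
    try:
        cursor.execute("SELECT id, value FROM params")  # hypothetical query
        return cursor.fetchall()
    finally:
        cursor.close()  # release the cursor deterministically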

The number of Fireworks across all the workflows in the LaunchPad when it crashed was greater than 1024 (more than 1300, actually).

I will run further tests and check with the lsof command which processes open which files. Not this week, because I'm traveling, but next week I will tell you more.
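
For the record, a rough in-process equivalent of lsof -p <pid> on Linux (this sketch assumes the /proc layout) would be:

import os

for fd in sorted(os.listdir("/proc/self/fd"), key=int):
    try:
        target = os.readlink("/proc/self/fd/%s" % fd)
    except OSError:
        continue  # descriptor may have been closed in the meantime
    print("fd %s -> %s" % (fd, target))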

Best,
David


Hi,

I think I solved the problem: it was related to the logging system I use. As several instances of the Python interpreter are launched during the execution of the workflows, I had to reinitialize the logger in each Firework. This was OK when running a singleshot, but not in rapidfire mode with a large number of launches, since several Fireworks run in the same Python interpreter, opening additional sockets for logging on each FW launch… I have now fixed the problem and it runs fine.
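
For the record, the shape of the fix looks roughly like this sketch (the logger name and the socket handler are just an illustration, not the exact code): the logger is fetched by name and the handler is attached only once per interpreter, so repeated Firework launches in rapidfire mode do not accumulate open sockets.

import logging
import logging.handlers

def get_workflow_logger(host="localhost",
                        port=logging.handlers.DEFAULT_TCP_LOGGING_PORT):
    logger = logging.getLogger("correl_workflows")  # hypothetical logger name
    if not logger.handlers:  # attach the socket handler only on the first call
        logger.addHandler(logging.handlers.SocketHandler(host, port))
        logger.setLevel(logging.INFO)
    return logger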

Thank you for your attention and sorry for the inconvenience caused by my irrelevant question.

Best,
David


OK, no problem - thanks for updating the list with the answer to what happened.
